Abstract
Short-read alignment is widely used in next-generation sequencing. Accurately identifying gaps in sequenced reads is the basis for subsequent genome interpretation, yet existing gapped short-read alignment algorithms either perform poorly or do not allow gap insertion at all. For the alignment problem in which both the query and the reference sequences are short reads, this paper proposes an improved short-read alignment algorithm. It trains on sample query-sequence data to find the optimal number of inserted gaps for different species and read lengths, and then performs pairwise alignment of large-scale short reads. This reduces the number of iterations and hence the computation required for the intermediate matrix. The algorithm also stores the intermediate matrix elements in vectors to reduce storage requirements. Alignment experiments on tens of millions of short reads show that, compared with representative existing algorithms of the same kind, the proposed algorithm reduces running time and memory usage while preserving alignment accuracy.
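The two ideas named in the abstract, bounding the number of gaps and keeping intermediate matrix values in vectors, can be illustrated with a generic banded dynamic-programming sketch. This is not the authors' algorithm. The function name `banded_alignment_score`, the scoring values, and the `max_gaps` parameter are assumptions made for illustration, and the band bounds the net offset from the diagonal rather than the exact gap count. In the paper the gap bound is learned from training data; here it is simply passed in.

```python
def banded_alignment_score(query, ref, max_gaps, match=1, mismatch=-1, gap=-2):
    """Global alignment score restricted to cells with |i - j| <= max_gaps.

    Only two row vectors are kept instead of the full (n+1) x (m+1) matrix.
    Returns None when the length difference already exceeds the gap budget.
    """
    n, m = len(query), len(ref)
    if abs(n - m) > max_gaps:
        return None  # the end cell (n, m) lies outside the band

    neg_inf = float("-inf")
    # prev[j]: best score for query[:i-1] vs ref[:j]; cells outside the band stay -inf
    prev = [neg_inf] * (m + 1)
    for j in range(min(m, max_gaps) + 1):
        prev[j] = j * gap  # leading gaps in the query

    for i in range(1, n + 1):
        cur = [neg_inf] * (m + 1)
        lo = max(0, i - max_gaps)
        hi = min(m, i + max_gaps)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i * gap  # leading gaps in the reference
                continue
            step = match if query[i - 1] == ref[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + step,  # align query[i-1] with ref[j-1]
                prev[j] + gap,       # gap in the reference
                cur[j - 1] + gap,    # gap in the query
            )
        prev = cur  # the older row is discarded
    return prev[m]
```

With these assumed scores, `banded_alignment_score("ACGT", "AGT", 1)` returns 1 (three matches and one gap). A smaller `max_gaps` shrinks the band, so fewer cells are computed per row, which is the kind of saving the abstract attributes to choosing a good gap bound.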
Authors
YANG Yong-jie (杨永洁)
ZHONG Cheng (钟诚)
School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2019, No. 5, pp. 1004-1009 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (61462005)
Supported by the Guangxi Natural Science Foundation (2014GXNSFAA118396)