Abstract
This paper presents Umap, a new algorithm for mapping short sequencing reads to a reference genome. Umap is built on the idea of progressively extending core fragments (contigs), and it locates reads by using the overlap information between them. First, all reads that occur exactly once in the reference genome, called unique short reads, are identified. Then, starting from these unique reads, a greedy algorithm extends them using the overlaps between reads, which determines the exact positions of the remaining non-unique reads. Experiments show that Umap maps short reads more accurately than existing short-read mapping algorithms and maps a larger number of reads, with the proportion of matched reads reaching 71%. By exploiting the overlap information that inherently exists among short reads, reads can be located on the reference genome more accurately and ambiguous matches are reduced.
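The two stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes exact matching only, a single strand, and a naive quadratic search, and the names `find_occurrences`, `suffix_prefix_overlap`, `map_reads`, and the `min_overlap` parameter are hypothetical choices made for illustration. Reads with exactly one occurrence in the reference are placed first; each greedy step then places the ambiguous read that shares the longest suffix/prefix overlap with an already placed read, provided the implied position is one of that read's own candidate positions.

```python
def find_occurrences(reference, read):
    """Return every start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping hits
    return positions


def suffix_prefix_overlap(left, right, min_overlap):
    """Longest k >= min_overlap such that the last k bases of `left`
    equal the first k bases of `right`; 0 if there is no such k."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def map_reads(reference, reads, min_overlap=3):
    """Illustrative two-stage mapping (assumed simplification, not Umap itself).

    Stage 1: place reads that occur exactly once in the reference.
    Stage 2: greedily place ambiguous reads using overlaps with placed reads.
    Returns a dict {read: position}; unresolved reads are left out.
    """
    candidates = {r: find_occurrences(reference, r) for r in set(reads)}
    placed = {r: pos[0] for r, pos in candidates.items() if len(pos) == 1}
    ambiguous = {r for r, pos in candidates.items() if len(pos) > 1}

    while ambiguous:
        options = []  # (overlap length, read, implied position)
        for read in ambiguous:
            for anchor, a_pos in placed.items():
                # Read extends the anchor to the right.
                k = suffix_prefix_overlap(anchor, read, min_overlap)
                pos = a_pos + len(anchor) - k
                if k and pos in candidates[read]:
                    options.append((k, read, pos))
                # Read extends the anchor to the left.
                k = suffix_prefix_overlap(read, anchor, min_overlap)
                pos = a_pos - (len(read) - k)
                if k and pos in candidates[read]:
                    options.append((k, read, pos))
        if not options:
            break  # no remaining read overlaps a placed read
        _, read, pos = max(options)  # greedy choice: longest overlap first
        placed[read] = pos
        ambiguous.remove(read)
    return placed


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTGCATCA"
    reads = ["ACGTTG", "TTGCA", "AAGGCT", "GGGGG"]
    for read, pos in sorted(map_reads(reference, reads).items()):
        print(read, pos)
```

In this toy input, `TTGCA` occurs at positions 3 and 12 of the reference; its 3-base overlap with the unique read `ACGTTG` (placed at 0) selects position 3, while `GGGGG` has no match and stays unmapped. A practical mapper would also need an index over the reference, tolerance for mismatches, and a rule for reads supported equally well at several loci.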
Source
Journal of Southeast University (Natural Science Edition)
EI
CAS
CSCD
PKU Core Journal
2011, No. 1, pp. 63-66 (4 pages)
Funding
National Natural Science Foundation of China (Grant Nos. 60671018, 60771024)
Keywords
short reads
unique k-tuple
unique short reads
overlap information