A Short-Read Mapping Algorithm for Genome Sequencing Based on Overlap Information

Maximum use of reads overlap information for short reads mapping


Abstract: This paper presents Umap, a new algorithm for mapping short sequencing reads to a reference genome. Umap is built on the idea of step-by-step extension of core fragments (contigs) and uses the overlap information between reads to position them. It first identifies all reads that occur exactly once in the reference genome, called unique reads. Starting from these unique reads, it then applies a greedy algorithm that extends them through read-to-read overlaps, which in turn determines the exact positions of the remaining non-unique reads. Experiments show that Umap maps short reads more accurately than existing short-read mapping algorithms and maps more of them, with up to 71% of reads matched to the reference genome. Exploiting the overlap information that inherently exists among short reads allows them to be placed on the reference genome more accurately and reduces ambiguous matches.
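The two stages described in the abstract (find unique reads, then extend them greedily through overlaps) can be illustrated with a small Python sketch. This is not the authors' Umap implementation: the function names, the exact-match search, the `min_overlap` threshold, and the rule of placing each distinct read sequence at most once are all assumptions made for illustration only.

```python
def find_occurrences(reference, read):
    """Return every start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def suffix_prefix_overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right` (0 if below threshold)."""
    longest = min(len(left), len(right))
    for length in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def map_reads(reference, reads, min_overlap=3):
    """Hypothetical sketch: anchor unique reads, then greedily place ambiguous ones."""
    # Stage 1: exact-match every distinct read; reads with one hit are "unique".
    hits = {read: find_occurrences(reference, read) for read in dict.fromkeys(reads)}
    placed = {read: pos[0] for read, pos in hits.items() if len(pos) == 1}
    ambiguous = [read for read, pos in hits.items() if len(pos) > 1]

    # Stage 2: repeatedly place the ambiguous read with the longest overlap
    # to an already placed read, provided the implied position is one of
    # that read's own candidate positions.
    while ambiguous:
        best = None  # (overlap length, read, position)
        for read in ambiguous:
            for anchor, anchor_pos in placed.items():
                candidates = []
                right = suffix_prefix_overlap(anchor, read, min_overlap)
                if right:  # read continues the anchor to the right
                    candidates.append((right, anchor_pos + len(anchor) - right))
                left = suffix_prefix_overlap(read, anchor, min_overlap)
                if left:  # read precedes the anchor on the left
                    candidates.append((left, anchor_pos - len(read) + left))
                for overlap, position in candidates:
                    if position in hits[read] and (best is None or overlap > best[0]):
                        best = (overlap, read, position)
        if best is None:
            break  # no remaining read overlaps a placed read
        _, read, position = best
        placed[read] = position
        ambiguous.remove(read)
    return placed


# Toy reference containing the repeat "GGATCC" at positions 4 and 15.
reference = "TTACGGATCCAGTCAGGATCCCTGA"
result = map_reads(reference, ["ACGGAT", "GGATCC", "AAAAAA"])
print(result)  # {'ACGGAT': 2, 'GGATCC': 4}
```

In this toy input, "GGATCC" matches the reference twice, but its 4-base overlap with the unique read "ACGGAT" selects position 4. The read "AAAAAA" has no match and stays unmapped.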

Authors: Lu Zhiyuan (卢志远), Xie Jianming (谢建明), Sun Xiao (孙啸)

Affiliation: State Key Laboratory of Bioelectronics, Southeast University

Source: Journal of Southeast University (Natural Science Edition), 2011, No. 1, pp. 63-66 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (grants 60671018 and 60771024)

Keywords: short reads; unique k-tuple; unique short reads; overlap information

Classification: TP311.51 [Automation and Computer Technology: Computer Software and Theory]



