Abstract
Next-generation DNA sequencing data are characterized by short reads, high coverage, and sequencing errors, so developing assembly software that meets practical needs is a pressing research topic in sequence assembly. This paper discusses the challenges of whole-genome sequence assembly and examines the principles and workflows of the main classes of assembly algorithms, including greedy graph-based, OLC (overlap-layout-consensus) graph-based, and De Bruijn graph-based algorithms. It analyzes the advantages, disadvantages, and scope of application of each class, and compares the classes against several criteria. Finally, it offers suggestions for future research on genome assembly algorithms.
Authors
颜珂
何威
徐勇
张健
Yan Ke; He Wei; Xu Yong; Zhang Jian (IntelliSense & Bioinformatics Innovation Team, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China)
Source
《计算机应用研究》
CSCD
PKU Core Journal
2016, No. 9, pp. 2573-2578 (6 pages)
Application Research of Computers
Funding
Supported by the 2014 Shenzhen Special Fund for Future Industry Development (CXZZ20140904154910774, JCYJ20140904154645958)