Abstract
Genome sequencing is one of the most fundamental research directions in bioinformatics. However, the genome of most organisms cannot be obtained in a single pass, so sequence assembly techniques are needed to assemble the DNA fragments obtained in experiments. The fragments produced by sequencing are becoming shorter, and assembly algorithms based on the Euler path have an advantage when handling such short fragments. A key step of the Euler path algorithm is the construction of the de Bruijn graph. Conventionally, the graph is built so that each k-mer overlaps the preceding k-mer by k-1 bases, that is, two adjacent k-mers are offset by one position. This paper finds that if two k-mers joined by an edge overlap by k-2 or fewer bases, the structural complexity of the de Bruijn graph is substantially affected. The paper analyzes these effects in detail and designs experiments to verify the analysis. The experimental results show that changing the offset between k-mers has a significant effect on the structural complexity of the de Bruijn graph.
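The construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_kmer_graph`, the `shift` parameter, and the choice to take a k-mer at every read position are assumptions made for illustration. With `shift=1`, an edge joins k-mers that overlap by k-1 bases (the conventional construction); with `shift=2`, the overlap is k-2 bases, and so on.

```python
from collections import defaultdict


def build_kmer_graph(reads, k, shift=1):
    """Build a directed graph whose nodes are k-mers.

    An edge src -> dst is added when dst starts `shift` positions after
    src in some read, so the two k-mers overlap by k - shift bases.
    (Hypothetical helper for illustration only.)
    """
    if k < 2 or not 1 <= shift < k:
        raise ValueError("require k >= 2 and 1 <= shift < k")

    graph = defaultdict(set)
    for read in reads:
        # Last valid start index leaves room for a k-mer at i + shift.
        for i in range(len(read) - k - shift + 1):
            src = read[i:i + k]
            dst = read[i + shift:i + shift + k]
            graph[src].add(dst)
            graph.setdefault(dst, set())  # make sure sink nodes appear too
    return dict(graph)


def count_edges(graph):
    """Total number of distinct directed edges in the graph."""
    return sum(len(successors) for successors in graph.values())


if __name__ == "__main__":
    reads = ["ACGTACGT"]  # toy input, not data from the paper
    for shift in (1, 2):
        g = build_kmer_graph(reads, k=3, shift=shift)
        print(shift, len(g), count_edges(g))
```

Comparing node and edge counts for different `shift` values on the same reads gives a rough way to observe how the offset changes the graph; the paper's own complexity measures may differ.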
Authors
王东阳
任世军
王亚东
WANG Dongyang, REN Shijun, WANG Yadong (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source
《智能计算机与应用》
2011, Issue 2X, pp. 20-25, 30 (7 pages)
Intelligent Computer and Applications