Abstract
Analysis of repeat sequences is an important topic in genomic research, and it depends on first finding the unknown repeat sequences in a genome sequence quickly and effectively. In this paper, we propose a projection-assemble algorithm to find these repeats. The algorithm uses random projection to obtain a set of candidate segments, then searches every pair of segments exhaustively for potential links between them and assembles the linked segments. The time complexity of the projection-assemble algorithm is nearly linear in the length of the genomic sequence, and its memory usage grows exponentially with a parameter that is related to the sequence length. This is not a serious problem, because for sequences of length up to tens of millions this parameter can be set to a constant. We construct a semi-simulated test dataset to examine the algorithm, and the results show that it finds the repeat segments effectively.
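The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `candidate_positions` and `assemble`, the window length `l`, the number of projected positions `k`, and the `seed` are all assumptions made for illustration. The assembly step here is a simple merge of overlapping windows, swapped in for the exhaustive pairwise search the abstract describes. Under this reading, the bucket table can hold up to 4^k keys for a DNA alphabet, which is one plausible source of the exponential memory term.

```python
import random
from collections import defaultdict


def candidate_positions(seq, l=12, k=8, seed=0):
    """Return sorted starts of l-mers that share a projected key with another l-mer.

    Hypothetical helper: the projection keeps k randomly chosen positions
    out of each window of length l and hashes windows by those characters.
    """
    rng = random.Random(seed)
    mask = sorted(rng.sample(range(l), k))  # positions kept by the projection
    buckets = defaultdict(list)
    for i in range(len(seq) - l + 1):
        key = "".join(seq[i + j] for j in mask)
        buckets[key].append(i)
    hits = set()
    for starts in buckets.values():
        if len(starts) > 1:  # a collision marks candidate repeat windows
            hits.update(starts)
    return sorted(hits)


def assemble(positions, l):
    """Merge overlapping or adjacent windows into (start, end) segments.

    Simplified stand-in for the pairwise linkage search in the abstract.
    """
    segments = []
    for p in positions:
        if segments and p <= segments[-1][1]:
            segments[-1][1] = max(segments[-1][1], p + l)
        else:
            segments.append([p, p + l])
    return [tuple(s) for s in segments]


if __name__ == "__main__":
    repeat = "ACGTTGCAAGGCTTAACCGGTATC"
    seq = repeat + "GATTACAGATTCCAGT" + repeat  # planted repeat, toy data
    print(assemble(candidate_positions(seq), 12))
```

A single pass over the sequence fills the buckets, so this stage is linear in the sequence length for fixed `l` and `k`. A single random mask can also produce chance collisions, so a practical version would likely repeat the projection with several masks and verify candidates before assembling them.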
Source
《生物信息学》
2005, Issue 4, pp. 163-166, 174 (5 pages)
Chinese Journal of Bioinformatics
Keywords
repeat sequences (重复序列)
random projection (随机投影)
assemble (拼接)