摘要
模式发现是计算生物学一个重要的研究方向,但目前的大部分算法还不能保证获得最优的模式。将模式发现问题转化成层次图的路径搜索问题,推导了针对三个序列片段相似性关系的判据,以其作为剪枝规则提出并实现了一种深度优先的穷举搜索算法:判据搜索算法(CriterionSearchAlgorithm,CRISA)。理论分析表明,对于绝大多数模式发现问题,CRISA具有多项式的计算时间复杂度和线性的空间复杂度。对仿真的和实际的DNA序列数据的测试表明,CRISA能够快速而完全地识别出序列中所有的模式,并且获得了优于其它算法的总体评价。
Motif finding is one of the fundamental problems in computational biology with important applications in finding regulatory signals. Though many algorithms have been developed, very few of them can obtain all motifs from the unaligned DNA sequences. A novel criterion for three subsequences was proposed , which was deduced through exploring valid paths in the layer graph. Then this criterion was used as prune rule in the exhaustive depth first search algorithm, criterion search algorithm (CR1SA), to find all putative motifs rapidly. Analysis of the computational complexity and error probability proves that CRISA is efficient in most motif finding problems. Some tests using simulated and real biological data were done and the results show that CRISA is more efficient than other exhaustive search algorithms, and its search speed is even faster than that of many non-exhaustive search algorithms.
出处
《系统仿真学报》
EI
CAS
CSCD
北大核心
2006年第5期1169-1177,共9页
Journal of System Simulation
基金
国家自然科学基金(60471003)