Criterion Search Algorithm and its Application to Motif Finding in DNA Sequences (article in English) Cited by: 2


Abstract (translated from the Chinese): Motif finding is an important research direction in computational biology, but most current algorithms cannot guarantee that the optimal motifs are found. The motif finding problem is recast as a path search problem on a layer graph, and a criterion on the similarity relationship among three sequence segments is derived. Using this criterion as the pruning rule, a depth-first exhaustive search algorithm, the Criterion Search Algorithm (CRISA), is proposed and implemented. Theoretical analysis shows that, for the vast majority of motif finding problems, CRISA has polynomial time complexity and linear space complexity. Tests on simulated and real DNA sequence data show that CRISA identifies all motifs in the sequences quickly and completely, and receives a better overall evaluation than other algorithms.

Abstract (English): Motif finding is one of the fundamental problems in computational biology, with important applications in finding regulatory signals. Although many algorithms have been developed, very few of them can obtain all motifs from unaligned DNA sequences. A novel criterion for three subsequences is proposed, deduced by exploring valid paths in the layer graph. This criterion is then used as the pruning rule in an exhaustive depth-first search algorithm, the criterion search algorithm (CRISA), to find all putative motifs rapidly. Analysis of the computational complexity and error probability shows that CRISA is efficient for most motif finding problems. Tests on simulated and real biological data show that CRISA is more efficient than other exhaustive search algorithms, and that its search speed is even faster than that of many non-exhaustive search algorithms.
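The general shape of a pruned depth-first search over a layer graph can be sketched as follows. This is only an illustration, not the paper's method: the abstract does not state the three-subsequence criterion, so the sketch substitutes a simpler pairwise rule (two candidate segments may differ in at most 2*d positions). The names `hamming`, `layers` and `dfs_paths`, and the parameters `l` (motif length) and `d` (allowed mutations), are all assumptions introduced here.

```python
from typing import Iterator, List, Tuple

Node = Tuple[int, str]  # (offset in its sequence, l-mer text)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def layers(seqs: List[str], l: int) -> List[List[Node]]:
    """Layer i holds every (offset, l-mer) taken from sequence i."""
    return [[(p, s[p:p + l]) for p in range(len(s) - l + 1)] for s in seqs]


def dfs_paths(seqs: List[str], l: int, d: int) -> Iterator[List[Node]]:
    """Yield one l-mer per sequence such that every pair differs in <= 2*d positions.

    Stand-in pruning rule (an assumption, not the paper's criterion): two
    instances of one motif, each within d mutations of it, can differ from
    each other in at most 2*d positions.
    """
    graph = layers(seqs, l)
    path: List[Node] = []

    def extend(depth: int) -> Iterator[List[Node]]:
        if depth == len(graph):
            yield list(path)  # a complete path visits one node in every layer
            return
        for node in graph[depth]:
            # Prune: skip a node that is too far from any l-mer already on the path.
            if all(hamming(node[1], prev[1]) <= 2 * d for prev in path):
                path.append(node)
                yield from extend(depth + 1)
                path.pop()

    yield from extend(0)
```

Only the current path is stored during the search, so the working memory grows with the number of sequences rather than with the number of candidate paths. A stricter rule, such as a criterion that examines three segments at once, would take the place of the pairwise check inside `extend`.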

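Authors: DU Yaohua (杜耀华), LI Dongdong (李冬冬), WANG Zhengzhi (王正志)

Affiliation: College of Mechatronics Engineering and Automation, National University of Defense Technology (国防科技大学机电工程与自动化学院)

Source: Journal of System Simulation (《系统仿真学报》), indexed in EI, CAS, CSCD and the PKU Core list; 2006, No. 5, pp. 1169-1177 (9 pages)

Funding: National Natural Science Foundation of China (60471003)

Keywords: motif finding; criterion; prune rule; depth first search; layer graph

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]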



