A random projection algorithm for motif search based on maximum clique refinement


Abstract: Motif search is one of the most challenging problems in bioinformatics and computer science, and it plays an important role in locating transcription factor binding sites in unaligned DNA sequences. This paper recasts motif search as the problem of finding maximum cliques in an undirected graph and proposes a random projection motif search algorithm combined with maximum clique refinement, called MCR2PA. Compared with the original projection algorithm, MCR2PA achieves better prediction accuracy on most motif search problems. Experimental results on multiple groups of real biological data demonstrate the practicability of the proposed algorithm; in particular, the prediction accuracy exceeds 80% on Saccharomyces cerevisiae data.
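
The abstract does not spell out how MCR2PA works internally, so the following is only a minimal Python sketch of the general idea it names (random projection followed by a maximum clique search), not the authors' implementation. It assumes the planted (l, d) motif model: each l-mer is hashed by the letters at a few chosen positions, the l-mers that fall into one bucket become vertices, two vertices from different sequences are joined when their Hamming distance is at most 2d, and a maximum clique in that graph is taken as a candidate set of motif instances. All function names, the 2d edge threshold, and the use of Bron-Kerbosch are assumptions made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def project_buckets(seqs, l, positions):
    """Hash every l-mer by the letters at `positions` (one projection)."""
    buckets = defaultdict(list)
    for i, s in enumerate(seqs):
        for j in range(len(s) - l + 1):
            lmer = s[j:j + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((i, j))  # (sequence index, start offset)
    return buckets


def build_graph(nodes, seqs, l, d):
    """Join l-mers from different sequences that differ in at most 2d places."""
    adj = {v: set() for v in nodes}
    for u, v in combinations(nodes, 2):
        if u[0] == v[0]:
            continue  # assumed: at most one motif instance per sequence
        a = seqs[u[0]][u[1]:u[1] + l]
        b = seqs[v[0]][v[1]:v[1] + l]
        if hamming(a, b) <= 2 * d:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one maximum clique as a list."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


def best_clique_for_projection(seqs, l, d, positions):
    """Largest clique found in any bucket of a single projection."""
    best = []
    for members in project_buckets(seqs, l, positions).values():
        clique = max_clique(build_graph(members, seqs, l, d))
        if len(clique) > len(best):
            best = clique
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: variants of ACGT planted at offset 2 of each sequence.
    seqs = ["TTACGTTT", "GGACGAGG", "CCTCGTCC"]
    l, d, k = 4, 1, 2
    best = []
    for _ in range(5):  # several independent random projections
        positions = sorted(rng.sample(range(l), k))
        clique = best_clique_for_projection(seqs, l, d, positions)
        if len(clique) > len(best):
            best = clique
    print(best)
```

Restricting the clique search to one bucket at a time keeps each graph small, which matters because maximum clique search is exponential in the worst case. A real refinement step would also have to score the candidate instance sets, which this sketch leaves out.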

Authors: Huo Hongwei (霍红卫), Yu Qiang (于强), Niu Wei (牛伟)

Affiliation: School of Computer Science, Xidian University (西安电子科技大学计算机学院)

Source: China Sciencepaper (《中国科技论文》), indexed in CAS and the Peking University Core Journals list; 2013, No. 4, pp. 342-349 (8 pages)

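Funding: National Natural Science Foundation of China (61173025); Specialized Research Fund for the Doctoral Program of Higher Education (20100203110010); Fundamental Research Funds for the Central Universities (K5051303002)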

Keywords: motif search; transcription factor binding sites; maximum cliques; random projection

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]


References (23)

[1] Pevzner P, Sze S. Combinatorial approaches to finding subtle signals in DNA sequences [C]//Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. Menlo Park, California: AAAI Press, 2000: 269-278.
[2] Evans P, Smith A, Wareham H. On the complexity of finding common approximate substrings [J]. Theor Comput Sci, 2003, 306: 407-430.
[3] Pavesi G, Mauri G, Pesole G. An algorithm for finding signals of unknown length in DNA sequences [J]. Bioinformatics, 2001, 17(Suppl 1): S207-S214.
[4] Eskin E, Pevzner P. Finding composite regulatory patterns in DNA sequences [J]. Bioinformatics, 2002, 18: 354-363.
[5] Pisanti N, Carvalho A, Marsan L, et al. RISOTTO: fast extraction of motifs with mismatches [C]//Proceedings of the Seventh Latin American Symposium: Theoretical Informatics. Arequipa, Peru: Springer, LNCS 3887, 2006: 757-768.
[6] Davila J, Balla S, Rajasekaran S. Fast and practical algorithms for planted (l, d) motif search [J]. IEEE/ACM Trans Comput Biol Bioinform, 2007, 4(4): 544-552.
[7] Ho E, Jakubowski C, Gunderson S. iTriplet, a rule-based nucleic acid sequence motif finder [J]. Algor Mol Biol, 2009, 4: 1-14.
[8] Dinh H, Rajasekaran S, Kundeti V. PMS5: an efficient exact algorithm for the (l, d)-motif finding problem [J]. BMC Bioinform, 2011, 12: doi: 10.1186/1471-2105-12-410.
[9] Huo Hongwei, Lin Shuai, Yu Qiang, Zhang Yipu. A motif search algorithm based on MapReduce [J]. China Sciencepaper, 2012, 7(7): 487-494. (in Chinese)
[10] Bailey T, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers [C]//Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. Menlo Park, California: AAAI Press, 1994: 28-36.
