A random projection algorithm for motif search based on maximum clique refinement
Abstract: Motif search is one of the most challenging problems in bioinformatics and computer science, and it plays an important role in locating transcription factor binding sites in unaligned DNA sequences. This paper converts the motif search problem into the problem of finding maximum cliques in an undirected graph, and proposes a random projection motif search algorithm with maximum clique refinement, called MCR2PA. Compared with the original projection algorithm, MCR2PA achieves higher prediction accuracy on most motif search problems. Experimental results on multiple sets of real biological data demonstrate the practicality of the proposed algorithm; in particular, the prediction accuracy exceeds 80% on Saccharomyces cerevisiae data.
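The abstract names two ingredients, random projection and maximum clique search, without giving details. The following is only a minimal sketch of how the two could fit together, not the MCR2PA procedure itself. It assumes the planted (l, d) formulation: l-mers are hashed by k randomly chosen positions, and two l-mers from different sequences are joined by an edge when their Hamming distance is at most 2d (two instances that each differ from a common motif in at most d positions cannot differ from each other in more than 2d). All function names and parameters are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lmers(seqs, l):
    """Yield (sequence index, offset, l-mer) for every l-mer in every sequence."""
    for i, s in enumerate(seqs):
        for j in range(len(s) - l + 1):
            yield i, j, s[j:j + l]


def projection_buckets(seqs, l, k, rng):
    """Hash each l-mer by the letters at k randomly chosen positions."""
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for i, j, w in lmers(seqs, l):
        key = "".join(w[p] for p in positions)
        buckets[key].append((i, j, w))
    return buckets


def build_graph(nodes, d):
    """Join l-mers from different sequences that differ in at most 2d positions."""
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if a[0] != b[0] and hamming(a[2], b[2]) <= 2 * d:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def max_clique(adj):
    """Exact maximum clique via Bron-Kerbosch with pivoting (small graphs only)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


def refine(nodes, d):
    """Assumed refinement step: keep the largest mutually compatible set of l-mers."""
    return max_clique(build_graph(nodes, d))
```

A possible use is to call `projection_buckets(seqs, l, k, random.Random(seed))` for several seeds, pass the l-mers of each well-populated bucket to `refine`, and keep the largest clique found as the candidate set of motif instances. The exact clique search here is exponential in the worst case, so it is suitable only for small buckets.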
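Source: China Sciencepaper (《中国科技论文》), 2013, No. 4, pp. 342-349 (8 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (61173025); Specialized Research Fund for the Doctoral Program of Higher Education (20100203110010); Fundamental Research Funds for the Central Universities (K5051303002).
Keywords: motif search; transcription factor binding sites; maximum cliques; random projection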