
A MapReduce-based motif finding algorithm (cited by: 7)

An algorithm for motif finding based on MapReduce
Abstract: Motif finding plays an important role in gene discovery and in understanding gene regulatory relationships, and it is one of the most challenging problems in bioinformatics. This paper presents three data partitioning methods for the PMSP algorithm and, building on them, proposes a MapReduce-based motif finding algorithm (PMSPMR). For problem instances of varying difficulty, experimental results on a Hadoop cluster demonstrate that PMSPMR scales well. In particular, for difficult motif finding instances, the speedup of PMSPMR approaches the number of nodes in the Hadoop cluster. In experiments on real biological data, PMSPMR identifies known transcriptional regulatory motifs in eukaryotes and in promoter sequences extracted from Saccharomyces cerevisiae, which indicates that the algorithm is effective.
Source: China Sciencepaper (《中国科技论文》), indexed in CAS and the Peking University Core Journals list (北大核心), 2012, No. 7, pp. 487-494, 502 (9 pages)
Funding: National Natural Science Foundation of China (61173025); Specialized Research Fund for the Doctoral Program of Higher Education (20100203110010)
Keywords: motif finding; data partitioning; scalability
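
The abstract describes splitting the motif search into independent pieces that a Hadoop cluster can process in parallel. The sketch below illustrates that general idea in plain Python for the planted (l, d) motif problem: each "map" task explores the d-neighborhoods of some l-mers from the first sequence, and a "reduce" step merges the results. It is not the authors' PMSPMR implementation and does not reproduce any of the paper's three partitioning methods; the round-robin split by start position and all function names (`neighbors`, `map_task`, `find_motifs`) are assumptions made for illustration, and no Hadoop API is used.

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d):
    """All strings within Hamming distance d of lmer."""
    result = {lmer}
    for k in range(1, d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(ALPHABET, repeat=k):
                candidate = list(lmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def occurs_within(candidate, seq, d):
    """True if some l-mer of seq is within distance d of candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def map_task(start_positions, seqs, l, d):
    """Map step (illustrative): test the neighborhoods of the assigned
    l-mers of the first sequence against all remaining sequences."""
    found = set()
    for i in start_positions:
        for cand in neighbors(seqs[0][i:i + l], d):
            if all(occurs_within(cand, s, d) for s in seqs[1:]):
                found.add(cand)
    return found


def find_motifs(seqs, l, d, num_tasks):
    """Partition the work, run each map task, then reduce by set union."""
    positions = list(range(len(seqs[0]) - l + 1))
    # Assumed partitioning: round-robin over start positions.
    partitions = [positions[t::num_tasks] for t in range(num_tasks)]
    partial = [map_task(p, seqs, l, d) for p in partitions]
    return sorted(set().union(*partial))


if __name__ == "__main__":
    sequences = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
    print(find_motifs(sequences, l=4, d=1, num_tasks=3))
```

Because the map tasks share no state, the set of motifs found does not depend on `num_tasks`; only the amount of work per task changes, which is the property that lets this kind of search spread across cluster nodes.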

