
Efficient Common Motif Finding Algorithm
Abstract: Motif finding plays an important role in revealing the regulation of gene expression at the genomic level and in locating conserved domains in protein sequences. This paper presents an algorithm for finding common motifs in biological sequences. The algorithm uses a repeat-detection algorithm based on the suffix array or the QSA (quasi suffix array) to mine the maximal repeats in the strings as primitives. After the primitives are filtered and pruned, the optimized primitives are computed and processed according to the constraints to obtain the common motifs. Compared with similar algorithms based on suffix trees or tries, the algorithm is more efficient in both time and space.
Source: Computer Knowledge and Technology, 2016, No. 4X, pp. 164-168 (5 pages)
Funding: Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2012211A056)
Keywords: motif finding; repeats; constraints; bioinformatics; suffix array
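The abstract only outlines the method. As a rough illustration of the suffix-array side of it, the Python sketch below (not the authors' implementation) concatenates the input sequences with unique separators, builds a suffix array by naive sorting, computes the LCP array with Kasai et al.'s linear-time method, and enumerates LCP intervals to obtain maximal repeats. The "constraints" are assumed here to be a minimum motif length `min_len` and a quorum `quorum` (the minimum number of sequences that must contain the motif). Motifs are exact, with no gaps or mismatches, and neither the QSA variant nor the paper's filtering and pruning rules are reproduced. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_text(seqs: List[str]) -> Tuple[List[int], List[int]]:
    """Concatenate sequences into one integer string.

    Letters map to their (positive) code points; sequence k is followed by
    the separator -(k + 1). Each separator occurs exactly once, so no common
    prefix of two different suffixes can cross a sequence boundary.
    owner[p] is the index of the sequence that position p belongs to.
    """
    text: List[int] = []
    owner: List[int] = []
    for k, s in enumerate(seqs):
        for ch in s:
            text.append(ord(ch))
            owner.append(k)
        text.append(-(k + 1))
        owner.append(k)
    return text, owner


def suffix_array(text: List[int]) -> List[int]:
    # Naive O(n^2 log n) construction by sorting suffixes; fine for a sketch.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: List[int], sa: List[int]) -> List[int]:
    """Kasai et al.: lcp[r] = length of the LCP of suffixes sa[r-1] and sa[r]."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the LCP can shrink by at most 1 for the next suffix
        else:
            h = 0
    return lcp


def common_motifs(seqs: List[str], min_len: int, quorum: int) -> Dict[str, int]:
    """Maximal repeats of length >= min_len found in >= quorum sequences.

    Hypothetical helper: maps each motif to the number of sequences containing it.
    """
    text, owner = build_text(seqs)
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    n = len(sa)
    motifs: Dict[str, int] = {}

    def report(length: int, lb: int, rb: int) -> None:
        # sa[lb..rb] are all occurrences of a right-maximal repeat of this length.
        if length < min_len:
            return
        starts = sa[lb:rb + 1]
        # Left-maximality: the preceding characters must not all be equal.
        before: Set[Optional[int]] = {text[p - 1] if p > 0 else None for p in starts}
        if len(before) < 2:
            return
        support = len({owner[p] for p in starts})
        if support >= quorum:
            word = "".join(chr(c) for c in text[starts[0]:starts[0] + length])
            motifs[word] = support

    # Bottom-up traversal of LCP intervals with a stack of (lcp value, left bound).
    stack: List[Tuple[int, int]] = [(0, 0)]
    for r in range(1, n + 1):
        cur = lcp[r] if r < n else 0
        lb = r - 1
        while cur < stack[-1][0]:
            length, lb = stack.pop()
            report(length, lb, r - 1)
        if cur > stack[-1][0]:
            stack.append((cur, lb))
    return motifs
```

For example, `common_motifs(["xabcy", "zabcw", "qabcr"], 2, 3)` returns `{'abc': 3}`: `ab` and `bc` are rejected because they always extend to `abc`. The naive suffix sort keeps the sketch short; a linear-time suffix array construction would be needed to match the efficiency claims in the abstract.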

References (16)

  • [1] Debraj GuhaThakurta. Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Research, 2006.
  • [2] Munina Yusufu, Gulina Yusufu. Efficient Algorithm for Extracting Complete Repeats from Biological Sequences. International Journal of Computer Applications, 2015.
  • [3] P. Antoniou, M. Crochemore, C. Iliopoulos, P. Peterlongo. Application of suffix trees for the acquisition of common motifs with gaps in a set of strings. Proceedings of the International Conference on Language and Automata Theory and Applications, 2007.
  • [4] Pavlos Antoniou, Jan Holub, Costas S. Iliopoulos, Bořivoj Melichar, Pierre Peterlongo. Finding Common Motifs with Gaps Using Finite Automata. Lecture Notes in Computer Science, 2006.
  • [5] Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, Kunsoo Park. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. CPM 2001, 2001.
  • [6] Patrik D'Haeseleer. What are DNA sequence motifs? Nature Biotechnology, 2006.
  • [7] Manolis Kellis, Nick Patterson, Matthew Endrizzi, Bruce Birren, Eric S. Lander. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature, 2003.
  • [8] Modan K. Das, Ho-Kwok Dai. A survey of DNA motif finding algorithms. BMC Bioinformatics, 2007.
  • [9] Timothy L. Bailey, Charles Elkan. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning, 1995 (1).
  • [10] Munina Yusufu, Gulina Yusufu, Zhang Haijun. An algorithm for computing all NE repeats in a sequence based on the QSA array. Computer Science, 2014, 41(3): 249-252.
