Efficient Common Motif Finding Algorithm

Abstract: Motif finding plays an important role in revealing the regulation of gene expression at the genomic level and in locating conserved domains in protein sequences. This paper presents an algorithm for finding common motifs in biological sequences. The algorithm uses repeat-detection algorithms based on the suffix array or the quasi suffix array (QSA) to mine the maximal repeats of a string as primitives. After these primitives are filtered and pruned, the optimized primitives are computed and processed according to the constraints to obtain the common motifs. Compared with similar algorithms based on suffix trees or tries, the algorithm is more efficient in both time and space.
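The pipeline the abstract describes (mine maximal repeats with a suffix-array-based repeat finder, prune them, then apply constraints) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It builds a plain generalized suffix array by naive sorting instead of a linear-time construction or a QSA, computes LCP values with Kasai et al.'s method, enumerates LCP intervals (the right-maximal repeats), prunes those that are not left-maximal, and then assumes a simplified constraint set: a motif must have length at least `min_len` and occur exactly in at least `quorum` sequences. The function names `lcp_array`, `lcp_intervals`, and `common_motifs` and both parameters are hypothetical; gapped motifs and any other constraints used in the paper are not modelled.

```python
# Hypothetical sketch: common motifs as left- and right-maximal repeats
# mined from LCP intervals of a (naively built) generalized suffix array.


def lcp_array(text, sa):
    """Kasai et al.'s linear-time LCP: lcp[r] = LCP(suffix sa[r-1], suffix sa[r]); lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # LCP drops by at most one when moving from suffix i to i + 1
    return lcp


def lcp_intervals(lcp):
    """Yield (ell, lb, rb) for every lcp-interval with ell > 0 (stack-based bottom-up traversal)."""
    n = len(lcp)
    stack = [(0, 0)]  # entries: (lcp value, left boundary)
    for i in range(1, n + 1):
        cur = lcp[i] if i < n else 0  # sentinel 0 flushes all open intervals with ell > 0
        lb = i - 1
        while cur < stack[-1][0]:
            ell, lb = stack.pop()
            yield ell, lb, i - 1
        if cur > stack[-1][0]:
            stack.append((cur, lb))


def common_motifs(seqs, min_len=3, quorum=2):
    """Hypothetical helper: map each maximal repeat of length >= min_len that occurs
    in at least `quorum` sequences to the sorted list of sequence indices containing it."""
    k = len(seqs)
    text, owner = [], []
    for sid, s in enumerate(seqs):
        for ch in s:
            text.append(ord(ch) + k)  # shift characters above the separator codes
            owner.append(sid)
        text.append(sid)  # unique separator: no common prefix can contain it
        owner.append(sid)
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive O(n^2 log n) suffix sort
    lcp = lcp_array(text, sa)
    motifs = {}
    for ell, lb, rb in lcp_intervals(lcp):
        if ell < min_len:  # length constraint
            continue
        starts = sa[lb:rb + 1]
        # Prune repeats that are not left-maximal: all occurrences share one left neighbour.
        left = {text[p - 1] if p > 0 else None for p in starts}
        if len(left) < 2:
            continue
        support = {owner[p] for p in starts}
        if len(support) >= quorum:  # quorum constraint
            first = starts[0]
            motifs["".join(chr(c - k) for c in text[first:first + ell])] = sorted(support)
    return motifs


print(common_motifs(["ACGTACGT", "TTACGTT", "GACGTA"], min_len=3, quorum=3))
```

Running it prints `{'ACGT': [0, 1, 2]}`. `ACGT` occurs in all three sequences; `CGT` is also shared, but it is pruned because every occurrence is preceded by `A`, so it is subsumed by `ACGT`. Separators are encoded as the integers `0..k-1`, so they sort before every character and, being unique, never extend a common prefix across sequence boundaries.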

Authors: Munina Yusufu, Gulina Yusufu

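Affiliations: School of Computer Science and Technology, Xinjiang Normal University; School of Educational Science, Xinjiang Normal University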

Source: Computer Knowledge and Technology, 2016, No. 4X, pp. 164-168 (5 pages)

Funding: Natural Science Foundation of the Xinjiang Uygur Autonomous Region (No. 2012211A056)

Keywords: motif finding; repeats; constraints; bioinformatics; suffix array

CLC classification: Q811.4 [Biology: Bioengineering]; TP301.6 [Automation and Computer Technology: Computer System Architecture]