
Finding Similar Repeated Fragments in DNA Sequences Based on a PFD Filter (cited by: 1)

Partition Frequency Distance Based Filter Method for Finding Approximate Repetitions in DNA Sequences
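Abstract: Finding repeated fragments in DNA sequences is an important problem in gene sequence analysis. Because the pattern lengths of repeated fragments span a wide range, edit distance alone is a poor measure of sequence similarity. This paper proposes a new criterion for the similarity of repeated fragments that expresses the relationship between the distance between sequences and the portions the sequences share. To limit computational cost, a new distance function based on frequency vectors, PFD (partition frequency distance), is proposed together with a corresponding filter function; the filter generates a candidate set of repeated fragments and improves the efficiency of the search algorithm. Sequences are partitioned using a successor array instead of a sliding window, which removes the restriction that repeats can be sought only among fragments of equal length. Experimental results show that, compared with TRF (tandem repeat finder), the algorithm based on the PFD filter function finds more repeated fragments that satisfy the similarity requirement.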
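The abstract does not give the definition of PFD, so the following is only a minimal sketch of the general idea it builds on: filtering candidate pairs with a cheap distance over frequency vectors before running the expensive edit-distance check. It uses the plain frequency distance (the larger of the total surplus and the total deficit between two count vectors), which never exceeds the edit distance because a single insertion, deletion, or substitution changes each of those totals by at most one. The function names, the `ACGT` alphabet, and the threshold `max_dist` are illustrative assumptions, not the paper's notation, and the partitioning step of PFD is not reproduced here.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet; other symbols (e.g. N) are ignored


def frequency_vector(s):
    """Count how often each nucleotide occurs in s."""
    counts = Counter(s)
    return [counts.get(ch, 0) for ch in ALPHABET]


def frequency_distance(u, v):
    """Plain frequency distance: a lower bound on the edit distance of u and v."""
    fu, fv = frequency_vector(u), frequency_vector(v)
    surplus = sum(a - b for a, b in zip(fu, fv) if a > b)
    deficit = sum(b - a for a, b in zip(fu, fv) if b > a)
    return max(surplus, deficit)


def edit_distance(u, v):
    """Standard dynamic-programming edit distance, O(len(u) * len(v))."""
    prev = list(range(len(v) + 1))
    for i, cu in enumerate(u, 1):
        curr = [i]
        for j, cv in enumerate(v, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cu != cv)))  # substitution or match
        prev = curr
    return prev[-1]


def similar_pairs(fragments, max_dist):
    """Return index pairs whose edit distance is at most max_dist.

    The frequency distance acts as the filter: a pair is discarded as soon
    as the lower bound already exceeds max_dist, so the quadratic
    edit-distance computation only runs on the surviving candidates.
    """
    results = []
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            u, v = fragments[i], fragments[j]
            if frequency_distance(u, v) > max_dist:
                continue  # filtered out without computing edit distance
            if edit_distance(u, v) <= max_dist:
                results.append((i, j))
    return results
```

Because the filter is a lower bound, it can only discard pairs that would fail the edit-distance check anyway, so no true match is lost. Note that the fragments passed to `similar_pairs` may differ in length, which matches the abstract's aim of not restricting the search to equal-length fragments.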
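Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, Issue z3, pp. 521-528 (8 pages)
Funding: National Natural Science Foundation of China (60273079, 60573089)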
