A Novel Fixed-Position Projection Refinement Algorithm for TFBS Identification
Cited by: 2
Abstract: Locating transcription factor binding sites (TFBS), also known as the motif discovery problem, is crucial for understanding gene regulatory relationships. This paper proposes a novel fixed-position projection refinement algorithm (FPPR) for identifying TFBS in DNA sequences. Through a projection process based on the frequency matrix of corresponding positions in the dataset, FPPR clusters the DNA data into different subsets, selects the subsets with sufficient information content and complexity, and uses them as initial states for refinement by the expectation-maximization (EM) algorithm. By setting the threshold used in the fixed-position projection, FPPR handles the different distributions of motif instances under the three models OOPS, ZOOPS, and TCM. The objective function is designed with a higher-order Markov background, so that the algorithm's probabilistic model fits real biological data more closely. In addition, FPPR can be extended to multiple-motif discovery by using the similarity function WIC. Tests on real datasets show that FPPR finds motifs accurately within a reasonable time, performs better than MEME, GAME, Motif Sampler, and GALP-F, and solves the multiple-motif discovery problem effectively.
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, No. 12, pp. 2545-2559 (15 pages)
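Funding: Supported by the National Natural Science Foundation of China (61173025, 61373044), the Specialized Research Fund for the Doctoral Program of Higher Education (20100203110010), and the Fundamental Research Funds for the Central Universities (K5051303032, K50513100011)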
Keywords: transcription factor binding sites; motif; fixed-position projection; refinement
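
The abstract says that candidate subsets are kept only if they carry enough information, but it does not give the scoring formula. As a rough illustration only, the sketch below uses a common choice from the motif discovery literature: the relative entropy (in bits) of a position frequency matrix against a background distribution. The uniform zeroth-order background, the pseudocount, and the names `frequency_matrix` and `information_content` are assumptions made for this example and are not taken from the paper, which uses a higher-order Markov background.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def frequency_matrix(instances, pseudocount=1.0):
    """Column-wise nucleotide frequencies for equal-length aligned strings.

    Hypothetical helper for illustration; not the paper's projection step.
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    total = len(instances) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in instances)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix

def information_content(matrix, background=None):
    """Sum over columns of the relative entropy (bits) against the background.

    Assumes a uniform zeroth-order background unless one is supplied.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    return sum(
        p * math.log2(p / background[b])
        for column in matrix
        for b, p in column.items()
        if p > 0  # a zero-probability term contributes nothing
    )

# A well-conserved subset should score higher than an unrelated one.
conserved = ["TATAAT", "TATAAT", "TATACT", "TATAAT"]
mixed = ["TATAAT", "GCCGTA", "ACGTCG", "CGATGC"]
ic_conserved = information_content(frequency_matrix(conserved))
ic_mixed = information_content(frequency_matrix(mixed))
```

A filter in this spirit would keep a subset only when its score exceeds a chosen threshold. How FPPR sets that threshold, and how it measures complexity, is described in the full paper and not reproduced here.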