
Computational Location of Transcription Start Sites in Prokaryotic Genomes Based on a Sliding Window

Cited by: 3
Abstract: Computational location of transcription start sites (TSSs) is an important part of research on gene transcription regulation, but existing methods show poor recognition performance. Building on an existing prokaryotic promoter recognition algorithm, the authors propose a sliding-window method for computationally locating prokaryotic TSSs, which predicts TSS positions by sliding a window along the sequence within a reasonably restricted search range. First, two quadratic discriminant classifiers are built, one from the overlap content features of the window sequence and one from other promoter features, and are used to compute a likelihood score for each candidate position. The empirical distribution of distances between TSSs and translation start sites (TLSs) is then used to correct the likelihood scores. Finally, a threshold location algorithm determines the predicted positions from the likelihood score profile. Tests on real E. coli sequence data show that the method predicts the positions of true TSSs effectively. Compared with existing algorithms, at a sensitivity (Sn) of about 0.85 the specificity (Sp) rises from 0.20 to 0.65, which improves location accuracy by about 20 percentage points.
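The scan-score-correct-threshold pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the classifiers, features, window size, search range, or threshold, so everything here is a hypothetical stand-in: `toy_window_score` replaces the trained quadratic discriminant classifiers with a simple A/T dinucleotide fraction, `toy_distance_prior` replaces the empirical TSS-to-TLS distance distribution with an assumed weight peaking at 40 nt, and `locate_tss` reduces the threshold location step to picking the best-scoring position above a cutoff.

```python
from typing import Callable, Dict, Optional, Tuple


def overlapping_kmer_counts(window: str, k: int = 2) -> Dict[str, int]:
    """Count overlapping k-mers in a window (a simple stand-in for
    the 'overlap content features' named in the abstract)."""
    counts: Dict[str, int] = {}
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def toy_window_score(window: str) -> float:
    """ASSUMPTION: placeholder for a trained classifier. Returns the
    fraction of overlapping dinucleotides made only of A and T."""
    counts = overlapping_kmer_counts(window, 2)
    total = sum(counts.values())
    at_only = sum(n for kmer, n in counts.items() if set(kmer) <= {"A", "T"})
    return at_only / total if total else 0.0


def toy_distance_prior(distance: int) -> float:
    """ASSUMPTION: hypothetical weight on the TSS-to-TLS distance,
    peaking at 40 nt. Not the empirical distribution from the paper."""
    return 1.0 / (1.0 + abs(distance - 40) / 40.0)


def locate_tss(
    seq: str,
    tls_pos: int,
    window: int = 10,
    max_upstream: int = 60,
    threshold: float = 0.3,
    score_fn: Callable[[str], float] = toy_window_score,
    prior_fn: Callable[[int], float] = toy_distance_prior,
) -> Optional[Tuple[int, float]]:
    """Slide a window over candidate positions upstream of tls_pos.

    Each candidate position is scored on the `window` bases just
    upstream of it, the score is corrected by the distance prior, and
    the best corrected score at or above `threshold` is returned as
    (position, score). Returns None if no position passes.
    """
    best: Optional[Tuple[int, float]] = None
    start = max(window, tls_pos - max_upstream)  # restrict the search range
    for pos in range(start, tls_pos):
        raw = score_fn(seq[pos - window:pos])
        corrected = raw * prior_fn(tls_pos - pos)
        if corrected >= threshold and (best is None or corrected > best[1]):
            best = (pos, corrected)
    return best


if __name__ == "__main__":
    # Synthetic sequence: an AT-rich block 40 nt upstream of a mock TLS.
    demo = "G" * 30 + "TATAATATAT" + "G" * 40
    print(locate_tss(demo, tls_pos=len(demo)))
```

In a fuller version, `score_fn` would wrap the two trained classifiers and `prior_fn` would be estimated from annotated TSS and TLS positions. Both are passed as parameters so that they can be swapped without touching the scan loop.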
Source: Acta Biophysica Sinica (《生物物理学报》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2006, No. 5, pp. 360-366 (7 pages)
Funding: National Natural Science Foundation of China (Grant No. 60471003)
Keywords: Prokaryotic genome; Transcription start site (TSS); Computational location; Sliding window; Overlap content features

References (14)

  • 1. Harley C, Reynolds R. Analysis of E. coli promoter sequences. Nucleic Acids Research, 1987, 15(5):2343~2361
  • 2. Lisser S, Margalit H. Compilation of E. coli mRNA promoter sequences. Nucleic Acids Research, 1993, 21(7):1507~1516
  • 3. Werner T. Models for prediction and recognition of eukaryotic promoters. Mammalian Genome, 1999, 10(2):168~175
  • 4. Ohler U, Niemann H. Identification and analysis of eukaryotic promoters: recent computational approaches. TRENDS in Genetics, 2001, 17(2):56~60
  • 5. Hertz G, Stormo G. Escherichia coli promoter sequences. Analysis and prediction. Meth Enzymol, 1996, 273:31~42
  • 6. Vanet A, Marsan L, Sagot M. Promoter sequences and algorithmical methods for identifying them. Res Microbiol, 1999, 150(9-10):779~799
  • 7. Gordon L, Chervonenkis A, Gammerman A, Shahmuradov I, Solovyev V. Sequence alignment kernel for recognition of promoter regions. Bioinformatics, 2003, 19(15):1964~1971
  • 8. Huerta A, Collado-Vides J. Sigma70 promoters in Escherichia coli: specific transcription in dense regions of overlapping promoter-like signals. J Mol Biol, 2003, 333(2):261~278
  • 9. Du YH, Wang ZZ, Ni QS, Li DD. A discriminant analysis method for prokaryotic promoters based on feature selection. Acta Biophysica Sinica, 2006, 22(1):39~48. Cited by: 6
  • 10. Burden S, Lin YX, Zhang R. Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics, 2005, 21(5):601~607

