
Prediction of Polyadenylation in Human Gene Sequences
Abstract: Polyadenylation at the 3′ end of mRNA is one of the three main steps of post-transcriptional pre-mRNA processing in eukaryotic cells. Identifying the positions in a DNA sequence where polyadenylation occurs, i.e. the PolyA sites, is important for understanding how mRNA is formed and for predicting gene structure. This paper presents a machine learning method for predicting polyadenylation signals (PASes) in human DNA and mRNA sequences. The method consists of three feature-handling steps: generation, selection, and integration. First, initial features are generated by counting k-gram (k-th order) nucleotide frequencies. Second, important features are selected with an entropy-based algorithm. Third, support vector machines (SVMs) integrate the selected features: the parameters are determined and a prediction model is built to distinguish true PASes from a large number of candidates. On an independent test set, with sensitivity (Sn) fixed at 60%, the specificity (Sp) is 71.67% at the intron level and 80.77% at the exon level; the intron-level accuracy is clearly better than that of comparable software.
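The first two steps described in the abstract (k-gram frequency features, then entropy-based selection) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the entropy criterion, so information gain over a simple present/absent split is an assumption, and all function names are hypothetical. The SVM step is omitted because it would need a third-party library.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_frequencies(seq, k):
    """Return the relative frequency of every k-mer over ACGT in seq.

    Windows containing other symbols (e.g. N) get no feature of their own
    but still count toward the total number of windows.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return {kmer: counts[kmer] / total for kmer in kmers}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold=0.0):
    """Entropy reduction from splitting samples on value > threshold.

    Assumption: a single fixed threshold (k-mer present vs. absent).
    """
    above = [y for v, y in zip(values, labels) if v > threshold]
    below = [y for v, y in zip(values, labels) if v <= threshold]
    n = len(labels)
    remainder = (len(above) / n) * entropy(above) + (len(below) / n) * entropy(below)
    return entropy(labels) - remainder


def select_features(seqs, labels, k, top_n):
    """Rank k-mer features by information gain and keep the top_n."""
    rows = [kmer_frequencies(s, k) for s in seqs]
    gains = {
        kmer: information_gain([row[kmer] for row in rows], labels)
        for kmer in rows[0]
    }
    return sorted(gains, key=gains.get, reverse=True)[:top_n]
```

In use, `select_features(seqs, labels, k=2, top_n=10)` would take candidate sequence windows with labels 1 (true PAS) and 0 (false candidate) and return the ten 2-mers whose presence best separates the two classes; the corresponding frequency vectors would then be the input to an SVM classifier.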
Source: Chinese Journal of Computers (《计算机学报》), 2008, No. 6, pp. 927-933 (7 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
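Funding: Supported by the Major Research Plan of the National Natural Science Foundation of China (90608020), the Specialized Research Fund for the Doctoral Program of Higher Education (20050487037), the Ministry of Education "New Century Excellent Talents" program, and the Ministry of Science and Technology "National Science and Technology Infrastructure Platform Construction" special project.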
Keywords: polyadenylation (PolyA) signals; machine learning; entropy; support vector machines
