Support Vector Machine and Its Application in Computational Recognition of Prokaryotes Gene
Abstract A gene recognition model for prokaryotes is built, using a support vector machine (SVM) as the classifier and the k-letter words of a sequence as features. The training set consists of positive samples, chosen from genes of known function, and negative samples, each a randomly mutated sequence of the same length as the corresponding positive sample. Five-fold cross-validation shows that prediction accuracy varies with the kernel function of the SVM and with the word length: the best accuracy exceeds 94%, while the worst falls below 60%. Words of length 3 give the best classification results, followed by words of length 4. This suggests that trinucleotides are a relatively good statistical feature of gene sequences.
Author Huang Guohua (黄国华)
Source Journal of Hunan First Normal University (《湖南第一师范学院学报》), 2011, No. 2, pp. 133-136 (4 pages)
Funding Scientific Research Project of the Hunan Provincial Department of Education (09C888)
Keywords support vector machine; gene recognition; kernel function; k-letter word
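
The feature construction and negative-sample generation described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the abstract does not say whether words are counted in overlapping windows, whether counts are normalized, or how the random mutation is performed. The function names `kmer_features` and `random_mutant` and the `rate` parameter are hypothetical.

```python
import random
from itertools import product

ALPHABET = "ACGT"  # assumption: plain DNA alphabet


def kmer_features(seq, k=3):
    """Return normalized frequencies of all 4**k k-letter words.

    Assumption: words are counted in overlapping windows and the counts
    are divided by the number of valid windows.
    """
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]


def random_mutant(seq, rate=0.5, rng=random):
    """Return an equal-length negative sample derived from `seq`.

    Assumption: each base is independently replaced by a different base
    with probability `rate`; the source does not specify the scheme.
    """
    mutated = []
    for base in seq:
        if rng.random() < rate:
            mutated.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


gene = "ATGGCGTAA"  # toy sequence, not data from the paper
x_pos = kmer_features(gene, k=3)
x_neg = kmer_features(random_mutant(gene, rng=random.Random(0)), k=3)
print(len(x_pos), len(x_neg))  # 4**3 = 64 features per sample
```

The resulting vectors, labeled positive and negative, could then be passed to any SVM implementation and evaluated with five-fold cross-validation for different kernels and values of k.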