
Short coding sequence identification of human genes based on YKW graphical representation
Abstract: To identify short coding sequences in human genes, a new graphical representation of gene sequences, the YKW graph, is proposed, based on the bias of bases across the three codon positions and on a classification of bases by their physicochemical properties. Nine effective area-matrix features are extracted from the YKW curves. During identification, an incremental feature selection algorithm adds four statistical features to improve the recognition rate, and Principal Component Analysis (PCA) reduces the dimensionality of the resulting 13 features. Finally, a Support Vector Machine (SVM) classifies short human sequences as coding or non-coding. Experimental results show that, compared with other methods, the proposed method achieves better recognition results with fewer features (seven or four).
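The abstract does not give the construction of the YKW curves or the area-matrix features, so the following is only a minimal sketch of the underlying idea: grouping bases by chemical class and tallying them separately for each codon position. It assumes that Y, K and W carry their standard IUPAC meanings (pyrimidine C/T, keto G/T, weak A/T); the function name `codon_position_class_frequencies` and the 3x3 frequency table are hypothetical illustrations, not the paper's features.

```python
# IUPAC nucleotide classes (standard definitions, assumed to be what
# "YKW" refers to): Y = pyrimidine (C, T), K = keto (G, T),
# W = weak hydrogen bonding (A, T).
CLASSES = {"Y": set("CT"), "K": set("GT"), "W": set("AT")}
CLASS_ORDER = ("Y", "K", "W")


def codon_position_class_frequencies(seq):
    """Return a 3x3 table of class frequencies per codon position.

    Rows are codon positions 1 to 3; columns are the Y, K and W classes.
    Each entry is the fraction of bases at that codon position that
    belong to the class. This is a hypothetical illustration, not the
    area-matrix features described in the abstract.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    table = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at pos
        row = []
        for name in CLASS_ORDER:
            count = sum(1 for base in bases if base in CLASSES[name])
            row.append(count / len(bases) if bases else 0.0)
        table.append(row)
    return table


if __name__ == "__main__":
    for position, row in enumerate(codon_position_class_frequencies("ATGGCCTAA"), 1):
        print(position, [round(value, 3) for value in row])
```

A table like this could serve as input to a later feature-selection, PCA and SVM stage, but how the paper actually derives its nine features from the YKW curves is not stated here.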
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journals, 2011, No. 8, pp. 2087-2091 (5 pages)
Funding: National Natural Science Foundation of China (60873184); Natural Science Foundation of Hunan Province (07JJ5086)
Keywords: graphical representation; short coding sequence identification; area matrix; gene sequence

