
A Study of Dataset Imbalance in Protein Contact Map Prediction Based on Laser Analysis Technology

Study of Protein Contacts Map Prediction on Imbalanced Data
Abstract: With the development of automated protein sequencing that incorporates new technologies such as laser analysis, protein sequences have become increasingly easy to obtain, and predicting protein structure from sequence has become an important research problem. Protein contact map prediction is an intermediate step in tertiary structure prediction and a typical classification problem with an extremely imbalanced dataset: non-contact residue pairs far outnumber residue pairs in contact. Unlike problems such as text classification, protein contact map prediction has a low feature dimensionality, so the dataset cannot be optimized through feature selection. To effectively reduce the size of the majority class, a data preprocessing method that combines clustering with under-sampling is proposed, bringing the distribution of the contact and non-contact classes closer to balance. Experiments show that a support vector machine classifies the data effectively on the resulting protein dataset.
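Authors: 刘君 (Liu Jun), 宋志坚 (Song Zhijian)
Source: Laser Journal (《激光杂志》), PKU Core Journal, 2015, No. 6, pp. 114-117 (4 pages)
Funding: Natural Science Foundation Program of the Chongqing Science and Technology Commission (cstc2011jj A10054)
Keywords: laser; protein contact map prediction; imbalanced dataset; under-sampling; clustering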
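
The abstract does not say how clustering and under-sampling are combined, so the following is only a minimal sketch of one common variant, not the authors' procedure: run k-means on the majority (non-contact) class and keep the real sample nearest to each centroid. The function names, the two-dimensional features, and the synthetic data are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the final list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def cluster_undersample(majority, n_keep, n_iter=20, seed=0):
    """Reduce the majority class to at most n_keep representative samples.

    Each representative is the real majority sample closest to one k-means
    centroid. Two centroids can share a nearest sample, so fewer than n_keep
    samples may be returned.
    """
    if n_keep <= 0:
        raise ValueError("n_keep must be positive")
    if n_keep >= len(majority):
        return list(majority)
    centroids = kmeans(majority, n_keep, n_iter, seed)
    chosen = set()
    for c in centroids:
        idx = min(range(len(majority)),
                  key=lambda i: squared_distance(majority[i], c))
        chosen.add(idx)
    return [majority[i] for i in sorted(chosen)]


# Synthetic stand-in data (hypothetical): 200 non-contact pairs, 20 contact pairs.
rng = random.Random(1)
non_contact = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
contact = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]

reduced = cluster_undersample(non_contact, n_keep=len(contact))
print(len(non_contact), "->", len(reduced), "majority samples;",
      len(contact), "minority samples")
```

Under this sketch the reduced majority set and the untouched minority set would then be passed together to a classifier such as a support vector machine. Keeping real samples rather than the centroids themselves is a design choice made here so that no synthetic feature vectors enter the training set.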

