Study of Protein Contacts Map Prediction on Imbalanced Data

(Chinese title, translated: Research on Dataset Imbalance in Protein Contact Map Prediction Based on Laser Analysis Technology)


Abstract: With the development of automatic protein sequencing that integrates new technologies such as laser analysis, protein sequences are increasingly easy to obtain, and predicting protein structure from sequence has become an important research problem. Protein contact map prediction is an intermediate step in protein tertiary structure prediction and a typical classification problem on an extremely imbalanced dataset: residue pairs not in contact far outnumber residue pairs in contact. Unlike problems such as text classification, the feature dimensionality in protein contact map prediction is not high, so the dataset cannot be optimized through feature selection. To effectively reduce the size of the majority class, a data preprocessing method that combines under-sampling with clustering is proposed, so that the distributions of the contact and non-contact classes move toward balance. Experiments show that a support vector machine can classify the data effectively on the optimized protein dataset.
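The abstract does not say which clustering algorithm is used or how representatives are picked from each cluster, so the following is only a minimal sketch of one common form of clustering-based under-sampling, not the authors' implementation. It assumes k-means clustering of the majority (non-contact) class, with one real sample kept per cluster (the one nearest the centroid). The function names `kmeans` and `undersample_majority`, the synthetic 2-D data, and all parameter values are hypothetical.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k random samples
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dim)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def undersample_majority(majority, n_keep, seed=0):
    """Return indices of n_keep majority samples, one per k-means cluster.

    Assumption: the representative of a cluster is the not-yet-chosen
    real sample closest to that cluster's centroid.
    """
    k = min(n_keep, len(majority))
    centroids = kmeans(majority, k, seed=seed)
    chosen, used = [], set()
    for c in centroids:
        best = min((i for i in range(len(majority)) if i not in used),
                   key=lambda i: dist(majority[i], c))
        used.add(best)
        chosen.append(best)
    return chosen


# Synthetic stand-in data: 200 "non-contact" pairs vs. 10 "contact" pairs.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

kept_idx = undersample_majority(majority, n_keep=len(minority))
balanced_X = [majority[i] for i in kept_idx] + minority
balanced_y = [0] * len(kept_idx) + [1] * len(minority)  # 0 = non-contact, 1 = contact
```

The balanced `balanced_X` and `balanced_y` could then be passed to a support vector machine implementation of the reader's choice (for example scikit-learn's `SVC`). Keeping one sample per cluster is a design choice that preserves the spread of the majority class, which random under-sampling may lose.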

Authors: Liu Jun (刘君), Song Zhijian (宋志坚)

Affiliation: School of Information Science and Engineering, Chongqing Jiaotong University

Source: Laser Journal (《激光杂志》), Peking University Core Journal list, 2015, No. 6, pp. 114-117 (4 pages)

Funding: Natural Science Foundation Program of the Chongqing Science and Technology Commission (cstc2011jj A10054)

Keywords: laser; protein contact map prediction; imbalanced dataset; under-sampling; clustering

Classification codes: Q51 [Biology: Biochemistry]; TP391.4 [Automation and Computer Technology: Computer Application Technology]



