Abstract
Disulfide bonds are an important structural feature that helps keep protein structure and function stable. Previous methods for predicting disulfide connectivity patterns typically apply feature selection and feed the selected features into a machine learning model, without considering the correlation between different features. Building on these traditional methods, this paper first selects features by Fisher score, then computes the correlation coefficient of each pair of features in the selected subspace and removes features that are highly linearly correlated. Support vector regression is then trained and tested on the remaining features, using 4-fold cross-validation on the benchmark dataset, with the aim of obtaining better results than previous methods.
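The two feature-processing steps named in the abstract (Fisher-score ranking, then removal of highly correlated features) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the binary bonded/non-bonded labels, the toy data, and the values of `top_k` and `max_corr` are all assumptions made for the example, and the support vector regression and 4-fold cross-validation stages are omitted.

```python
from math import sqrt
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Fisher score of one feature: between-class scatter / within-class scatter."""
    overall = mean(column)
    between = within = 0.0
    for cls in set(labels):
        vals = [x for x, y in zip(column, labels) if y == cls]
        between += len(vals) * (mean(vals) - overall) ** 2
        within += len(vals) * pvariance(vals)
    return between / within if within > 0 else 0.0


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def select_features(X, labels, top_k, max_corr):
    """Keep the top_k features by Fisher score, then greedily drop any feature
    whose |correlation| with an already kept feature exceeds max_corr."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    ranked = sorted(range(n_features),
                    key=lambda j: fisher_score(columns[j], labels),
                    reverse=True)[:top_k]
    kept = []
    for j in ranked:
        if all(abs(pearson(columns[j], columns[k])) <= max_corr for k in kept):
            kept.append(j)
    return kept


# Toy data (hypothetical): feature 1 is exactly twice feature 0, so the two
# are linearly dependent; feature 2 is unrelated to both.
X = [
    [1.0, 2.0, 0.5],
    [1.2, 2.4, 0.1],
    [3.0, 6.0, 0.4],
    [3.2, 6.4, 0.2],
]
labels = [0, 0, 1, 1]  # assumed binary labels: 1 = bonded pair, 0 = not bonded

kept = select_features(X, labels, top_k=3, max_corr=0.9)
print(kept)
```

In this toy case only one of the two linearly dependent features survives, alongside the unrelated third feature. The surviving column indices would then be used to build the input for whatever regression model follows.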
Source
Computer & Digital Engineering (《计算机与数字工程》)
2017, No. 11, pp. 2093-2096, 2117 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 61373062, 61371040)
Keywords
bioinformatics, disulfide bond, support vector regression, correlation coefficient, feature selection