Abstract
Based on the characteristics of energy hot spots in protein-protein interactions, 18 new features were defined, including residue contact number and the proportion of the relative change in solvent accessible surface area. Recursive feature elimination based on a support vector machine (SVM) and on the F-score was used for feature selection, and two corresponding prediction models, SVM-RFE and F-Score-RFE, were proposed for predicting energy hot spots. Experimental results showed that, on an independent test set, the F1 of the F-Score-RFE model was 6.25% higher than that of the best-performing existing method, indicating that the newly defined features contribute substantially to identifying protein energy hot spots.
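The abstract does not spell out how the F-score-based elimination is carried out, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the commonly used two-class F-score (between-class separation of a feature's means divided by the sum of its within-class sample variances) and removes the lowest-scoring feature at each round. The function names, the feature names, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def f_score(column: Sequence[float], labels: Sequence[int]) -> float:
    """Two-class F-score of one feature (1 = hot spot, 0 = non-hot spot)."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples in each class")
    mean_all = sum(column) / len(column)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    # Numerator: how far each class mean lies from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sum of the per-class sample variances.
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1)
    within = var_pos + var_neg
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def f_score_rfe(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    names: Sequence[str],
    keep: int,
) -> Tuple[List[str], List[str]]:
    """Drop the lowest-scoring feature each round until `keep` remain."""
    remaining = list(range(len(names)))
    eliminated: List[str] = []
    while len(remaining) > keep:
        scores = {j: f_score([r[j] for r in rows], labels) for j in remaining}
        worst = min(remaining, key=lambda j: scores[j])
        remaining.remove(worst)
        eliminated.append(names[worst])
    return [names[j] for j in remaining], eliminated


# Hypothetical toy data: three features for six interface residues.
names = ["contact_number", "rel_asa_change", "noise"]
rows = [
    [8, 0.9, 0.5],
    [9, 0.8, 0.1],
    [7, 0.7, 0.4],
    [3, 0.6, 0.5],
    [2, 0.5, 0.2],
    [4, 0.4, 0.3],
]
labels = [1, 1, 1, 0, 0, 0]

selected, eliminated = f_score_rfe(rows, labels, names, keep=2)
print(selected, eliminated)
```

Because this F-score is computed per feature, the loop above removes features in the same order as a single sort by score would. A practical pipeline would likely retrain and evaluate a classifier on each candidate subset and keep the subset with the best validation F1, but that step is an assumption here, not something stated in the abstract.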
Source
Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》)
CAS
PKU Core Journals (北大核心)
2014, No. 2, pp. 12-20 (9 pages)
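Funding
National Natural Science Foundation of China (61173118)
Shuguang Program of the Shanghai Municipal Education Commission (09SG23)
Fundamental Research Funds for the Central Universities (0800219157)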
Keywords
protein-protein interaction
energy hot spots
feature selection
recursive elimination
prediction