Abstract
A new method for constructing training sets for protein subcellular localization prediction is proposed. To address the shortage of experimentally annotated data that limits traditional prediction methods, the method uses an active learning strategy to select informative samples from non-experimentally annotated protein data and trains the prediction model on these samples together with the original experimentally annotated data, with the aim of improving the prediction accuracy of the baseline classifier. Combined with a support vector machine classifier, the method was evaluated on an independent test set of viral proteins. The results indicate that it effectively improves the predictive performance of the baseline classifier and outperforms existing prediction systems for viral proteins.
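The selection loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the paper's selection criterion, so margin-based uncertainty sampling (a common active-learning heuristic) is used; a nearest-centroid classifier stands in for the support vector machine so that the sketch needs only the Python standard library; and the function names, feature vectors, and labels are hypothetical.

```python
import math


def centroids(samples):
    """Compute the mean feature vector per class from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return (label, margin); margin is the distance gap between the two nearest centroids."""
    dists = sorted((math.dist(x, c), y) for y, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def build_training_set(labeled, pool, k=2, rounds=2):
    """Grow the training set with the k least-confident pool samples per round.

    labeled: experimentally annotated (features, label) pairs.
    pool: non-experimentally annotated (features, label) pairs (labels may be noisy).
    Assumption: a small margin marks a sample as informative.
    """
    train, remaining = list(labeled), list(pool)
    for _ in range(rounds):
        if not remaining:
            break
        model = centroids(train)
        # Rank remaining candidates from least to most confident.
        order = sorted(range(len(remaining)),
                       key=lambda i: predict(model, remaining[i][0])[1])
        picked = set(order[:k])
        train += [remaining[i] for i in sorted(picked)]
        remaining = [s for i, s in enumerate(remaining) if i not in picked]
    return train, centroids(train)


# Hypothetical two-dimensional features and location labels.
labeled = [([0.0, 0.0], "nucleus"), ([1.0, 1.0], "cytoplasm")]
pool = [([0.4, 0.5], "nucleus"), ([0.9, 0.9], "cytoplasm"),
        ([0.1, 0.0], "nucleus"), ([0.6, 0.5], "cytoplasm")]

train, model = build_training_set(labeled, pool, k=2, rounds=1)
print(len(train), predict(model, [0.05, 0.1])[0])
```

In this toy run the two pool samples closest to the decision boundary are added to the training set and the classifier is retrained on the enlarged set. A real implementation would replace the stand-in classifier with an SVM and use whatever selection rule the paper actually defines.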
Source
Journal of Dalian University of Technology (《大连理工大学学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2012, No. 6, pp. 884-889 (6 pages)
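Funding
National Natural Science Foundation of China (61174027)
National Science and Technology Major Project of China (2010ZX04007-011-5)
Natural Science Foundation of Liaoning Province (20102025)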
Keywords
bioinformatics
machine learning
protein subcellular localization
data with non-experimental annotations
active learning