Abstract
Classification by machine learning is an important approach to predicting gene function. However, many prediction data sets contain very few positive samples, which degrades the performance of functional prediction. To address this problem, we experimentally compared several common methods for classifying unbalanced data in combination with the support vector machine (SVM) algorithm, including voting ensemble classifiers and moving the classification boundary. Building on these comparisons, we proposed weighted, corrected voting strategies in place of simple majority voting to improve prediction performance. The experimental results show that the voting ensemble method, which combines under-sampling of the majority class with ensemble learning, outperforms the boundary-moving method, and that the weighted strategies built on the voting ensemble achieve better and more stable results than all other methods compared.
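The abstract gives no implementation details, so the following is only a rough, hypothetical pure-Python sketch of the three ideas being compared: an under-sampling voting ensemble, a weighted vote over the same ensemble, and a single SVM whose boundary is shifted toward the majority class. Several assumptions apply. A linear SVM trained by Pegasos-style subgradient descent stands in for whatever SVM solver the study used. Each member is weighted by its G-mean (the geometric mean of sensitivity and specificity), which is an illustrative choice because the abstract does not say how the corrected voting weights are derived. All function names and the toy data are made up for this example.

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear soft-margin SVM via Pegasos-style subgradient descent (a stand-in
    solver). Labels are +1/-1; a constant 1.0 feature acts as the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - 1.0 / t) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward label * x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Score f(x) = w . [x, 1]; positive means 'predicted positive'."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def undersample_ensemble(pos, neg, seed=0):
    """Split the majority (negative) class into chunks about the size of the
    minority class; train one SVM on all positives plus each chunk."""
    rng = random.Random(seed)
    neg = neg[:]
    rng.shuffle(neg)
    k = max(1, len(neg) // len(pos))
    size = len(neg) // k
    members = []
    for i in range(k):
        chunk = neg[i * size:(i + 1) * size] if i < k - 1 else neg[i * size:]
        X = pos + chunk
        y = [1] * len(pos) + [-1] * len(chunk)
        members.append(train_linear_svm(X, y, seed=seed + i))
    return members


def g_mean(w, pos, neg):
    """Hypothetical member weight: sqrt(sensitivity * specificity)."""
    sens = sum(decision(w, x) > 0 for x in pos) / len(pos)
    spec = sum(decision(w, x) <= 0 for x in neg) / len(neg)
    return math.sqrt(sens * spec)


def majority_vote(members, x):
    """Simple (unweighted) majority vote; ties go to the negative class."""
    votes = sum(1 if decision(w, x) > 0 else -1 for w in members)
    return 1 if votes > 0 else -1


def weighted_vote(members, weights, x):
    """Each member's +1/-1 vote is scaled by its weight before summing."""
    score = sum(wt * (1 if decision(w, x) > 0 else -1)
                for w, wt in zip(members, weights))
    return 1 if score > 0 else -1


def predict_moved(w, x, shift):
    """Boundary-moving baseline: shift > 0 pushes the hyperplane toward the
    majority class so that more samples are labeled positive."""
    return 1 if decision(w, x) + shift > 0 else -1


# Made-up toy data: 6 positives near (2, 2), 30 negatives near (-2, -2).
pos = [(2 + dx, 2 + dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.5)]
neg = [(-2 + dx, -2 + dy) for dx in (-0.5, -0.25, 0.0, 0.25, 0.5)
       for dy in (-0.5, -0.3, -0.1, 0.1, 0.3, 0.5)]
members = undersample_ensemble(pos, neg)
weights = [g_mean(w, pos, neg) for w in members]
print(majority_vote(members, (5, 5)), weighted_vote(members, weights, (5, 5)))
```

On this cleanly separable toy data every member classifies all training points correctly, so every G-mean weight is 1 and the weighted vote coincides with the majority vote; the weights only change the outcome when members differ in quality, which is the situation the weighted strategy is meant to address.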
Source
Chinese Journal of Biomedical Engineering (中国生物医学工程学报)
2006, No. 2, pp. 158-162, 177 (6 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
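Funding
National Natural Science Foundation of China (39970397, 30170515, 30370388)
National High Technology Research and Development Program of China ("863" Program) (2002AA2Z2052, 2003AA2Z2051)
Key Science and Technology Research Project of Heilongjiang Province (GB03C6024)
Natural Science Foundation of Heilongjiang Province (F0177)
Harbin Science and Technology Research Project (2003AA3CS113)
Harbin Medical University "211 Project" Construction Program, Tenth Five-Year Plan period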
Keywords
support vector machine
functional prediction
gene expression profile
unbalanced