Dealing with Unbalanced Data in Prediction of Gene Functions (cited by: 5)
Abstract: Classification by machine learning is an important technique for predicting gene functions. However, many prediction data sets contain too few positive samples, which degrades prediction performance. To address this problem, this study experimentally compares several common approaches to classifying unbalanced data in combination with the support vector machine (SVM) algorithm, including voting ensembles of classifiers and moving the decision boundary. On this basis, a weighted, corrected voting strategy is proposed as an alternative to simple majority voting. The experimental results show that the voting ensemble, which combines under-sampling of the majority class with ensemble learning, outperforms the moving-boundary method, and that the weighted ensemble built on the voting ensemble achieves better and more stable results than all the other methods compared.
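The three strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid scorer stands in for the SVM so that the example needs only the standard library, and the balanced-accuracy weights in `member_weights` are an assumed weighting rule, since the abstract does not say how the weights are computed. All function names are hypothetical.

```python
import random

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train_base(pos, neg):
    """Stand-in base learner (nearest centroid); the paper trains an SVM here."""
    cp, cn = centroid(pos), centroid(neg)
    def score(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dn - dp  # > 0 means x lies closer to the positive centroid
    return score

def build_ensemble(pos, neg, n_members=5, seed=0):
    """Train each member on all positives plus an equal-sized random
    subset of the majority (negative) class, i.e. under-sampling."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))
    return [train_base(pos, rng.sample(neg, k)) for _ in range(n_members)]

def majority_vote(members, x):
    """Simple voting: positive if more than half of the members say so."""
    votes = sum(1 for m in members if m(x) > 0)
    return 1 if 2 * votes > len(members) else 0

def member_weights(members, pos, neg):
    """Assumed weighting rule: each member's balanced accuracy on the data."""
    weights = []
    for m in members:
        tpr = sum(1 for x in pos if m(x) > 0) / len(pos)
        tnr = sum(1 for x in neg if m(x) <= 0) / len(neg)
        weights.append((tpr + tnr) / 2)
    return weights

def weighted_vote(members, weights, x):
    """Weighted voting: each member casts +w or -w instead of one vote."""
    total = sum(w * (1 if m(x) > 0 else -1) for m, w in zip(members, weights))
    return 1 if total > 0 else 0

def moved_boundary(score, x, offset):
    """Moving the decision boundary: a single classifier whose threshold is
    shifted by `offset` toward the majority class to favor positives."""
    return 1 if score(x) > -offset else 0
```

With an SVM library available, `train_base` would be replaced by an SVM fit on the same under-sampled subset; the sampling and voting logic would stay the same.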
Source: Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2006, No. 2, pp. 158-162, 177 (6 pages)
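Funding: National Natural Science Foundation of China (39970397, 30170515, 30370388); National "863" Program (2002AA2Z2052, 2003AA2Z2051); Heilongjiang Key Science and Technology Project (GB03C6024); Heilongjiang Natural Science Foundation (F0177); Harbin Science and Technology Project (2003AA3CS113); Harbin Medical University Project 211 "Tenth Five-Year Plan" construction program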
Keywords: support vector machine; functional prediction; gene expression profile; unbalanced
