
Application of a Naive Bayesian Classifier Based on the Maximum Relevance Minimum Redundancy Method  Cited by: 1
Abstract  Objective: To apply a naive Bayesian classifier (NBC) with maximum relevance minimum redundancy (MRMR) feature selection to gene expression data, and to compare it with the classical NBC and random forests (RF). Methods: The three methods were implemented in Matlab and R and compared on colon cancer and lung cancer gene expression datasets; 10-fold cross-validation was used to estimate the classification accuracy of the classical NBC and RF. Results: On the colon cancer dataset, MRMR-NBC reached a classification accuracy of 93.55% with m = 11 features under the mutual information quotient (MIQ) criterion, and 95.16% with m = 15 under the mutual information difference (MID) criterion. On the lung cancer dataset, MRMR-NBC reached its highest accuracy of 98.63% with m = 14 under MIQ, and 97.26% with m = 12 under MID. The classical NBC achieved accuracies of 66.67% and 80.00% on the colon and lung cancer datasets, respectively, and RF achieved 81.89% and 77.62%. Conclusion: MRMR-NBC attains high classification accuracy with only a very small number of features and outperforms the classical NBC and RF.
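The MID and MIQ criteria named in the abstract can be illustrated with a minimal sketch of greedy MRMR selection. This is not the authors' Matlab/R code: it assumes features that are already discretised, uses plug-in mutual information estimates, and the function names `mutual_information` and `mrmr_select`, the toy data, and the small epsilon in the MIQ denominator are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) of two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def mrmr_select(columns, labels, m, scheme="MID"):
    """Greedily pick up to m feature indices by an MRMR criterion.

    columns: list of feature columns, each a list of discrete values.
    scheme:  "MID" scores relevance minus mean redundancy,
             "MIQ" scores relevance divided by mean redundancy.
    """
    if scheme not in ("MID", "MIQ"):
        raise ValueError("scheme must be 'MID' or 'MIQ'")
    # Relevance: mutual information between each feature and the class label.
    relevance = [mutual_information(col, labels) for col in columns]
    # Start from the single most relevant feature (first index wins ties).
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    # Running sum of MI between each candidate and the features selected so far.
    redundancy_sum = [0.0] * len(columns)
    while len(selected) < min(m, len(columns)):
        last = selected[-1]
        best, best_score = None, float("-inf")
        for j in range(len(columns)):
            if j in selected:
                continue
            redundancy_sum[j] += mutual_information(columns[j], columns[last])
            redundancy = redundancy_sum[j] / len(selected)
            if scheme == "MID":
                score = relevance[j] - redundancy
            else:
                # Epsilon avoids division by zero; an assumption of this sketch.
                score = relevance[j] / (redundancy + 1e-12)
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data: the label is determined jointly by two independent binary features.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [2 * x + y for x, y in zip(a, b)]
features = [a, list(a), b]  # feature 1 is an exact duplicate of feature 0

print(mrmr_select(features, labels, m=2, scheme="MID"))
print(mrmr_select(features, labels, m=2, scheme="MIQ"))
```

On this toy data both schemes pick feature 0 first and then feature 2, skipping the duplicated feature 1 because its redundancy with feature 0 cancels its relevance. The selected columns would then be passed to a classifier such as naive Bayes; that step is omitted here.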
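Source  Chinese Journal of Health Statistics (《中国卫生统计》), CSCD, Peking University Core Journal, 2015, No. 6, pp. 932-934 (3 pages)
Funding  National Natural Science Foundation of China (81373103); Basic and Frontier Research Program of the Chongqing Science and Technology Commission (cstc2013jcyjA10009)
Keywords  Maximum relevance minimum redundancy; Naive Bayesian classifier; Random forests; Feature selection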

References (5)

  • 1 Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(8): 1226-1238.
  • 2 Wu Xiaoyan, Li Kang. A random forest method for discriminant analysis of gene expression data [J]. Chinese Journal of Health Statistics, 2006, 23(6): 491-494. Cited by: 21
  • 3 Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms. Proceedings of the 23rd International Conference on Machine Learning, 2006.
  • 4 Alon U, Barkai N, Notterman DA, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci U S A, 1999, 96(12): 6745-6750.
  • 5 Garber ME, Troyanskaya OG, Schluens K, et al. Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci U S A, 2001, 98(24): 13784-13789.

