Using AUC to Evaluate Predictive Performance of Classifiers (cited by: 2)


Abstract: Accuracy has long been the main criterion for evaluating the predictive performance of classifiers, but it has many shortcomings. This paper compares accuracy with AUC (the area under the Receiver Operating Characteristic curve) on theoretical grounds, and uses both AUC and accuracy to evaluate three classification learning algorithms on fifteen binary (two-class) datasets. Taken together, the theoretical and experimental results show that AUC is not only a better measure than accuracy but should replace it as the evaluation measure for classifier performance. In addition, re-evaluating the three algorithms with AUC further confirms that Naive Bayes and TAN-CMI, both based on Bayes' theorem, outperform the decision tree algorithm C4.5.
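To illustrate the contrast the abstract draws, the following is a minimal sketch, not the authors' experimental code. It computes accuracy at a fixed threshold and AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (the rank interpretation of the area under the ROC curve). The function names, the 0.5 threshold, and the toy scores are all assumptions made up for this example.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three positives followed by three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]   # made-up scores, classifier A
scores_b = [0.9, 0.55, 0.1, 0.7, 0.3, 0.2]  # made-up scores, classifier B

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, round(accuracy(labels, scores), 3), round(auc(labels, scores), 3))
```

On this toy data both classifiers misclassify two of six examples at the 0.5 threshold, so accuracy cannot separate them, while AUC differs (8/9 versus 5/9) because classifier B ranks more negatives above positives. This only illustrates the kind of discrimination the abstract attributes to AUC; it does not reproduce the paper's results.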

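Authors: Yang Bo (杨波), Cheng Zekai (程泽凯), Qin Feng (秦锋)

Affiliations: School of Computer Science, Anhui University of Technology (安徽工业大学计算机学院); Department of Computer Science, Tonghua Normal University (通化师范学院计算机系)

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; CSSCI, Peking University Core), 2007, No. 2, pp. 275-279 (5 pages)

Keywords: ROC; AUC; accuracy; cross-validation

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]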


References (11)

1. Provost F, Fawcett T, Kohavi R. The case against accuracy estimation for comparing induction algorithms [C] // Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, 1998: 445-453.
2. Provost F, Fawcett T. Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions [C] // Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97). Menlo Park, CA: AAAI Press, 1997: 43-48.
3. Ling C X, Huang J, Zhang H. AUC: a statistically consistent and more discriminating measure than accuracy [C] // Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03). Acapulco, Mexico, 2003.
4. Hanley J A, McNeil B J. The meaning and use of the area under a receiver operating characteristic (ROC) curve [J]. Radiology, 1982, 143: 29-36.
5. Bradley A P. The use of the area under the ROC curve in the evaluation of machine learning algorithms [J]. Pattern Recognition, 1997, 30: 1145-1159.
6. Hand D J, Till R J. A simple generalisation of the area under the ROC curve for multiple class classification problems [J]. Machine Learning, 2001, 45: 171-186.
7. Wu S M, Flach P. Scored and weighted AUC metrics for classifier evaluation and selection [C] // Proceedings of the 2nd Workshop on ROC Analysis in Machine Learning (ROCML-05). Bonn, Germany, 2005.
8. Friedman N, Geiger D, Goldszmidt M. Bayesian network classifiers [J]. Machine Learning, 1997, 29: 131-163.
9. Quinlan J R. C4.5: Programs for Machine Learning. San Francisco: Morgan Kaufmann Publishers, 1993.
10. Blake C, Merz C. UCI Repository of Machine Learning Databases [OL]. [1998]. http://www.ics.uci.edu/~mlearn/MLRepository.html.


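Citing literature (2)

1. Zou Hongxia, Qin Feng, Cheng Zekai, Wang Xiaoyu. An ROC curve generation algorithm for two-class classifiers [J]. Computer Technology and Development, 2009, 19(6): 109-112. Cited by: 32
2. Mei Yingying, Zhang Jingxiong. An adaptive sampling strategy for land cover change information and its accuracy assessment [J]. Acta Geodaetica et Cartographica Sinica, 2018, 47(5): 644-651. Cited by: 2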

