Abstract
This paper studies the limit of decision tree classification accuracy, using the maximum classification tree as a tool for analyzing empirical risk and structural risk. To address the difficulty of evaluating the classification performance of a decision tree model objectively, it discusses the conditions under which a classification accuracy limit exists and gives a method for computing that limit. With the maximum classification tree as the analysis tool, it proposes four theorems on whether the limit exists under four distribution conditions of empirical risk and structural risk, and discusses them from the perspectives of machine learning theory and practical engineering modeling. Experiments on ten public datasets validate the theorems.
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journals (北大核心)
2007, No. 10, pp. 222-224 (3 pages)
Funding
National Natural Science Foundation of China (Grant No. 60432010)
Keywords
Decision tree
Classification accuracy
Limit
Empirical risk
Structural risk