Abstract
This paper studies the discretization of continuous attributes. First, the Entropy-Based Discretization algorithm (EBD) is described in detail and its limitations are analyzed. Next, a definition for measuring interval density is given. Then, inspired by the idea of adaptivity, EBD is improved into an entropy-based variable-threshold discretization algorithm. Introducing interval density lets this algorithm adjust the entropy threshold as the density of the sample set over an interval changes. Finally, the algorithm is applied to two datasets. Experimental results show that, compared with EBD, the improved algorithm retains simplicity, consistency and accuracy while also being easy to operate.
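The abstract does not spell out the EBD procedure, so the following is a minimal sketch of the common textbook formulation, not the authors' code: sort the samples by attribute value, pick the binary cut that minimizes the class-weighted information entropy, and recurse on both halves until the information gain falls below a threshold. The constant threshold of 0.1 bits and the `threshold(segment)` hook are assumptions for illustration; the paper's variable-threshold variant would supply a density-dependent function there, but its interval-density definition is not given in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels in one interval."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(segment):
    """Find the binary cut minimising the class-weighted entropy.

    `segment` is a list of (value, label) pairs sorted by value.
    Returns (cut_point, weighted_entropy), or None if all values are equal.
    """
    n = len(segment)
    best = None
    for i in range(1, n):
        lo, hi = segment[i - 1][0], segment[i][0]
        if lo == hi:
            continue  # candidate cuts lie only between distinct values
        left = [y for _, y in segment[:i]]
        right = [y for _, y in segment[i:]]
        e = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
        if best is None or e < best[1]:
            best = ((lo + hi) / 2, e)
    return best


def ebd(values, labels, threshold=lambda segment: 0.1):
    """Recursive entropy-based discretization; returns cut points in ascending order.

    `threshold(segment)` is the minimum information gain required to split a
    segment. The constant default mimics a fixed-threshold EBD; passing a
    function of the segment is a hypothetical hook for a variable threshold.
    """
    cuts = []

    def recurse(segment):
        split = best_split(segment)
        if split is None:
            return
        cut, e = split
        gain = class_entropy([y for _, y in segment]) - e
        if gain < threshold(segment):
            return  # stopping rule: gain too small to justify a cut
        recurse([p for p in segment if p[0] <= cut])
        cuts.append(cut)  # in-order traversal keeps cuts sorted
        recurse([p for p in segment if p[0] > cut])

    recurse(sorted(zip(values, labels), key=lambda p: p[0]))
    return cuts


print(ebd([1, 2, 3, 4, 5, 6, 7, 8], list("aaaabbbb")))  # [4.5]
```

In the usage line, the single cut at 4.5 separates the two classes perfectly (gain of 1 bit); each pure half then yields zero gain, so recursion stops. Making the stopping threshold a function of the segment is the design choice that leaves room for an adaptive rule without changing the split search itself.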
Authors
LI Chao-peng (李朝鹏), CHENG Yun (成运)
Hunan University of Humanities, Science and Technology, Loudi 417000, China
Source
Computer Knowledge and Technology (电脑知识与技术)
2009, No. 12, pp. 9744-9746 (3 pages)
Funding
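National Natural Science Foundation of China (90715029)
Natural Science Foundation of Hunan Province (07JJ6116)
Hunan Provincial Education Project
Hunan Provincial Key Discipline Construction Program
Scientific Research Project of the Hunan Provincial Department of Education (09C546)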
Keywords
information entropy
adaptive
discretization