Abstract
This paper analyzes the shortcomings of data mining on samples with continuous-valued attributes and proposes an algorithm that mines classification rules from such samples directly. The algorithm classifies the sample instances using splitting points on attribute values, selects each splitting point according to its ability to classify the instances, and stops once all consistent samples have been partitioned into subsets that each share a single class label, so that rule mining for continuous-valued attributes is fully automatic. Taking the goals and requirements of data mining into account, it exploits the dependence between attributes and classes and the complementarity among attributes in order to use few splitting points, produce simple classification rules, and reduce the number of attributes. Finally, the algorithm is validated on an example and compared with C4.5.
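The splitting-point idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how the "classification ability" of a splitting point is measured, so the score used here (the number of samples outside the majority class on each side of the cut) is an assumption, as are all function names and the toy data. It is not the paper's algorithm, only a sketch of recursive splitting that stops when every subset has a single class label.

```python
from collections import Counter

def candidate_cuts(samples, attr):
    """Midpoints between adjacent distinct values of one attribute."""
    values = sorted({x[attr] for x, _ in samples})
    return [(a + b) / 2 for a, b in zip(values, values[1:])]

def errors(samples):
    """Count samples outside the majority class (0 means the subset is pure)."""
    if not samples:
        return 0
    counts = Counter(label for _, label in samples)
    return len(samples) - max(counts.values())

def best_split(samples, n_attrs):
    """Return the (attr, cut) pair with the fewest errors, or None if no cut exists.

    Assumed scoring rule: the abstract does not specify the actual measure.
    """
    best, best_err = None, None
    for attr in range(n_attrs):
        for cut in candidate_cuts(samples, attr):
            left = [s for s in samples if s[0][attr] <= cut]
            right = [s for s in samples if s[0][attr] > cut]
            err = errors(left) + errors(right)
            if best_err is None or err < best_err:
                best, best_err = (attr, cut), err
    return best

def mine_rules(samples, n_attrs, conditions=()):
    """Split recursively until each subset carries a single class label."""
    labels = {label for _, label in samples}
    split = None if len(labels) == 1 else best_split(samples, n_attrs)
    if split is None:
        # Pure subset, or no further cut possible: emit one rule.
        majority = Counter(label for _, label in samples).most_common(1)[0][0]
        return [(list(conditions), majority)]
    attr, cut = split
    left = [s for s in samples if s[0][attr] <= cut]
    right = [s for s in samples if s[0][attr] > cut]
    return (mine_rules(left, n_attrs, conditions + ((attr, "<=", cut),))
            + mine_rules(right, n_attrs, conditions + ((attr, ">", cut),)))

# Hypothetical toy data: (attribute vector, class label).
samples = [((1.0, 5.0), "A"), ((2.0, 6.0), "A"),
           ((3.0, 1.0), "B"), ((4.0, 2.0), "B")]
for conditions, label in mine_rules(samples, n_attrs=2):
    print(conditions, "->", label)
```

On this toy data a single cut on attribute 0 separates the two classes, so attribute 1 never appears in a rule, which loosely mirrors the attribute-reduction goal stated in the abstract.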
Source
《计算机工程》
EI
CAS
CSCD
PKU Core Journal
2005, No. 18, pp. 28-30 (3 pages)
Computer Engineering
Funding
Supported by the Natural Science Foundation of Shaanxi Province (200104-G15)
Keywords
Continuous-valued attributes
Data mining
Classification rules
New algorithm