Abstract
Rough sets and decision trees have complementary strengths, so a new data mining approach combining the two is proposed. Taking plywood defect detection data as the application, rough set theory is applied to the feature information stored in a plywood database to recognize defects. The data are first discretized by hierarchical clustering with the centroid-distance (center-of-gravity distance) method; rough sets are then used to reduce the condition attributes, yielding low-dimensional sample data; finally, decision rules are generated with a decision tree. Experiments show that the approach preserves the intrinsic characteristics of the original data, speeds up knowledge acquisition, improves the classification accuracy of the model, and makes the rules more interpretable. Compared with other methods, such as rough sets or variable precision rough sets, the proposed method gives more satisfactory results.
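The abstract describes a three-step pipeline: discretize, reduce attributes with rough sets, then grow a decision tree. Below is a minimal Python sketch of that pipeline under stated assumptions; it is not the authors' implementation. Discretization is read as agglomerative clustering of each attribute with centroid linkage. Attribute reduction uses the greedy QuickReduct heuristic driven by the rough-set dependency degree. Rules come from an ID3 tree (information gain). The abstract names neither the reduction algorithm nor the tree algorithm, so both are swapped-in choices. The feature names `area` and `gray`, the defect labels, the number of intervals, and all data values are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2
from statistics import mean


def centroid_discretize(values, k):
    """Map each value of one continuous attribute to an interval code 0..k-1,
    using agglomerative clustering with centroid (center-of-gravity) linkage."""
    clusters = [[i] for i in range(len(values))]  # one cluster per sample

    def centroid(cluster):
        return mean(values[i] for i in cluster)

    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        n = len(clusters)
        # Merge the pair of clusters whose centroids are closest.
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda p: abs(cents[p[0]] - cents[p[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    clusters.sort(key=centroid)  # codes increase with the cluster centroid
    codes = [0] * len(values)
    for code, cluster in enumerate(clusters):
        for i in cluster:
            codes[i] = code
    return codes


def partition(rows, attrs):
    """Equivalence classes (row indices) of the indiscernibility relation on attrs."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    return list(blocks.values())


def dependency(rows, attrs, dec):
    """Rough-set dependency degree |POS_attrs(dec)| / |U|."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({rows[i][dec] for i in b}) == 1)  # consistent classes only
    return pos / len(rows)


def quick_reduct(rows, conds, dec):
    """Greedy QuickReduct: add the attribute that most raises the dependency
    degree until it equals that of all condition attributes. The result is
    not guaranteed to be a minimal reduct."""
    target = dependency(rows, conds, dec)
    reduct = []
    while dependency(rows, reduct, dec) < target:
        best = max((a for a in conds if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], dec))
        reduct.append(best)
    return reduct


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def id3_rules(rows, attrs, dec, path=()):
    """Grow an ID3 tree and return one (conditions, class) rule per leaf."""
    labels = [r[dec] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return [(path, Counter(labels).most_common(1)[0][0])]  # majority class

    def gain(a):
        remainder = 0.0
        for v in {r[a] for r in rows}:
            sub = [r[dec] for r in rows if r[a] == v]
            remainder += len(sub) / len(rows) * entropy(sub)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    rules = []
    for v in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == v]
        rules += id3_rules(subset, rest, dec, path + ((best, v),))
    return rules


# Hypothetical toy data (not from the paper): two continuous features per defect.
raw = {"area": [1.0, 1.2, 5.0, 5.3, 9.8, 10.1],
       "gray": [20, 80, 21, 79, 22, 81]}
defects = ["knot", "knot", "crack", "crack", "hole", "hole"]
n_bins = {"area": 3, "gray": 2}  # assumed number of intervals per attribute

codes = {a: centroid_discretize(v, n_bins[a]) for a, v in raw.items()}
table = [{**{a: codes[a][i] for a in raw}, "defect": d}
         for i, d in enumerate(defects)]
reduct = quick_reduct(table, list(raw), "defect")
rules = id3_rules(table, reduct, "defect")
for conds, cls in rules:
    print(" AND ".join(f"{a}={v}" for a, v in conds), "->", cls)
```

On this toy table, the reduction drops `gray` because it carries no information about the defect class. The script prints `area=0 -> knot`, `area=1 -> crack` and `area=2 -> hole`. The all-pairs centroid search is cubic in the number of samples, which is acceptable for a sketch. For real data, a library routine such as SciPy's centroid linkage would be the more practical choice.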
Source
Journal of Northeastern University (Natural Science)
EI
CAS
CSCD
Peking University Core Journal
2006, No. 5, pp. 481-484 (4 pages)
Funding
Key International Cooperation Project of the Ministry of Science and Technology of China (2003DF020009)
Keywords
rough sets
decision tree
data discretization
data mining
hierarchical clustering
attribute reduction