Abstract
Rough set theory is one of the basic methods for handling imprecise and uncertain problems. Because data analysis based on rough sets requires no prior knowledge of the data set and no manually set parameters, it is widely used in pattern recognition and data mining. This paper addresses a core problem in rough set classification: how to classify samples that were never encountered during training. Weighting coefficients are determined from the importance of the condition attributes, and a weighted KNN method is used to classify samples that cannot be matched exactly to any decision rule; the method is compared experimentally with the weighted minimum distance (WMD) method. Several existing rough set attribute value reduction algorithms are also analyzed, and a different point of view is put forward. Experiments on multiple UCI data sets, with performance comparisons against several algorithms from the recent literature, show that the proposed algorithm performs better overall than the other algorithms.
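The idea of weighting KNN by condition attribute importance can be illustrated with a minimal sketch. The abstract does not say how importance or distance is computed, so everything concrete here is an assumption: importance is taken as the standard rough set significance (the drop in dependency degree when an attribute is removed), distance is a weighted mismatch count over discrete attributes, and the function names `dependency`, `attribute_weights`, and `weighted_knn` are hypothetical. It is not the paper's algorithm.

```python
from collections import Counter, defaultdict


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class (under the
    attributes in `attrs`) carries a single decision label."""
    class_labels = defaultdict(set)
    class_sizes = Counter()
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        class_labels[key].add(label)
        class_sizes[key] += 1
    positive = sum(class_sizes[k] for k, labs in class_labels.items()
                   if len(labs) == 1)
    return positive / len(rows)


def attribute_weights(rows, labels):
    """Assumed weighting: significance of each attribute, normalised to sum to 1."""
    all_attrs = list(range(len(rows[0])))
    full = dependency(rows, labels, all_attrs)
    sig = [full - dependency(rows, labels, [b for b in all_attrs if b != a])
           for a in all_attrs]
    total = sum(sig)
    if total == 0:  # no attribute is individually significant: fall back to uniform
        return [1.0 / len(all_attrs)] * len(all_attrs)
    return [s / total for s in sig]


def weighted_knn(rows, labels, weights, sample, k=3):
    """Majority vote among the k training rows with the smallest weighted
    mismatch distance to `sample` (assumed distance for discrete attributes)."""
    def dist(row):
        return sum(w for w, x, y in zip(weights, row, sample) if x != y)
    nearest = sorted(range(len(rows)), key=lambda i: dist(rows[i]))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy decision table: attribute 0 alone determines the decision.
ROWS = [(0, 0), (0, 1), (1, 0), (1, 1)]
LABELS = ["a", "a", "b", "b"]
WEIGHTS = attribute_weights(ROWS, LABELS)

# (1, 2) contains an attribute value never seen in training, so no rule
# derived from ROWS could match it exactly.
PREDICTION = weighted_knn(ROWS, LABELS, WEIGHTS, (1, 2), k=3)
```

In this toy table the second attribute receives zero weight, so the unseen value in the query sample does not affect the vote.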
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2015, No. 10, pp. 281-286 (6 pages)
Computer Science
Funding
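National Natural Science Foundation of China, Regional Program (61165009)
Supported by the National Natural Science Foundation of China (61365009)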
Keywords
Rough set
Weighted KNN
Weighted minimum distance
Attribute value reduction