Abstract
When computing the similarity between samples, the conventional K-nearest-neighbor (KNN) algorithm treats every attribute as equally important and ignores differences in attribute significance. To address this issue, this paper proposes a method based on the earth mover’s distance (EMD) for computing the weight of each condition attribute. First, the method uses the nearest-neighbor relation to construct the two distributions whose consistency is to be compared. Then, it designs an inconsistency evaluation function based on the EMD to measure, for each attribute, the degree of inconsistency between the neighborhood of each sample and the equivalence partition of that neighborhood as refined by the decision attribute. Finally, it converts the inconsistency degree of the neighborhoods into the significance of the corresponding attribute, which is used to build an attribute-weighted KNN classifier. Experiments on several datasets show that the method has low sensitivity to its parameters, significantly improves the classification accuracy of KNN under multiple parameter settings, and outperforms several existing classification methods on multiple metrics. The results indicate that the method selects more accurate nearest neighbors through attribute weighting and can be widely applied in nearest-neighbor-based machine learning methods.
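The three steps in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption rather than the authors' definition: the function names (emd_1d, attribute_weights, predict), the choice to compare attribute values of each sample's k-neighborhood against its same-class part, the mapping from inconsistency to weight, and the toy data. Attributes are assumed to be numeric and scaled to [0, 1].

```python
from collections import Counter

def emd_1d(u, v):
    """1-D earth mover's distance between two empirical samples
    (each point carries equal mass), via the area between their CDFs."""
    points = sorted(set(u) | set(v))
    dist = cdf_u = cdf_v = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u += u.count(left) / len(u)
        cdf_v += v.count(left) / len(v)
        dist += abs(cdf_u - cdf_v) * (right - left)
    return dist

def attribute_weights(X, y, k=2):
    """Assumed weighting scheme: lower average inconsistency gives a higher weight."""
    n, m = len(X), len(X[0])
    inconsistency = []
    for a in range(m):
        total = 0.0
        for i in range(n):
            # k nearest neighbours of sample i, measured on attribute a only
            nbrs = sorted((j for j in range(n) if j != i),
                          key=lambda j: abs(X[j][a] - X[i][a]))[:k]
            # neighbourhood (sample included) versus its same-class refinement
            hood = [X[i][a]] + [X[j][a] for j in nbrs]
            same = [X[i][a]] + [X[j][a] for j in nbrs if y[j] == y[i]]
            total += emd_1d(hood, same)
        inconsistency.append(total / n)
    total_inc = sum(inconsistency)
    if total_inc == 0:
        return [1.0 / m] * m
    # hypothetical transform from inconsistency to significance
    raw = [1.0 - v / total_inc for v in inconsistency]
    s = sum(raw)
    return [r / s for r in raw] if s > 0 else [1.0 / m] * m

def predict(X, y, w, query, k=2):
    """Majority vote among the k nearest samples under weighted squared distance."""
    def dist(row):
        return sum(wa * (p - q) ** 2 for wa, p, q in zip(w, row, query))
    nearest = sorted(range(len(X)), key=lambda j: dist(X[j]))[:k]
    return Counter(y[j] for j in nearest).most_common(1)[0][0]

# Toy data (made up): attribute 0 separates the classes, attribute 1 is noise.
X = [[0.0, 0.9], [0.1, 0.1], [0.2, 0.5], [0.8, 0.2], [0.9, 0.8], [1.0, 0.4]]
y = [0, 0, 0, 1, 1, 1]
weights = attribute_weights(X, y, k=2)
label = predict(X, y, weights, [0.15, 0.85], k=2)
```

On this toy data the class-separating attribute receives the larger weight, so the noisy attribute contributes less to the neighbor search. The paper's actual evaluation function and weight conversion may differ from this sketch.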
Authors
徐政
邓安生
曲衍鹏
Xu Zheng; Deng Ansheng; Qu Yanpeng (College of Information Science & Technology, Dalian Maritime University, Dalian, Liaoning 116026, China)
Source
《计算机应用研究》
CSCD
PKU Core Journal
2021, No. 5, pp. 1355-1359, 1364 (6 pages)
Application Research of Computers
Funding
Dalian Youth Science and Technology Star Program (2018RQ70).
Keywords
attribute weights
nearest-neighbor classification
inconsistency
earth mover’s distance