Abstract
This paper proposes an improved KNN algorithm based on multi-attribute classification, which can effectively improve on the classification accuracy of both the conventional Euclidean KNN algorithm and the improved KNN algorithm based on information entropy. First, each attribute is classified according to the ratio of its number of distinct values to the number of samples it covers. This yields two groups: discrete attributes, which are handled by the information-entropy-based KNN algorithm, and continuous attributes, which are handled by the conventional Euclidean KNN similarity measure. The two groups are then processed separately. Second, the results of the two treatments are combined in a weighted sum, which serves as the distance between samples. Finally, the k samples closest to the test sample are selected to determine the class of its decision attribute.
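The three steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the distinct-value threshold `ratio_threshold`, the mixing weight `alpha`, the min-max scaling of continuous columns, and the use of an information-gain-weighted mismatch count as the entropy-based discrete distance are all hypothetical choices made here for illustration. The function names are likewise invented.

```python
import math
from collections import Counter


def split_attributes(X, ratio_threshold=0.5):
    """Step 1: label each column discrete or continuous by the ratio of
    distinct values to sample count (the threshold is an assumed value)."""
    n = len(X)
    discrete, continuous = [], []
    for j in range(len(X[0])):
        ratio = len({row[j] for row in X}) / n
        (discrete if ratio <= ratio_threshold else continuous).append(j)
    return discrete, continuous


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(X, y, j):
    """Reduction in label entropy obtained by partitioning on column j."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(row[j], []).append(label)
    remainder = sum(len(g) / len(y) * entropy(g) for g in groups.values())
    return entropy(y) - remainder


def knn_predict(X, y, query, k=3, alpha=0.5, ratio_threshold=0.5):
    """Steps 2 and 3: weighted sum of the two partial distances, then a
    majority vote among the k nearest training samples."""
    discrete, continuous = split_attributes(X, ratio_threshold)

    # Assumed entropy-based weighting: a mismatch on a discrete column
    # costs that column's share of the total information gain.
    gains = {j: information_gain(X, y, j) for j in discrete}
    total_gain = sum(gains.values()) or 1.0

    # Assumed min-max scaling so continuous columns are comparable.
    # Columns labelled continuous must hold numeric values.
    spans = {}
    for j in continuous:
        col = [row[j] for row in X]
        spans[j] = (max(col) - min(col)) or 1.0

    def distance(row):
        d_disc = sum(gains[j] / total_gain
                     for j in discrete if row[j] != query[j])
        d_cont = math.sqrt(sum(((row[j] - query[j]) / spans[j]) ** 2
                               for j in continuous))
        return alpha * d_disc + (1 - alpha) * d_cont

    nearest = sorted(range(len(X)), key=lambda i: distance(X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]
```

Calling `knn_predict(X, y, query)` with rows given as tuples of mixed categorical and numeric values returns the majority label among the k nearest rows. Setting `alpha` to 0 reduces the sketch to Euclidean KNN on the continuous columns only, and setting it to 1 uses only the discrete part.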
Source
《鞍山师范学院学报》
2013, No. 6, pp. 38-41, 59 (5 pages in total)
Journal of Anshan Normal University
Keywords
Discrete attribute
Continuous attribute
KNN algorithm
Multi-attribute classification