Abstract
The KNN algorithm is widely used because it is stable and accurate. However, its classification time grows in proportion to the size of the training set, so it is slow on large-scale, high-dimensional data. Some improved KNN algorithms prune the training samples by clustering, which shrinks the training set and speeds up classification, but the resulting loss of a test sample's true nearest neighbors lowers classification accuracy. To address this, we propose a fast KNN algorithm based on three-way clustering (TWC-KNN). The algorithm first applies shadowed sets to the membership matrix produced by FCM clustering to construct a three-way clustering, which assigns the training samples to the core region, fringe region, and trivial region of each cluster. It then reconstructs the training set according to the position of the test sample relative to the cluster centers and performs KNN classification on the reconstructed set. Experiments on eight UCI data sets show that, compared with the traditional KNN algorithm and three improved KNN algorithms, TWC-KNN improves both classification accuracy and classification efficiency.
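The pipeline described in the abstract (FCM memberships, a three-way split into core, fringe, and trivial regions, then KNN on a reduced training set) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state how the shadowed-set threshold is chosen or exactly how the training set is rebuilt, so several pieces here are assumptions: the fixed threshold `alpha`, the farthest-point initialization of FCM, the rule "keep the core and fringe samples of the cluster whose center is nearest to the test sample", and all function names.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100):
    """Plain fuzzy c-means; returns (centers, membership matrix U of shape n x c)."""
    # Assumption: deterministic farthest-point initialization of the centers.
    centers = [X[0]]
    for _ in range(c - 1):
        d = np.min([np.linalg.norm(X - v, axis=1) for v in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Distances from every sample to every center (epsilon avoids 0 division).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # each row sums to 1
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted means
    return centers, U

def three_way_regions(U, alpha=0.3):
    """Shadowed-set style split: 2 = core, 1 = fringe, 0 = trivial.

    Assumption: a fixed threshold alpha instead of an optimized one.
    """
    regions = np.ones(U.shape, dtype=int)  # fringe by default
    regions[U >= 1.0 - alpha] = 2          # membership elevated to 1
    regions[U <= alpha] = 0                # membership reduced to 0
    return regions

def select_training_subset(x, centers, regions):
    """Boolean mask of the training samples kept for test sample x.

    Assumption: keep core and fringe samples of the nearest cluster.
    """
    j = np.argmin(np.linalg.norm(centers - x, axis=1))
    return regions[:, j] >= 1

def twc_knn_predict(x, X, y, centers, regions, k=3):
    """Majority-vote KNN on the reduced training set."""
    keep = select_training_subset(x, centers, regions)
    if keep.sum() < k:                     # fall back to the full training set
        keep = np.ones(len(X), dtype=bool)
    Xs, ys = X[keep], y[keep]
    nearest = np.argsort(np.linalg.norm(Xs - x, axis=1))[:k]
    labels, counts = np.unique(ys[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two well-separated groups of 16 points each.
A = np.array([[i * 0.1, j * 0.1] for i in range(4) for j in range(4)])
X_train = np.vstack([A, A + 10.0])
y_train = np.array([0] * 16 + [1] * 16)
centers, U = fcm(X_train, c=2)
regions = three_way_regions(U)
print(twc_knn_predict(np.array([0.1, 0.2]), X_train, y_train, centers, regions))
```

Under these assumptions, the distance computation for a test sample runs only over the samples of one cluster rather than the whole training set, which is where the speed-up would come from. Keeping the fringe samples is meant to reduce the chance that true nearest neighbors close to a cluster boundary are pruned away.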
Authors
ZHAO Shu-bao (赵书宝)
JIANG Chun-mao (姜春茂)
College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 9, pp. 1845-1851 (7 pages)
Funding
Supported by the Natural Science Foundation of Heilongjiang Province (LH2020F031).
Keywords
KNN classification
shadowed set
three-way decision
three-way clustering
FCM