Abstract
The k-nearest neighbour (KNN) algorithm is an effective multi-class classification algorithm that is simple and stable, and it has been widely used in data mining. However, it has two major drawbacks. First, its accuracy depends strongly on the choice of k, and different values of k can yield very different accuracy. Second, it is a non-incremental algorithm: as the amount of data grows, classification becomes progressively slower, which limits its use in massive data analysis. The main idea of three-way decisions is to divide the whole into three independent parts, introducing a non-commitment option that avoids the losses of wrong acceptance or wrong rejection. This paper introduces three-way decisions into the KNN algorithm and treats samples in the boundary region specially, which reduces the classification cost and improves classification accuracy on massive data. On this basis, an improved incremental KNN algorithm based on three-way decisions is proposed, which speeds up the original algorithm; experimental results show its efficiency.
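The idea of combining three-way decisions with KNN can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that the share of a class among the k nearest neighbours is compared with two hypothetical thresholds, `alpha` and `beta`, and that accepted samples are appended to the training set as the incremental step. The function names, the thresholds, and the update rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def three_way_regions(train, query, k=3, alpha=0.8, beta=0.3):
    """Assign every class to a region for one query point.

    train is a list of (point, label) pairs. The vote share of each class
    among the k nearest neighbours decides the region (assumed rule):
    share >= alpha -> positive (accept), share <= beta -> negative (reject),
    anything in between -> boundary (non-commitment).
    """
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    regions = {}
    for label in {label for _, label in train}:
        share = votes[label] / len(neighbours)
        if share >= alpha:
            regions[label] = "positive"
        elif share <= beta:
            regions[label] = "negative"
        else:
            regions[label] = "boundary"
    return regions


def classify_incrementally(train, query, k=3, alpha=0.8, beta=0.3):
    """Accept a label and grow the training set, or defer (return None).

    With alpha > 0.5 at most one class can fall in the positive region.
    Adding accepted samples to train is an assumed incremental update.
    """
    regions = three_way_regions(train, query, k, alpha, beta)
    accepted = [label for label, region in regions.items() if region == "positive"]
    if accepted:
        train.append((query, accepted[0]))
        return accepted[0]
    return None  # boundary case: no commitment is made
```

In this sketch a deferred sample is simply left unlabelled; how boundary-region samples are actually processed in the paper is not stated in the abstract.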
Authors
PEI Xiao-peng (裴晓鹏)
SHANG Ao (尚奥)
LIU Mei-hong (刘美红)
LIU Fan (刘帆)
CHEN Ze-hua (陈泽华)
College of Information Engineering, Taiyuan University of Technology, Jinzhong 030600, China; Department of Automation, Shanxi University, Taiyuan 030000, China
Source
Control Engineering of China (《控制工程》)
CSCD
PKU Core (北大核心)
2020, Issue 4, pp. 656-661 (6 pages)
Funding
National Natural Science Foundation of China (61402319, 61403273)
Natural Science Foundation of Shanxi Province (2014021022-4)
Keywords
k-nearest neighbour (KNN) algorithm
three-way decisions
boundary region
incremental algorithm