Abstract
Many real-world datasets suffer from the class imbalance problem. When traditional classification algorithms are applied to imbalanced data, minority-class samples are easily misclassified. To improve the classification accuracy of minority-class samples, this paper proposes a fixed-radius nearest neighbor progressive competition algorithm (FRNNPC). As a preprocessing step, FRNNPC uses the fixed-radius nearest neighbor (FRNN) rule to eliminate unnecessary samples globally. It then applies the progressive competition algorithm (NPC) to the remaining candidate data: the scores of the query sample's neighbors are accumulated step by step until the total score of one class exceeds that of the other. In short, the method handles the imbalance problem effectively and requires no manually set parameters. Experiments compare the proposed method with four representative algorithms on 10 imbalanced datasets and verify its effectiveness.
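The two stages described in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation: the abstract does not give the scoring function, the way the radius is chosen, or the exact stopping rule, so the sketch fills these in with placeholder assumptions. The name `frnnpc_predict`, the mean-distance radius, the inverse-class-frequency score, and the `margin` parameter are all hypothetical.

```python
import numpy as np


def frnnpc_predict(X, y, query, radius=None, margin=2.0):
    """Classify `query` with a fixed-radius filter plus progressive scoring.

    Assumptions (not taken from the paper):
      * radius defaults to the mean distance from the query to all samples;
      * a neighbor's score is max_class_count / count_of_its_class, so a
        minority-class neighbor counts for more than a majority-class one;
      * the competition stops once the leading class is ahead of the
        runner-up by at least `margin` (hypothetical parameter).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    query = np.asarray(query, dtype=float)

    # Distance from the query to every training sample.
    dists = np.linalg.norm(X - query, axis=1)
    if radius is None:
        radius = dists.mean()  # placeholder choice of radius

    # Stage 1 (fixed-radius filter): keep only samples inside the radius.
    candidates = np.flatnonzero(dists <= radius)
    if candidates.size == 0:
        # No candidate survived the filter: fall back to the single nearest sample.
        return y[np.argmin(dists)]

    # Visit the surviving candidates from nearest to farthest.
    order = candidates[np.argsort(dists[candidates], kind="stable")]

    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    weights = counts.max() / counts  # placeholder per-class score
    totals = np.zeros(len(classes))

    # Stage 2 (progressive competition): add neighbor scores one at a time.
    for i in order:
        totals[inverse[i]] += weights[inverse[i]]
        ranked = np.sort(totals)[::-1]
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        if ranked[0] - runner_up >= margin:
            break  # one class has pulled clearly ahead

    return classes[np.argmax(totals)]


# Toy data: eight majority-class points (label 0) and two minority-class points (label 1).
X_toy = [[0], [1], [2], [3], [4], [5], [6], [7], [10], [11]]
y_toy = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
near_minority = frnnpc_predict(X_toy, y_toy, [9.5])
near_majority = frnnpc_predict(X_toy, y_toy, [1.0])
```

In this toy setup a single nearby minority sample already outweighs the required margin, whereas two majority neighbors are needed before class 0 wins. That only illustrates how class-dependent scores can offset imbalance; the scoring rule in the paper may differ.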
Authors
周鹏
伊静
朱振方
刘培玉
ZHOU Peng; YI Jing; ZHU Zhen-fang; LIU Pei-yu (School of Information Science & Engineering, Shandong Normal University, Jinan 250358, Shandong, China; Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250358, Shandong, China; School of Computer Science & Technology, Shandong Jianzhu University, Jinan 250014, Shandong, China; School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan 250357, Shandong, China)
Source
《山东大学学报(理学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2019, No. 3, pp. 102-109 (8 pages)
Journal of Shandong University (Natural Science)
Funding
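National Natural Science Foundation of China (61373148, 61502151)
Humanities and Social Sciences Foundation of the Ministry of Education of China (14YJC860042)
Natural Science Foundation of Shandong Province (ZR2014FL010)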
Keywords
imbalanced data
nearest neighbor rule
pattern classification