
Classification for Imbalanced Dataset of Improved Weighted KNN Algorithm

Cited by: 26
Abstract: When the K-Nearest Neighbor (KNN) algorithm classifies imbalanced datasets, its decision is always biased toward the majority class. Based on an analysis of these shortcomings, a novel weighted KNN approach (GAK-KNN) is presented. The key of GAK-KNN lies in a new weight assignment model, which fully accounts for the adverse effects of the uneven distribution of training samples both between classes and within classes. The specific steps are as follows: use a K-means algorithm based on a Genetic Algorithm (GA) to cluster the training sample set, compute the weight of each training sample according to the clustering results and the weight assignment model, and classify test samples with the improved KNN algorithm. GAK-KNN significantly improves the identification rate of minority-class samples as well as the overall classification performance. Theoretical analysis and comprehensive experimental results on UCI datasets confirm that GAK-KNN outperforms the traditional KNN algorithm and other improved algorithms.
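The voting step described in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper's actual weight model is derived from GA-based K-means clustering and also captures within-class unevenness, which is not reproduced here; this sketch implements only the class-imbalance part, weighting each training sample inversely to its class frequency before a weighted nearest-neighbor vote. The function names and the weighting formula are illustrative, not taken from the paper.

```python
import numpy as np
from collections import Counter

def sample_weights(y):
    # Illustrative weight model: weight each training sample inversely to
    # its class frequency, so minority-class votes count for more.
    # (The paper's model additionally uses GA-based K-means clustering.)
    counts = Counter(y)
    n, c = len(y), len(counts)
    return np.array([n / (c * counts[label]) for label in y])

def weighted_knn_predict(X_train, y_train, w, x, k=5):
    # Classify x by accumulating the weights of its k nearest neighbors
    # per class and returning the class with the largest weighted vote.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w[i]
    return max(votes, key=votes.get)
```

With six majority-class samples and two minority-class samples, a query point whose three nearest neighbors are two majority and one minority sample is assigned to the minority class under this weighting, whereas an unweighted majority vote would pick the majority class.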
Source: Computer Engineering (《计算机工程》), CAS, CSCD, 2012, Issue 20, pp. 160-163, 168 (5 pages)
Funding: National Natural Science Foundation of China (31170393); Natural Science Foundation of Shaanxi Province (2012JM8023); Natural Science Special Fund of the Shaanxi Provincial Education Department (12JK0726)
Keywords: imbalanced dataset; classification; K-Nearest Neighbor (KNN) algorithm; weight assignment model; Genetic Algorithm (GA); K-means algorithm


