
An Improved Imbalanced Data Clustering Algorithm for Landmark Sampling
Abstract: When clustering imbalanced data, K-means can misassign samples from classes that occupy small regions to classes that occupy larger regions. Spectral clustering can effectively optimize the data structure and identify clusters of different shapes, but it scales poorly to large data sets. To address both problems, this paper proposes an imbalanced data clustering algorithm with improved landmark sampling. The algorithm first pre-clusters the imbalanced data to obtain initial class labels and then samples the data according to its density. K-means is then run on the sampled data, and the resulting cluster centers serve as landmarks for spectral clustering of the data. Experimental results show that, on imbalanced data, the method improves clustering accuracy while keeping the clustering results stable and precise.
Authors: HAN Suqing (韩素青); LI Shuhui (李淑慧) (Department of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China)
Source: Journal of Taiyuan Normal University: Natural Science Edition (《太原师范学院学报(自然科学版)》), 2019, No. 2, pp. 34-39 (6 pages)
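Funding: Shanxi Province Key Research and Development Program project "Research on a GIS-based intelligent fire-protection remote early-warning platform" (201803D121088)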
Keywords: imbalanced data; spectral clustering; landmark sampling; singular value decomposition
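The four steps named in the abstract (pre-clustering, density-based sampling, K-means on the sample, spectral clustering with the cluster centers as landmarks) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the record does not give the sampling rule, so the equal per-cluster quota and the k-nearest-neighbor density weight in `density_sample` are hypothetical, as are all function names and the parameters `n_pre`, `per_cluster`, `n_landmarks`, `q` and `r`. The landmark step follows the general landmark-based spectral clustering idea, in which the embedding comes from a singular value decomposition of a normalized point-to-landmark affinity matrix.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's K-means. Returns (centers, labels)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(0)
    labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    return centers, labels

def density_sample(X, labels, per_cluster, rng, q=5):
    """ASSUMED rule: draw up to `per_cluster` points from every pre-cluster,
    favoring dense points (inverse mean distance to the q nearest neighbors)."""
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        P = X[idx]
        d = np.sqrt(((P[:, None] - P[None]) ** 2).sum(-1))
        d.sort(axis=1)  # column 0 is the zero distance of a point to itself
        kq = min(q, len(idx) - 1)
        dens = 1.0 / (d[:, 1:kq + 1].mean(1) + 1e-12) if kq > 0 else np.ones(len(idx))
        take = min(per_cluster, len(idx))
        chosen.append(rng.choice(idx, size=take, replace=False, p=dens / dens.sum()))
    return np.concatenate(chosen)

def landmark_spectral(X, landmarks, k, rng, r=3):
    """Spectral clustering through an n-by-p point-to-landmark affinity matrix."""
    d2 = ((X[:, None] - landmarks[None]) ** 2).sum(-1)
    sigma2 = d2.mean() + 1e-12
    # Subtracting the row minimum avoids underflow; it cancels in the row normalization.
    Z = np.exp(-(d2 - d2.min(1, keepdims=True)) / (2 * sigma2))
    far = np.argsort(d2, axis=1)[:, r:]         # all but the r nearest landmarks
    np.put_along_axis(Z, far, 0.0, axis=1)      # sparsify each row
    Z /= Z.sum(1, keepdims=True)                # rows sum to 1
    Z /= np.sqrt(Z.sum(0, keepdims=True) + 1e-12)  # scale columns by D^{-1/2}
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    emb = U[:, :k]                              # top-k left singular vectors
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12
    return kmeans(emb, k, rng)[1]

def cluster_imbalanced(X, k, n_pre=4, per_cluster=20, n_landmarks=8, seed=0):
    rng = np.random.default_rng(seed)
    _, pre_labels = kmeans(X, n_pre, rng)                  # 1. pre-clustering
    idx = density_sample(X, pre_labels, per_cluster, rng)  # 2. density-based sampling
    landmarks, _ = kmeans(X[idx], n_landmarks, rng)        # 3. K-means centers as landmarks
    labels = landmark_spectral(X, landmarks, k, rng)       # 4. spectral clustering
    return labels, landmarks, idx

# Usage on synthetic imbalanced data: 300 points in a wide class, 30 in a compact one.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 1, (300, 2)), data_rng.normal(8, 0.3, (30, 2))])
labels, landmarks, idx = cluster_imbalanced(X, k=2)
```

The equal quota per pre-cluster is one way to keep a small class from being swamped by a large one when landmarks are chosen. The SVD is taken on the n-by-p matrix rather than on a full n-by-n affinity matrix, which is what makes the landmark approach cheaper than standard spectral clustering on large data.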
