
A Hub-Based Selection Strategy for Initial Cluster Centers in High-Dimensional Data (Cited by: 3)

Hub-Based Initialization for K-hubs
Abstract: The K-hubs algorithm, a Hub-based clustering method, is sensitive to the choice of initial cluster centers. To address this problem, this paper proposes a Hub-based initialization strategy. The strategy exploits the Hubness phenomenon that is common in high-dimensional data, selecting the K Hub points that are farthest from one another as the initial cluster centers. Experiments show that, compared with the original K-hubs algorithm using random initialization, K-hubs with the proposed strategy obtains a better distribution of initial centers and higher clustering accuracy; moreover, the selected initial centers tend to lie near the final cluster centers, which speeds up convergence of the algorithm.
Source: Computer Systems & Applications, 2015, No. 4, pp. 171-175.
Keywords: Hubness; initial center; max-min distance method; high-dimensional data; clustering

References (11)

  • 1 Han J, Kamber M. Data Mining: Concepts and Techniques. 2nd ed. Morgan Kaufmann Publishers, 2006.
  • 2 Radovanovic M, Nanopoulos A, Ivanovic M. Nearest neighbors in high-dimensional data: The emergence and influence of hubs. Proc. of the 26th International Conference on Machine Learning (ICML). 2009. 865-872.
  • 3 Radovanovic M, Nanopoulos A, Ivanovic M. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 2010, 11: 2487-2531.
  • 4 Tomasev N, Mladenic D. Nearest neighbor voting in high dimensional data: Learning from past occurrences. Computer Science and Information Systems, 2012, 9: 691-712.
  • 5 Tomasev N, Mladenic D. Hubness-aware shared neighbor distances for high-dimensional k-nearest neighbor classification. Hybrid Artificial Intelligent Systems, 2012: 116-127.
  • 6 Zhai TT, He ZF. Class-balanced instance selection algorithm for time series based on Hubness [J]. Journal of Computer Applications, 2012, 32(11): 3034-3037. (Cited by: 2)
  • 7 Zhai TT, He ZF. Instance selection for time series classification based on immune binary particle swarm optimization. Knowledge-Based Systems, 2013, 49(9): 106-115.
  • 8 Tomasev N, Radovanovic M, Mladenic D, Ivanovic M. The role of hubness in clustering high-dimensional data. Advances in Knowledge Discovery and Data Mining, 2011: 183-195.
  • 9 Aggarwal CC, Hinneburg A, Keim DA. On the surprising behavior of distance metrics in high dimensional spaces. Proc. of the 8th International Conference on Database Theory (ICDT). 2001. 420-434.
  • 10 Beyer K, Goldstein J, Ramakrishnan R, et al. When is "nearest neighbor" meaningful? Proc. of the 7th International Conference on Database Theory (ICDT). 1999. 217-235.
