
A Hub-Based Selection Strategy for Initial Cluster Centers in High-Dimensional Data (Cited by: 3)

Hub-Based Initialization for K-hubs
Abstract: The K-hubs algorithm, a Hub-based clustering method, is sensitive to the choice of initial cluster centers. To address this problem, this paper proposes a Hub-based initialization strategy. The strategy exploits the Hubness phenomenon that is common in high-dimensional data, selecting the K Hub points that are farthest from one another as the initial cluster centers. Experiments show that, compared with the original K-hubs algorithm using random initialization, K-hubs with the proposed strategy obtains a better distribution of initial centers and higher clustering accuracy; moreover, the selected initial centers tend to lie near the final cluster centers, which speeds up convergence of the algorithm.
Source: Computer Systems & Applications, 2015, No. 4, pp. 171-175.
Keywords: Hubness; initial center; max-min distance method; high-dimensional data; clustering

References (11)

  • 1 Han J, Kamber M. Data Mining: Concepts and Techniques. 2nd ed. Morgan Kaufmann Publishers, 2006.
  • 2 Radovanovic M, Nanopoulos A, Ivanovic M. Nearest neighbors in high-dimensional data: The emergence and influence of hubs. Proc. of the 26th International Conference on Machine Learning (ICML). 2009. 865-872.
  • 3 Radovanovic M, Nanopoulos A, Ivanovic M. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 2010, 11: 2487-2531.
  • 4 Tomasev N, Mladenic D. Nearest neighbor voting in high dimensional data: Learning from past occurrences. Computer Science and Information Systems, 2012, 9: 691-712.
  • 5 Tomasev N, Mladenic D. Hubness-aware shared neighbor distances for high-dimensional k-nearest neighbor classification. Hybrid Artificial Intelligent Systems, 2012: 116-127.
  • 6 Zhai TT, He ZF. Class-balanced instance selection algorithm for time series based on Hubness [J]. Journal of Computer Applications, 2012, 32(11): 3034-3037. (Cited by: 2)
  • 7 Zhai TT, He ZF. Instance selection for time series classification based on immune binary particle swarm optimization. Knowledge-Based Systems, 2013, 49(9): 106-115.
  • 8 Tomasev N, Radovanovic M, Mladenic D, Ivanovic M. The role of hubness in clustering high-dimensional data. Advances in Knowledge Discovery and Data Mining, 2011: 183-195.
  • 9 Aggarwal CC, Hinneburg A, Keim DA. On the surprising behavior of distance metrics in high dimensional spaces. Proc. of the 8th International Conference on Database Theory (ICDT). 2001. 420-434.
  • 10 Beyer K, Goldstein J, Ramakrishnan R, et al. When is "nearest neighbor" meaningful? Proc. of the 7th International Conference on Database Theory (ICDT). 1999. 217-235.
