The (1+ε) Approximate Algorithm for K-means Based on the Minimum Size of Sub-Cluster

Cited by: 5


Abstract: k-means clustering is one of the most widely used approaches to the clustering problem. In recent years, many researchers have studied algorithms with bounded quality for this problem, either constant-factor approximations or (1+ε) approximations. This paper presents a (1+ε) randomized approximation algorithm for k-means clustering based on the minimum size of the smallest optimal sub-cluster. The algorithm builds on the random sampling technique proposed by Kumar et al. First, a set of points is sampled at random from the input point set. Then the sampled points are combined to find a subset of points belonging to each cluster, and the centroid of that subset is used as the center of the corresponding sub-cluster. It is proved that running the algorithm several times yields a (1+ε) approximation of the k-means clustering with high probability.
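The sample-then-combine idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the sample size, the subset size, or how they depend on ε and on the minimum sub-cluster size, so `sample_size`, `subset_size`, `trials`, and all function names below are assumptions chosen for illustration. The sketch samples points uniformly, treats the centroid of every fixed-size subset of the sample as a candidate center, and keeps the k candidates with the lowest k-means cost over several independent runs.

```python
import itertools
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty collection of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


def sampled_centroid_kmeans(points, k, sample_size, subset_size, trials=5, seed=0):
    """Illustrative sketch (hypothetical parameters, not the paper's bounds).

    Each trial draws a uniform random sample, uses the centroid of every
    subset of `subset_size` sampled points as a candidate center, and
    evaluates every choice of k candidates by brute force.
    """
    rng = random.Random(seed)
    best_centers, best_cost = None, float("inf")
    for _ in range(trials):
        sample = rng.sample(points, min(sample_size, len(points)))
        candidates = [
            centroid(subset)
            for subset in itertools.combinations(sample, subset_size)
        ]
        for centers in itertools.combinations(candidates, k):
            cost = kmeans_cost(points, centers)
            if cost < best_cost:
                best_centers, best_cost = centers, cost
    return best_centers, best_cost


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, cost = sampled_centroid_kmeans(data, k=2, sample_size=6, subset_size=3)
    print(centers, cost)
```

The brute-force search over candidate combinations is exponential in k and is only practical for tiny inputs; it is used here to keep the sketch short, and repeating the trials mirrors the abstract's remark that several runs raise the success probability.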

Authors: Wang Shouqiang, Zhu Daming, Shi Shiying

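Affiliations: School of Computer Science and Technology, Shandong University; Department of Information Engineering, Shandong Jiaotong University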

Source: Journal of Computer Research and Development (indexed in EI, CSCD, Peking University Core Journals), 2008, Issue z1, pp. 26-30 (5 pages)

Funding: National Natural Science Foundation of China (60573024)

Keywords: k-means; cluster; centroid; ε-centroid

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]


References (9)

[1] M Inaba, N Katoh, H Imai. Application of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In: Proc of the 10th Annual Symp on Computational Geometry. New York: ACM Press, 1994. 332-339
[2] V Arya, et al. Local search heuristics for k-median and facility location problems. STOC 2001, Hersonissos, Crete, Greece, 2001
[3] Pan Rui, Zhu Daming, Ma Shaohan, Xiao Jinjie. Analysis of the approximation complexity of k-Median and of local search approximation algorithms [J]. Journal of Software, 2005, 16(3): 392-399 (in Chinese)
[4] T Kanungo, D M Mount, N Netanyahu, et al. A local search approximation algorithm for k-means clustering. Computational Geometry, 2004, 28(2-3): 89-112
[5] J Matousek. On approximate geometric k-clustering. Discrete and Computational Geometry, 2000, 24(1): 61-84
[6] W F de la Vega, M Karpinski, C Kenyon, et al. Approximation schemes for clustering problems. In: Proc of the 35th Annual ACM Symp on Theory of Computing. New York: ACM Press, 2003. 50-58
[7] S Har-Peled, S Mazumdar. Coresets for k-means and k-median clustering and their applications. In: Proc of the 36th Annual ACM Symp on Theory of Computing. New York: ACM Press, 2004. 291-300
[8] A Kumar, Y Sabharwal, S Sen. A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions. In: Proc of the 45th FOCS. Piscataway, NJ: IEEE Press, 2004. 454-462
[9] P M Vaidya. An O(n log n) algorithm for the all-nearest-neighbors problem. Discrete and Computational Geometry, 1989, 4(2): 101-115

