摘要
k-means聚类算法是解决聚类问题的一个常用方法.近年来,国外许多学者对该问题的近似常数算法和(1+ε)近似算法进行了研究.利用Kumar等人随机取样技术对于基于最小聚类划分k-means提出一个(1+ε)随机近似算法.该算法利用随机取样技术从集合中求出部分取样点,再对随机取样点进行组合找出每个聚类的部分点,将该部分点的质心点作为相应子聚类簇的质心点.通过多次运行该算法可以以较高概率求出k-means聚类的1+ε近似值.
k-means clustering is one of the most popular approaches used in clustering problem. In recent years, many researches have been conducted to find algorithms with bounded quality, either (1+ε) approximation or constant approximation. In this paper, the (1+ε) randomized approximate algorithm is presented for the k-means clustering based on the minimum size of the smallest optimal sub cluster. The main idea of this algorithm is to use the sampling technique proposed by Kumar et al. First, some points are sampled from the input point set. Then some sampled points are combined to calculate their centroid and the new centroid is used as one of the sub cluster center point. If the algorithm is run several times, the result of the (1+ε) approximation can be obtained with high probability proved.
出处
《计算机研究与发展》
EI
CSCD
北大核心
2008年第z1期26-30,共5页
Journal of Computer Research and Development
基金
国家自然科学基金项目(60573024)