摘要
类内误差平方和最小化的聚类准则求解是NP难问题,K-Means采用的迭代重定位方法本质上是一种局部搜索的爬山算法,因此聚类结果对初始代表点的选择非常敏感,只能保证局部最优。为此,引入元启发式策略,通过建立评估函数对K-Means初始代表点和目标函数之间的依赖关系进行近似,然后利用近似评估函数指导新的初始代表点的选择,构成一种迭代自学习框架下的K-Means算法。实验表明算法可以很好地克服K-Means对初始代表点的依赖性,获得较高质量的聚类结果。
The clustering problems based on minimizing the sum of intra-cluster squared-error are known to be NP- hard. The iterative re-locating method using by K-Means is essentially a kind of local hill-climbing algorithm, which will find a locally minimal solution eventually and cause much sensitivity to initial representatives. The meta-heuristic strategy was introduced to minimize the squared-error criterion globally. Firstly, an evaluation function was built to approxi- mate the dependency between a series of initial representatives of K-Means and the local minimal of objective criterion, and then the selection of initial representatives was done under the supervision of the evaluation function for the next K- Means. This iterative and self-learning process is called Meta-KMeans algorithm. The experimental demonstrations show that Meta-KMeans can overcome the sensitivity to initial representatives of K-Means to a great extent.
出处
《计算机科学》
CSCD
北大核心
2009年第7期175-178,共4页
Computer Science
基金
中国矿业大学科技基金(OD080313)
国家863高技术研究发展计划(2006AA12Z217)资助