Abstract
This paper studies cluster analysis methods, compares several clustering algorithms, and discusses their respective strengths and weaknesses. To address the fact that the results of the original k-means algorithm depend heavily on the randomly selected initial cluster centers, an improved algorithm is proposed. The data set is sampled multiple times and the better initial cluster centers are selected, which greatly reduces the improved algorithm's sensitivity to the choice of initial centers. In addition, once the initial cluster centers have been selected, the initial values are standardized, which further improves the clustering quality. The new algorithm, Hk-means, is evaluated on data from the UCI data sets; the results show that Hk-means achieves significantly better clustering than the original k-means algorithm, and the approach may serve as a reference for related fields.
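The abstract does not say how the samples are drawn, how the "better" centers are chosen, or exactly when standardization is applied, so the following is only a minimal sketch of one possible reading, not the authors' Hk-means. It assumes z-score standardization applied up front, random subsamples of a fixed fraction, and selection of the candidate centers with the lowest sum of squared errors (SSE) on the full data. The function and parameter names (`sampled_init_kmeans`, `n_samples`, `sample_frac`) are hypothetical.

```python
import random
from statistics import mean, pstdev

def standardize(points):
    """Z-score each feature column; constant columns map to 0."""
    cols = list(zip(*points))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [tuple((x - m) / s for x, m, s in zip(p, mus, sds)) for p in points]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(
                tuple(mean(c) for c in zip(*members)) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def sampled_init_kmeans(points, k, n_samples=5, sample_frac=0.5, seed=0):
    """Assumed reading: pick initial centers from several subsamples."""
    rng = random.Random(seed)
    data = standardize(points)
    size = max(k, int(len(data) * sample_frac))
    best = None
    for _ in range(n_samples):
        sample = rng.sample(data, size)
        # Cluster the subsample from a random start to get candidate centers.
        candidate, _ = kmeans(sample, rng.sample(sample, k))
        score = sse(data, candidate)  # judge candidates on the full data
        if best is None or score < best[0]:
            best = (score, candidate)
    # Final run on the full data, started from the best candidate centers.
    return kmeans(data, best[1])
```

Calling `sampled_init_kmeans(points, k)` on a list of numeric tuples returns the final centers (in standardized coordinates) and one cluster label per point. Scoring candidates by SSE on the full data is a design choice of this sketch; the paper may use a different criterion.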
Source
Computer Technology and Development (《计算机技术与发展》)
2011, No. 7, pp. 54-57, 62 (5 pages)
Funding
Harbin Reserve Leader Fund Project (哈尔滨市后备带头人基金项目), No. 2004AFXXJ039