Abstract
Dissimilarity and similarity measures are an important factor in clustering algorithms and often affect the results of cluster analysis. Many clustering algorithms use Euclidean distance as the similarity measure. However, Euclidean distance does not reflect the global characteristics of attribute values and ignores differences in units among attributes, so it performs poorly when attributes differ markedly in unit or value range. To address this, the paper proposes a generalized weighted Minkowski distance in which the generalized weight of each attribute is determined from that attribute's unit and value-range information. The measure takes the characteristics of the whole data set into account and removes the discord among attributes, while the introduction of quantiles weakens, to some extent, the influence of noisy attribute values on the distance measure. The new distance measure is applied to the classic k-means algorithm and to a quantum genetic clustering algorithm; the experimental results show that clustering with the new distance measure and with the quantum genetic algorithm is more effective.
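The abstract does not state the weight formula, so the following is only a minimal sketch of the general idea, under stated assumptions: each attribute's weight is taken to be the inverse of its interquantile range (here the 25% to 75% range), so that attributes with different units or value ranges contribute comparably and a single extreme value barely shifts the weight. The function names `quantile`, `attribute_weights`, and `weighted_minkowski`, and the choice of quantile levels, are hypothetical and are not taken from the paper.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a sequence, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def attribute_weights(data, q_low=0.25, q_high=0.75):
    """ASSUMPTION: weight = 1 / interquantile range of each attribute.

    A constant attribute (zero spread) gets weight 0 and is ignored.
    """
    weights = []
    for column in zip(*data):
        spread = quantile(column, q_high) - quantile(column, q_low)
        weights.append(1.0 / spread if spread > 0 else 0.0)
    return weights


def weighted_minkowski(x, y, weights, p=2):
    """Minkowski distance of order p with per-attribute scaling weights."""
    total = sum((w * abs(a - b)) ** p for a, b, w in zip(x, y, weights))
    return total ** (1.0 / p)


# Two attributes on very different scales; the last row holds a noisy value.
data = [
    [1.0, 1000.0],
    [2.0, 2000.0],
    [3.0, 3000.0],
    [4.0, 4000.0],
    [5.0, 90000.0],
]
weights = attribute_weights(data)
d = weighted_minkowski(data[0], data[1], weights)
```

With these weights, both attributes contribute equally to the distance between the first two rows, even though the second attribute is three orders of magnitude larger and contains an outlier. A range-based weight (max minus min) would instead be dominated by the outlier, which is one plausible reading of why quantiles are introduced.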
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2013, Issue 5, pp. 224-228 (5 pages)
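Funding
National Science and Technology Major Project for Water Pollution Control and Treatment (2009ZX07318-003-01-02)
Special Fund for Public Welfare Industry Research of the Ministry of Water Resources (201001031)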