Abstract
Energy distance, which originates from Newton's gravitational potential energy, is a distance function between statistical observations that has been used increasingly widely in recent years. This paper applies it to clustering. The energy distance function is defined as the difference between the between-cluster and within-cluster distances of the objects, each raised to an exponent, and it generalizes Ward's traditional minimum sum of squared deviations method (exponent 2). The between-within distance statistic determines the ultrametric and space-dilating properties of the clustering algorithm, and for exponents strictly less than 2 the method is also statistically consistent. The generalized Ward algorithm can distinguish clusters with nearly equal centers, which is its main advantage over traditional clustering algorithms. Experiments are used to verify these conclusions.
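The abstract does not state the formula itself, so the following is only a minimal sketch. It assumes the commonly used between-within form e_α(A, B) = n1·n2/(n1+n2) · (2·mean|a−b|^α − mean|a−a'|^α − mean|b−b'|^α) with Euclidean norms. The function name `energy_distance` and the two toy clusters are illustrative assumptions, not taken from the paper. The sketch shows why an exponent below 2 can separate clusters that share a center while exponent 2 (the Ward case) cannot.

```python
import numpy as np

def energy_distance(a, b, alpha=1.0):
    """Between-within distance of two clusters (assumed standard form)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)

    def mean_pairwise(x, y):
        # Mean of ||x_i - y_j||^alpha over all pairs (i, j).
        diff = x[:, None, :] - y[None, :, :]
        return (np.linalg.norm(diff, axis=2) ** alpha).mean()

    between = mean_pairwise(a, b)
    within = mean_pairwise(a, a) + mean_pairwise(b, b)
    return n1 * n2 / (n1 + n2) * (2.0 * between - within)

# Hypothetical toy data: two clusters with the same center (the origin)
# but different spread.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
B = 3.0 * A

# With alpha = 2 the statistic depends only on the cluster means,
# so it is close to 0 here.
print(energy_distance(A, B, alpha=2.0))
# With alpha = 1 the difference in spread makes it clearly positive.
print(energy_distance(A, B, alpha=1.0))
```

Under this assumed form, the exponent-2 case reduces to 2·n1·n2/(n1+n2)·‖mean(A) − mean(B)‖², which is proportional to Ward's criterion. That is why it cannot tell apart clusters whose centers coincide.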
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journals (北大核心)
2017, No. 22, pp. 21-25 (5 pages)
Keywords
energy distance
between-within cluster distances
clustering method