
An Improved Method to Clustering Algorithm of Natural Nearest Neighbor Density
Abstract: The natural nearest neighbor algorithm (TNDP) produces classes with very small intra-class differences. When the set to be classified contains many elements, this can cause over-segmentation: a subset of elements with similar characteristics is split into several small classes. To address this, after running the TNDP algorithm, if the difference in the within-class sum of squares is less than a given threshold, the two closest subclasses are merged into one larger class. This step is repeated until the distances between classes are sufficiently large. The procedure can ensure that multiple subclasses with similar feature elements are merged into one larger class, which improves the interpretability of the final classification result.
Author: LI Junhai (李俊海) (College of Science, Henan University of Technology, Zhengzhou 450007, China)
Source: Journal of Xinxiang University (《新乡学院学报》), 2020, No. 12, pp. 38-42 (5 pages)
Funding: Key Scientific Research Project of Higher Education Institutions of Henan Province (20B416001).
Keywords: natural nearest neighbor; clustering algorithm based on density; similarity between clusters
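
The merging step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how the distance between subclasses is measured or exactly how the sum-of-squares difference is computed, so the sketch assumes centroid distance for "closest" and takes the difference to be the increase in within-class sum of squares caused by a merge. The function names (`centroid`, `within_ss`, `merge_subclasses`) and the sample data are hypothetical, and the initial subclasses stand in for the output of TNDP, which is not reproduced here.

```python
from itertools import combinations
from math import dist  # available in Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def within_ss(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def merge_subclasses(clusters, threshold):
    """Merge the two closest clusters repeatedly while the increase in
    within-class sum of squares caused by the merge stays below `threshold`.

    Assumption: "closest" means smallest centroid distance.
    """
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are nearest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(centroid(clusters[ij[0]]),
                                centroid(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        increase = (within_ss(merged)
                    - within_ss(clusters[i])
                    - within_ss(clusters[j]))
        if increase >= threshold:
            break  # the closest pair is already too different to merge
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical over-segmented input: two nearby subclasses and one distant one.
initial = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0)],
]
result = merge_subclasses(initial, threshold=5.0)
```

With this input, the two nearby subclasses are merged and the distant one is left alone, because merging it would raise the within-class sum of squares far above the threshold. The threshold value is chosen only for illustration.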
