
Automatic Clustering Algorithm Based on Density Difference

Cited by: 16
Abstract: As an unsupervised learning technique, clustering is widely used in practice. On datasets that contain noise, however, some mainstream algorithms still remove the noise incompletely and produce inaccurate clustering results. This paper proposes an automatic clustering algorithm based on density difference (clustering based on density difference, CDD) that clusters noisy datasets automatically. The algorithm uses the difference in density between noise data and useful data to separate the noise from the useful data, and then constructs neighborhoods among the data points to partition the useful data into distinct classes. Experimental results demonstrate the effectiveness of the CDD algorithm.
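The abstract names two stages: separating noise from useful data by density, then grouping the useful data through neighborhoods. The following is a minimal sketch of that general idea, not the CDD algorithm as published; the abstract does not give its density estimator, threshold rule, or neighborhood construction. Everything here is an assumption made for illustration: the k-nearest-neighbor density estimate, the "largest gap in sorted densities" cut, the fixed linking radius, and all function and parameter names (`knn_density`, `split_noise_by_density_gap`, `label_by_neighborhood`, `k`, `radius`).

```python
import numpy as np

def knn_density(points, k):
    """Assumed density estimate: inverse distance to the k-th nearest neighbor."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0).
    kth = np.sort(dist, axis=1)[:, k]
    return 1.0 / (kth + 1e-12), dist

def split_noise_by_density_gap(density):
    """Assumed rule: cut at the largest jump between consecutive sorted densities."""
    sorted_d = np.sort(density)
    cut = np.argmax(np.diff(sorted_d))
    threshold = (sorted_d[cut] + sorted_d[cut + 1]) / 2.0
    return density > threshold  # True marks points kept as useful data

def label_by_neighborhood(dist, keep, radius):
    """Link kept points closer than `radius`; each connected group is one cluster."""
    labels = np.full(dist.shape[0], -1)  # -1 marks noise / unassigned
    current = 0
    for seed in range(dist.shape[0]):
        if not keep[seed] or labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            new = np.where(keep & (dist[i] <= radius) & (labels == -1))[0]
            labels[new] = current
            stack.extend(new.tolist())
        current += 1
    return labels

# Toy data: two tight 3x3 grids plus three isolated points acting as noise.
grid = np.array([[x, y] for x in (0.0, 0.1, 0.2) for y in (0.0, 0.1, 0.2)])
noise = np.array([[2.5, 2.5], [0.0, 5.0], [5.0, 0.0]])
points = np.vstack([grid, grid + 5.0, noise])

density, dist = knn_density(points, k=3)
keep = split_noise_by_density_gap(density)
labels = label_by_neighborhood(dist, keep, radius=0.15)
print(labels)
```

On this toy input the two grids come out as separate clusters and the three isolated points keep the label -1. The single largest-gap cut is a deliberately simple stand-in; it only works when noise and useful data differ sharply in density, which is the premise the abstract states.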
Authors: CHEN Zhao-Wei (陈朝威); CHANG Dong-Xia (常冬霞)
Affiliations: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the PKU Core Journals list; 2018, No. 4, pp. 935-944 (10 pages)
Funding: National Natural Science Foundation of China (61532005)
Keywords: clustering; data mining; outlier detection; difference; CDD