Abstract: As an unsupervised learning technique, clustering has been widely applied in practice. However, for data sets containing noise, some mainstream algorithms still suffer from incomplete noise removal and inaccurate clustering results. This paper proposes an automatic clustering algorithm based on density difference (clustering based on density difference, CDD for short), which achieves automatic classification of noisy data sets. Exploiting the difference in density between noise data and useful data, the proposed algorithm removes the noise and classifies the data; by further constructing neighborhoods among the data, it partitions the useful data into different classes. Experiments verify the effectiveness of the proposed algorithm.
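The abstract does not give the exact formulas of CDD, but the two stated steps (density-based noise removal, then neighborhood-based grouping) can be illustrated with a minimal sketch. The parameters `eps` and `density_quantile` below are assumptions introduced for illustration, not values from the paper.

```python
# Hypothetical sketch of density-difference noise filtering followed by
# neighborhood-based grouping; parameters are illustrative, not from the CDD paper.
import numpy as np
from scipy.spatial.distance import cdist


def density_difference_clustering(X, eps=0.5, density_quantile=0.2):
    """Cluster X after discarding low-density (noise-like) points.

    X : (n, d) array of points.
    eps : neighborhood radius, used both for the density estimate and for
          linking useful points into clusters.
    density_quantile : points whose neighbor count falls below this quantile
          of the density distribution are treated as noise.
    Returns labels of length n; noise points receive label -1.
    """
    n = len(X)
    dist = cdist(X, X)
    # Local density: number of neighbors within eps (excluding the point itself).
    density = (dist < eps).sum(axis=1) - 1
    # Density-difference idea: noise points have markedly lower density than
    # useful points, so threshold on the density distribution.
    threshold = np.quantile(density, density_quantile)
    is_noise = density <= threshold

    # Group the remaining (useful) points by eps-neighborhood connectivity.
    labels = np.full(n, -1)
    current = 0
    for i in range(n):
        if is_noise[i] or labels[i] != -1:
            continue
        stack = [i]
        labels[i] = current
        while stack:
            p = stack.pop()
            for q in np.where((dist[p] < eps) & ~is_noise)[0]:
                if labels[q] == -1:
                    labels[q] = current
                    stack.append(q)
        current += 1
    return labels
```

In this sketch the same radius drives both steps; the actual CDD algorithm may estimate density and build neighborhoods differently.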
Funding: Supported by the Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China (No. 706004)
Abstract: A genetic clustering algorithm was developed based on dynamic niching with data attraction. The algorithm uses the concept of Coulomb attraction to model the attraction between data points. The niches with data attraction are then dynamically identified in each generation to automatically evolve the optimal number of clusters as well as the cluster centers of the data set, without using cluster validity functions or a variance-covariance matrix. Therefore, this clustering scheme does not need the number of clusters to be pre-specified, as existing methods do. Several data sets with widely varying characteristics are used to demonstrate the superiority of this algorithm. Experimental results show that the clustering algorithm is effective and flexible and performs well.
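The full dynamic-niching genetic algorithm (chromosome encoding, selection, per-generation niche identification) is beyond the abstract, but the Coulomb-style attraction it builds on can be sketched as follows. All names and the simple argmax assignment are assumptions for illustration only.

```python
# Illustrative sketch of Coulomb-style attraction between data points and
# candidate cluster centres; not the paper's complete genetic algorithm.
import numpy as np


def coulomb_attraction(points, centres, charge=1.0, epsilon=1e-9):
    """Attraction of every point to every candidate centre.

    Modelled on Coulomb's law: attraction decays with squared distance,
    so nearby points exert a much stronger pull.
    Returns an (n_points, n_centres) matrix of attraction values.
    """
    diff = points[:, None, :] - centres[None, :, :]   # (n, k, d) pairwise offsets
    sq_dist = (diff ** 2).sum(axis=-1)                # (n, k) squared distances
    return charge / (sq_dist + epsilon)               # epsilon avoids division by zero


def assign_by_attraction(points, centres):
    """Assign each point to the candidate centre that attracts it most strongly."""
    return coulomb_attraction(points, centres).argmax(axis=1)
```

In the described scheme, such attraction values would guide how niches form around candidate centres during each generation, so the number of clusters emerges from the evolution rather than being fixed in advance.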