Abstract
To meet data-preprocessing needs in complex information environments, this paper proposes a dual clustering method that can handle data sets with mixed attributes. The method first builds a dual near-neighbor undirected graph, using either a construction algorithm or its improved variant, and then clusters that graph with one of three algorithms: one based on merging disjoint sets, one based on breadth-first search, and one based on depth-first search. Simulation experiments on artificial data sets and UCI standard data sets verify that the three clustering algorithms produce identical results even though they use different search strategies. The experiments also show that, on data sets with an evident cluster structure and no near-neighbor noise, the method often achieves better clustering accuracy than the K-means and AP algorithms, which indicates that the dual clustering method is reasonably effective. Finally, to promote the method and uncover its practical application value, some directions for future research are outlined.
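The claim that the three clustering algorithms agree can be illustrated with a minimal sketch. The abstract does not define the dual near-neighbor graph, so the code below assumes a mutual k-nearest-neighbor graph (two points are linked only if each is among the other's k nearest neighbors) over purely numeric points with squared Euclidean distance; this is an illustrative stand-in, not the paper's construction, and it does not cover mixed attributes. All function names are hypothetical. Under that assumption, each cluster is a connected component, so disjoint-set merging, breadth-first search, and depth-first search must yield the same partition.

```python
from collections import deque


def mutual_knn_graph(points, k):
    """ASSUMED construction: link i and j only if each is in the other's k nearest neighbors."""
    n = len(points)

    def dist2(a, b):
        # Squared Euclidean distance; numeric attributes only (an assumption).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    knn = []
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: (dist2(points[i], points[j]), j))
        knn.append(set(ranked[:k]))

    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:  # keep the edge only when the relation is mutual
                adj[i].add(j)
                adj[j].add(i)
    return adj


def canonical(groups):
    # Sort members and clusters so partitions can be compared directly.
    return sorted(sorted(g) for g in groups)


def clusters_by_disjoint_sets(adj):
    """Merge the two endpoint sets of every edge (union-find)."""
    parent = list(range(len(adj)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, neighbors in enumerate(adj):
        for j in neighbors:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    groups = {}
    for i in range(len(adj)):
        groups.setdefault(find(i), set()).add(i)
    return canonical(groups.values())


def clusters_by_search(adj, breadth_first):
    """Collect connected components with a queue (BFS) or a stack (DFS-style)."""
    seen, components = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        frontier, component = deque([start]), set()
        while frontier:
            node = frontier.popleft() if breadth_first else frontier.pop()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    frontier.append(nb)
        components.append(component)
    return canonical(components)


# Two well-separated toy groups (made-up data, not from the paper).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
adj = mutual_knn_graph(points, k=2)
by_sets = clusters_by_disjoint_sets(adj)
by_bfs = clusters_by_search(adj, breadth_first=True)
by_dfs = clusters_by_search(adj, breadth_first=False)
print(by_sets == by_bfs == by_dfs)
```

The three routines differ only in the order in which they visit edges and vertices, which is why the resulting partition does not depend on the search strategy.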
Source
《计算机工程与科学》
CSCD
Peking University Core Journal (北大核心)
2013, No. 2, pp. 127-132 (6 pages)
Computer Engineering & Science
Funding
Supported by the Scientific Research Project Program of Chongqing Three Gorges University (11ZZ-058)