Abstract
A subspace clustering algorithm for high-dimensional data based on hybrid grid partitioning is proposed. The algorithm eliminates the influence of the magnitude of each attribute's value range on the computation, and it effectively removes redundant attributes, which improves clustering accuracy and reduces time complexity. Depending on the data distribution, it flexibly chooses between fixed grid partitioning and adaptive grid partitioning, exploiting the strengths of both methods to further reduce time complexity, improve the accuracy of the clustering results, and give the algorithm better scalability. Experiments on synthetic data show that the algorithm is practical and effective for large-scale, high-dimensional data whose attributes have wide value ranges.
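The abstract names two preprocessing ideas, removing the effect of attribute value ranges and filtering redundant attributes, and the keywords mention relative entropy. The sketch below is one possible reading of those ideas, not the paper's actual procedure: it assumes min-max normalization, a fixed equal-width grid per attribute, and a filter that drops attributes whose histogram is close to uniform as measured by relative entropy (KL divergence). The function names, `bins`, and `threshold` are all hypothetical.

```python
import math
import random


def min_max_normalize(column):
    """Rescale one attribute to [0, 1] so its raw value range no longer matters."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def relative_entropy_vs_uniform(column, bins=10):
    """KL divergence between a fixed-grid histogram of `column` and the uniform distribution.

    `column` is assumed to be already normalized to [0, 1].
    """
    counts = [0] * bins
    for v in column:
        idx = min(int(v * bins), bins - 1)  # the value 1.0 falls into the last cell
        counts[idx] += 1
    n = len(column)
    uniform = 1.0 / bins
    return sum((c / n) * math.log((c / n) / uniform) for c in counts if c > 0)


def select_attributes(rows, bins=10, threshold=0.1):
    """Return indices of attributes whose distribution departs from uniform.

    Assumption: a near-uniform attribute carries no cluster structure and is
    treated as redundant. The threshold is an arbitrary illustrative value.
    """
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    return [j for j, col in enumerate(columns)
            if relative_entropy_vs_uniform(col, bins) > threshold]


random.seed(0)
# Attribute 0: two tight clusters on a large scale; attribute 1: uniform noise.
rows = [(random.gauss(1000.0 if i % 2 else 9000.0, 50.0), random.random())
        for i in range(2000)]
kept = select_attributes(rows)
```

In this toy data set the clustered attribute scores far above the threshold while the uniform attribute scores close to zero, so only the former is retained for subspace search. Choosing between a fixed and an adaptive grid, as the abstract describes, would be a later step that this sketch does not cover.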
Source
《计算机技术与发展》
2010, No. 10, pp. 150-153 (4 pages)
Computer Technology and Development
Keywords
high-dimensional clustering
subspace clustering
relative entropy
grid partitioning