Abstract
To improve on the clustering performance of a single clustering algorithm, this paper proposes an XML document clustering ensemble method based on a quantum genetic algorithm. The method first uses the K-nearest neighbor (KNN) classification algorithm to partition the XML documents into k diverse clustering members. Next, it derives a co-occurrence similarity matrix from the relationships among the clustering members, and reduces the dimensionality of that matrix by computing the eigenvectors corresponding to its eigenvalues with a multi-segment QR algorithm that shrinks downward, upward, and in both directions. Finally, in the mapped space, a quantum genetic algorithm performs the clustering ensemble, assigning each sample to its optimal cluster. This reduces the influence of data differences on the clustering result and improves clustering quality. Experimental results on real-world data sets show that the proposed clustering ensemble algorithm performs better than other clustering ensemble algorithms.
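The co-occurrence similarity matrix mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common co-association definition, in which entry (i, j) is the fraction of clustering members that place documents i and j in the same cluster; the abstract does not give the paper's exact formula, so this definition, the function name `co_association`, and the sample labels are all illustrative assumptions rather than the authors' implementation.

```python
from itertools import combinations


def co_association(partitions):
    """Return the n x n matrix whose (i, j) entry is the fraction of
    base partitions that put samples i and j in the same cluster.

    Assumed definition (standard co-association); the paper's own
    formula is not stated in the abstract.
    """
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("every partition must label the same n samples")
    m = len(partitions)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # a sample always co-occurs with itself
    for i, j in combinations(range(n), 2):
        agree = sum(1 for p in partitions if p[i] == p[j])
        sim[i][j] = sim[j][i] = agree / m  # matrix is symmetric
    return sim


# Three hypothetical clustering members over four documents.
# Label values are arbitrary; only co-membership matters.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
similarity = co_association(partitions)
```

In the pipeline the abstract describes, a matrix of this kind would then be reduced in dimension through its eigenvectors before the quantum genetic algorithm assigns each sample to a cluster; those later stages are not sketched here.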
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2012, Issue 6, pp. 2200-2204 (5 pages)
Application Research of Computers
Funding
Doctoral Program Foundation of the Ministry of Education of China (200805321029)
Natural Science Foundation of Hunan Province (07JJ6139)
Keywords
XML document
K-nearest neighbor (KNN) classification
quantum genetic algorithm
clustering ensemble
clustering quality