Abstract
Hypergraph-based clustering is currently one of the most popular clustering methods. Its classical version originated in very-large-scale integration (VLSI) circuit research, and in recent years various improved versions have been proposed and widely applied in machine vision. In image clustering and motion segmentation, for example, these versions often perform well. This paper introduces hypergraph-based clustering to text clustering. First, because text data are highly sparse, we reduce their dimensionality with SVD (or PCA). Second, we cluster the low-dimensional projections of the texts using hypergraph normalized cut clustering based on large hyperedges. Finally, we evaluate the clustering results with the clustering accuracy index. In experiments on two text datasets, hypergraph normalized cut clustering achieves better clustering performance than the traditional k-means and hierarchical clustering methods.
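The abstract does not define the clustering accuracy index. A common definition, assumed here, is the fraction of documents whose cluster agrees with their true class under the best one-to-one assignment of clusters to classes. The sketch below illustrates that assumed definition only; the function name clustering_accuracy is hypothetical and is not taken from the paper, and the brute-force search over assignments is practical only for a small number of clusters.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping of clusters to classes.

    Hypothetical helper illustrating an assumed definition of the
    accuracy index; the paper's exact definition may differ.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty")

    # Distinct classes and clusters, in order of first appearance.
    classes = list(dict.fromkeys(true_labels))
    clusters = list(dict.fromkeys(cluster_ids))

    # If there are more clusters than classes, pad with None so that the
    # surplus clusters are mapped to "no class" and never count as hits.
    if len(clusters) > len(classes):
        classes = classes + [None] * (len(clusters) - len(classes))

    best_hits = 0
    # Try every injective assignment of clusters to (padded) classes.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


if __name__ == "__main__":
    # Toy example: six documents, two classes, one document misplaced.
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 0, 0, 0, 0]
    print(clustering_accuracy(truth, found))
```

For larger numbers of clusters, the same optimum can be found in polynomial time with the Hungarian algorithm rather than by enumerating assignments.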
Author
檀敬东
TAN Jing-dong (School of Mathematics, Hefei University of Technology, Hefei 230009, China)
Source
《大学数学》
2017, No. 6, pp. 33-36 (4 pages)
College Mathematics
Funding
National Natural Science Foundation of China (61503115)
Keywords
hypergraph
normalized cut
text clustering
random cluster models