An improvement of text clustering method based on multi-view (cited by: 3)
Abstract: In recent years, with the development of natural language processing, clustering has played an increasingly important role in text processing. Research on multi-view text clustering in China is still at an early stage. Conventional methods cluster text from a single view, so the result reflects only one specific aspect of the documents, but text clustering research is increasingly shifting from single-view to multi-view approaches. This paper proposes an improved multi-view semi-supervised text clustering method based on spectral clustering, which uses the LDA topic model and the TF-WIDF feature extraction algorithm as its feature vector group. The method builds on the semi-supervised co-training (Co-training) algorithm: by improving how texts are labeled within co-training, it realizes a multi-view co-training algorithm that works in an unsupervised manner. Experimental results show that, compared with traditional single-view text clustering algorithms, the improved algorithm largely avoids the randomness and limitations of single-view algorithms and improves the overall clustering accuracy.
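The abstract does not give the TF-WIDF formula, the LDA settings, or the modified labeling rule, so the following is only a minimal sketch of how the pieces could fit together, under loudly stated assumptions. Plain TF-IDF stands in for TF-WIDF; a hand-written affinity matrix stands in for the LDA topic view; spectral clustering is reduced to a two-way split on the sign of the Fiedler vector of the graph Laplacian L = D - W, found by power iteration; and the co-training step, in which documents that both views place in the same cluster become pseudo-labels for the next round, is a hypothetical rule, not the authors' method. All function names (`tfidf_vectors`, `affinity_matrix`, `spectral_bipartition`, `agreed_pseudo_labels`) are illustrative.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts); plain TF-IDF is an assumed stand-in for TF-WIDF."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def affinity_matrix(vectors):
    """Similarity graph W for spectral clustering (zero diagonal)."""
    n = len(vectors)
    return [[0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def spectral_bipartition(W, iters=500, seed=0):
    """Two-way spectral clustering: split on the sign of the Fiedler vector of L = D - W.

    The Fiedler vector is approximated by power iteration on M = cI - L, projecting
    out the constant vector (L's eigenvalue-0 eigenvector) at every step.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg) + 1.0  # Laplacian eigenvalues are <= 2 * max degree, so M is positive definite
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # deflate the trivial constant eigenvector
        # M x = c x - (D x - W x)
        y = [c * x[i] - deg[i] * x[i] + sum(W[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y)) or 1.0
        x = [v / norm for v in y]
    return [0 if v >= 0 else 1 for v in x]


def agreed_pseudo_labels(labels_a, labels_b):
    """Hypothetical co-training step: keep documents that both views cluster together.

    Cluster ids are arbitrary per view, so view b is flipped when that agrees better with view a.
    """
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    if same < len(labels_a) - same:
        labels_b = [1 - b for b in labels_b]
    return {i: a for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a == b}


docs = ["cat kitten purr", "kitten cat meow", "stock market trade", "market trade price"]
text_view = spectral_bipartition(affinity_matrix(tfidf_vectors(docs)))
# Hypothetical stand-in for similarities between LDA topic distributions.
topic_view = spectral_bipartition([[0.0, 0.9, 0.1, 0.1], [0.9, 0.0, 0.1, 0.1],
                                   [0.1, 0.1, 0.0, 0.8], [0.1, 0.1, 0.8, 0.0]])
seeds = agreed_pseudo_labels(text_view, topic_view)
```

In this toy run both views separate documents {0, 1} from {2, 3}, so every document becomes a pseudo-labeled seed; on real data, only the documents where the views agree would be passed on, which is one way to let two views correct each other without external labels.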
Authors: WANG Weihong, LI Fan, JIN Lingjian (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Source: Journal of Zhejiang University of Technology, indexed by CAS and listed in the Peking University Core Journals; 2021, No. 1, pp. 1-8 (8 pages)
Funding: Zhejiang Provincial Natural Science Foundation of China (LZ14F020001).
Keywords: text clustering; LDA; TF-WIDF; Co-training; spectral clustering

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部