Abstract
The volume of online text is growing rapidly, and text analysis remains an active research topic. The traditional vector space model (VSM) suffers from excessively high data dimensionality and cannot capture latent semantics, which limits the accuracy of the resulting clusters. To address this, a hybrid text clustering model combining VSM and LDA is proposed. Texts are preprocessed, filtered, and counted to obtain a set of feature-term weights. The VSM similarity and the LDA similarity are then computed separately and combined by linear addition into a hybrid similarity. The K-means algorithm is then applied to cluster the texts, yielding clustering results for the VSM model, the LDA model, and the hybrid model, which are compared through statistical analysis. The experimental results show that the hybrid model is effective.
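The linear combination of the two similarities described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the mixing weight `lam`, the TF-IDF weighting, the use of cosine similarity for both models, and the hand-written topic distributions in `theta` (standing in for the output of a trained LDA model) are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF weight dicts for tokenized documents (VSM side)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(p, q):
    """Cosine similarity between two dense vectors (e.g. topic distributions)."""
    dot = sum(x * y for x, y in zip(p, q))
    n_p = math.sqrt(sum(x * x for x in p))
    n_q = math.sqrt(sum(y * y for y in q))
    return dot / (n_p * n_q) if n_p and n_q else 0.0


def hybrid_similarity(sim_vsm, sim_lda, lam=0.5):
    """Linear combination of the two similarities; lam is an assumed weight."""
    return lam * sim_vsm + (1.0 - lam) * sim_lda


# Toy tokenized documents (hypothetical data).
docs = [
    ["text", "cluster", "model"],
    ["topic", "model", "lda"],
    ["text", "cluster", "kmeans"],
]
# Hypothetical document-topic distributions; in practice these would come
# from a trained LDA model.
theta = [[0.8, 0.2], [0.1, 0.9], [0.7, 0.3]]

vecs = tfidf_vectors(docs)
hybrid = [
    [
        hybrid_similarity(
            cosine_sparse(vecs[i], vecs[j]), cosine_dense(theta[i], theta[j])
        )
        for j in range(len(docs))
    ]
    for i in range(len(docs))
]
```

The resulting `hybrid` matrix is what a clustering step would then consume. How K-means is driven by this similarity is not stated in the abstract, so that step is left out of the sketch.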
Authors
刘晓蒙
熊海涛
LIU Xiao-meng, XIONG Hai-tao (Beijing Technology and Business University, Beijing 100000, China)
Source
《电脑知识与技术》
2018, No. 1, pp. 35-38 (4 pages)
Computer Knowledge and Technology
Funding
Beijing Natural Science Foundation (4172014)