Abstract
This paper applies the classical K-Means clustering algorithm to group documents based on their abstracts. First, the abstract text is segmented into words and stop words are removed. The preprocessed text is then vectorized using term frequency-inverse document frequency (TF-IDF) weights, and the resulting vectors are clustered with K-Means. The experimental results show that the method organizes the literature effectively and helps readers locate documents of the same type more efficiently, giving it some practical value and application prospects.
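The pipeline described in the abstract (segmentation, stop-word removal, TF-IDF weighting, K-Means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: whitespace splitting stands in for a real Chinese word segmenter, `STOP_WORDS` and the toy `docs` are hypothetical, the IDF formula `log(N/df) + 1` is one common variant (the abstract does not give the exact weighting), and a deterministic farthest-first initialisation replaces whatever initialisation the paper used.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real run would load a full list.
STOP_WORDS = {"the", "of", "a", "and", "for", "in", "on", "with"}

def tokenize(text):
    """Stand-in for word segmentation: lowercase, split on whitespace,
    then drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf_vectors(docs_tokens):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document."""
    vocab = sorted({w for toks in docs_tokens for w in toks})
    n_docs = len(docs_tokens)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in docs_tokens for w in set(toks))
    # Assumed IDF variant: log(N / df) + 1.
    idf = {w: math.log(n_docs / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in docs_tokens:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vec = [counts[w] / total * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, n_iter=100):
    """Lloyd's K-Means with farthest-first initialisation (an assumption,
    chosen here only to keep the sketch deterministic)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [-1] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy "abstracts" on two unrelated topics (hypothetical data).
docs = [
    "the library catalog of book lending",
    "book lending and library readers",
    "neural network training with gradient",
    "gradient descent for neural network",
]
vocab, vectors = tfidf_vectors([tokenize(d) for d in docs])
labels = kmeans(vectors, k=2)
print(labels)
```

On real Chinese abstracts, `tokenize` would need to be replaced by a proper segmenter, and the number of clusters `k` would have to be chosen for the corpus; neither detail is specified in the abstract.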
Author
宋宏标
Song Hongbiao (Information College, Guizhou University of Finance and Economics, Guian, Guizhou 550025)
Source
《贵图学苑》
2021, No. 2, pp. 61-63 (3 pages)
Guizhou Library Publication