Abstract
To meet the needs of internet public opinion analysis, this paper studies the suffix tree clustering (STC) algorithm, which discovers common phrases shared among text documents and groups the documents into clusters according to those shared phrases. Because the titles and abstracts of web documents express their main ideas, the STC algorithm is applied to titles and abstracts to cluster web documents automatically, and this in turn yields automatic clustering of internet public opinion information.
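The abstract describes STC only at a high level. The sketch below illustrates the classic suffix-tree-clustering idea (shared phrases become base clusters, and base clusters with heavily overlapping document sets are merged). It is not the authors' implementation, and several pieces are assumptions: an uncompressed word-level suffix trie stands in for a compressed generalized suffix tree, the length factor in `score` and the 0.5 overlap threshold are illustrative choices, `top_k` is a hypothetical parameter, and the input titles are assumed to be already split into words (Chinese titles would need word segmentation first).

```python
from collections import defaultdict
from itertools import combinations


def build_suffix_trie(docs):
    """Insert every word-level suffix of every document into a trie.

    Simplification (assumption): an uncompressed suffix trie replaces a
    compressed generalized suffix tree; acceptable for short titles.
    Each node records the ids of documents whose suffixes pass through it.
    """
    root = {"children": {}, "docs": set()}
    for doc_id, words in enumerate(docs):
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node["children"].setdefault(word, {"children": {}, "docs": set()})
                node["docs"].add(doc_id)
    return root


def base_clusters(root, min_docs=2):
    """Map each document set to the longest phrase shared by exactly those docs.

    Phrases with identical document sets would merge anyway (overlap 1.0),
    so only the longest one is kept as the label.
    """
    best = {}
    stack = [((), root)]
    while stack:
        phrase, node = stack.pop()
        for word, child in node["children"].items():
            if len(child["docs"]) < min_docs:
                continue  # descendants can only cover fewer documents
            new_phrase = phrase + (word,)
            key = frozenset(child["docs"])
            if len(new_phrase) > len(best.get(key, ())):
                best[key] = new_phrase
            stack.append((new_phrase, child))
    return best


def score(doc_set, phrase):
    """STC-style score |B| * f(|P|); this length factor is an assumption."""
    length_factor = 0.5 if len(phrase) == 1 else min(len(phrase), 6)
    return len(doc_set) * length_factor


def stc(docs, top_k=10, threshold=0.5):
    """Cluster tokenized documents by shared phrases (hypothetical helper)."""
    bases = sorted(base_clusters(build_suffix_trie(docs)).items(),
                   key=lambda item: score(*item), reverse=True)[:top_k]
    parent = list(range(len(bases)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link two base clusters when their overlap exceeds the threshold
    # relative to BOTH document sets.
    for i, j in combinations(range(len(bases)), 2):
        a, b = bases[i][0], bases[j][0]
        overlap = len(a & b)
        if overlap / len(a) > threshold and overlap / len(b) > threshold:
            parent[find(i)] = find(j)

    # Each connected component of base clusters becomes one final cluster.
    merged = defaultdict(lambda: {"docs": set(), "phrases": []})
    for i, (doc_set, phrase) in enumerate(bases):
        group = merged[find(i)]
        group["docs"] |= doc_set
        group["phrases"].append(" ".join(phrase))
    return sorted(merged.values(), key=lambda c: len(c["docs"]), reverse=True)


if __name__ == "__main__":
    titles = [
        "suffix tree clustering of web titles",
        "web titles clustering with suffix tree",
        "public opinion monitoring on the internet",
        "internet public opinion analysis",
    ]
    for cluster in stc([t.split() for t in titles]):
        print(sorted(cluster["docs"]), cluster["phrases"])
```

On these four illustrative titles the sketch groups documents 0 and 1 under the label "web titles" and documents 2 and 3 under "public opinion". Requiring the overlap to exceed the threshold for both document sets, rather than just one, keeps a large generic cluster from absorbing every small cluster it happens to contain.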
Source
Journal of Hebei University of Science and Technology (《河北科技大学学报》)
Indexed in: CAS
2012, No. 1, pp. 65-68 (4 pages)
Funding
Science and Technology Support Program of Hebei Province (10213557)
Keywords
internet public opinions
STC (suffix tree clustering) algorithm
text clustering