Abstract
Public opinion tracking follows a designated hot topic in real time as it develops in the media information stream, and it has become an active research area in natural language processing in recent years. The core technique for the task is text classification. Information gain and mutual information are used to compute feature weights, and the features with the highest weights are selected as the effective features for representing documents in the vector space model. The Rocchio, K-Nearest Neighbor (KNN), and Bayes methods are each applied to track public opinion on events of a given topic. The best performance on the test set is an F-Measure of 86.2%. Public opinion tracking has broad application prospects in areas such as information security, and it gives users an effective basis for promptly judging how hot events on the network are developing.
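The pipeline summarized in the abstract (weight features, keep the strongest ones, classify documents as on-topic or off-topic, score with F-Measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, all function names, the use of binary term presence for information gain, and the reading of Rocchio as a nearest-centroid classifier under cosine similarity are assumptions made here for illustration only. Mutual information, KNN, and Bayes are omitted for brevity.

```python
import math
from collections import Counter


def entropy(pos, neg):
    """Entropy in bits of a two-class split given the class counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """IG(term) = H(class) - H(class | term present or absent)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    h_class = entropy(sum(labels), len(labels) - sum(labels))
    h_cond = 0.0
    for part in (present, absent):
        if part:
            weight = len(part) / len(docs)
            h_cond += weight * entropy(sum(part), len(part) - sum(part))
    return h_class - h_cond


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain (ties by name)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


def vectorize(doc, features):
    """Represent a tokenized document as raw term counts over the features."""
    counts = Counter(doc)
    return [counts[t] for t in features]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rocchio_train(docs, labels, features):
    """Return (on-topic centroid, off-topic centroid)."""
    pos = [vectorize(d, features) for d, y in zip(docs, labels) if y == 1]
    neg = [vectorize(d, features) for d, y in zip(docs, labels) if y == 0]
    return centroid(pos), centroid(neg)


def rocchio_predict(doc, features, centroids):
    """Label a document 1 (on-topic) if it is closer to the on-topic centroid."""
    v = vectorize(doc, features)
    pos_c, neg_c = centroids
    return 1 if cosine(v, pos_c) > cosine(v, neg_c) else 0


def f_measure(gold, pred):
    """Balanced F-Measure (harmonic mean of precision and recall)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus: label 1 = on-topic (an earthquake event), 0 = off-topic.
train_docs = [
    ["earthquake", "rescue", "relief"],
    ["earthquake", "aftershock", "rescue"],
    ["football", "match", "goal"],
    ["stock", "market", "goal"],
]
train_labels = [1, 1, 0, 0]

features = select_features(train_docs, train_labels, k=3)
centroids = rocchio_train(train_docs, train_labels, features)
test_docs = [["earthquake", "rescue", "team"], ["goal", "match"], ["weather"]]
predictions = [rocchio_predict(d, features, centroids) for d in test_docs]
```

On real data the documents would first need word segmentation, and the number of retained features `k` would be tuned on held-out data; neither detail is specified in the abstract.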
Source
《图书情报工作》
CSSCI
Peking University Core Journal (北大核心)
2012, Issue 18, pp. 50-53, 109 (5 pages in total)
Library and Information Service
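Funding
One of the research results of the National Natural Science Foundation of China Young Scientists Fund project "Research on Information Extraction Techniques in Question-Answering Information Retrieval" (Grant No. 60803086)
and the Beijing Natural Science Foundation project "Research on Semantic Entailment Reasoning Techniques and Their Application in Question-Answering Information Retrieval" (Grant No. 4123091)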
Keywords
public opinion tracking
text classification
natural language processing