Abstract
Based on the characteristics of online public opinion information sources, this study addresses incremental information acquisition, information extraction and storage, and text preprocessing. It proposes a Web-Harvest-based strategy for collecting information from targeted sites and a strategy for collecting new words based on an input method platform. An extended lexicon of Internet terms was built, and the key modules for information preprocessing were implemented.
Source
《图书情报知识》
CSSCI
PKU Core Journal (北大核心)
2011, No. 6, pp. 50-54 (5 pages)
Documentation, Information & Knowledge
Funding
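One of the outcomes of the project "Intelligent Monitoring and Analysis System for Online Public Opinion" (2007A090302027), funded by the Special Fund for Industry-University-Research Cooperation of the Department of Education of Guangdong Province.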