Abstract
To improve the timeliness and accuracy of Internet public opinion monitoring, this paper improves Nutch-based information collection by adding URL analysis, duplicate detection, and a page age limit, which raises the efficiency of public opinion collection. A domain ontology for public opinion monitoring is constructed from the Integrated E-Government Thesaurus (Category Table) and extended with knowledge of the local jurisdiction, and a public opinion monitoring algorithm based on semantic expansion is proposed. Experimental results show that both the precision and the recall of public opinion monitoring improve markedly.
Source
《西南科技大学学报》
CAS
2012, Issue 3, pp. 92-96, 101 (6 pages in total)
Journal of Southwest University of Science and Technology
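Funding
Project funded by the Sichuan Provincial Department of Education (No. 12ZB326)
Key Project of Southwest University of Science and Technology (No. 10zxwk03)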
Keywords
Internet public opinion
Domain ontology
Information collection
Thesaurus