Abstract
This paper separately preprocesses the titles and body text of a large corpus of Internet news reports on sudden events and builds a feature vector library. It then identifies the topic of each text twice, using a matching method that combines decision table rules with the shortest vector distance, so as to better support the automatic recognition of sudden events on the Internet.
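The two-stage matching described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `DECISION_TABLE` keywords, the `FEATURE_LIBRARY` weights, whitespace tokenization, Euclidean distance as the distance measure, and the function names. The abstract does not say how the two results are reconciled, so the sketch simply reports both and whether they agree.

```python
import math
from collections import Counter

# Hypothetical decision table: title keyword -> topic label (illustrative only).
DECISION_TABLE = {
    "earthquake": "natural_disaster",
    "flood": "natural_disaster",
    "explosion": "accident",
}

# Hypothetical feature vector library: topic label -> term weights (illustrative only).
FEATURE_LIBRARY = {
    "natural_disaster": {"earthquake": 2.0, "flood": 2.0, "rescue": 1.0},
    "accident": {"explosion": 2.0, "fire": 2.0, "injured": 1.0},
}


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplified preprocessing step)."""
    return text.lower().split()


def match_rules(title):
    """Return the topic of the first decision-table keyword found in the title, or None."""
    for token in tokenize(title):
        if token in DECISION_TABLE:
            return DECISION_TABLE[token]
    return None


def distance(vec_a, vec_b):
    """Euclidean distance between two sparse term-weight vectors."""
    terms = set(vec_a) | set(vec_b)
    return math.sqrt(sum((vec_a.get(t, 0.0) - vec_b.get(t, 0.0)) ** 2 for t in terms))


def nearest_topic(body):
    """Return the library topic whose vector is closest to the body's term counts."""
    doc_vec = Counter(tokenize(body))
    return min(FEATURE_LIBRARY, key=lambda topic: distance(doc_vec, FEATURE_LIBRARY[topic]))


def identify(title, body):
    """Run both identifications and report whether they agree."""
    rule_topic = match_rules(title)
    vector_topic = nearest_topic(body)
    return {"rule": rule_topic, "vector": vector_topic, "agree": rule_topic == vector_topic}
```

Calling `identify(title, body)` on a news item returns the rule-based topic from the title, the distance-based topic from the body, and an agreement flag. A real system would replace the whitespace tokenizer with Chinese word segmentation and learn the feature weights from the training corpus.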
Source
《现代图书情报技术》
CSSCI
PKU Core Journal
2010, No. 10, pp. 65-69 (5 pages)
New Technology of Library and Information Service
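Funding
One of the research outputs of the Nanjing Agricultural University SRT Program project "Research on Internet Sudden Event Recognition Based on Combining Rules and Statistics" (project no. 0913A11)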
Keywords
Sudden event recognition
Rule-based recognition
Statistics-based recognition