Abstract
Drawing on the structural and content characteristics of Web news, and on an analysis of the shortcomings of the traditional vector space model (VSM), this paper proposes an improved vector space model in which feature words are grouped by meaning. The model divides the feature words of a news report into four relatively independent semantic groups: time, place, person, and event. Each group forms its own vector space, and feature-term weights and similarity are computed separately within each of the four spaces. Theoretical analysis and experimental results show that the improved model is better suited to Web news retrieval and improves precision, recall, and query speed.
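The grouping idea can be illustrated with a short sketch. It rests on assumptions the abstract does not state. Feature words are taken as already extracted and assigned to the four groups; the extraction step is not shown. Each group uses smoothed TF-IDF weights and cosine similarity. The four per-group scores are merged with hypothetical weights (`GROUP_WEIGHTS`), because the abstract does not say whether or how they are combined. All names here (`grouped_scores`, `idf_table`, `GROUP_WEIGHTS`) are illustrative and do not come from the authors' implementation.

```python
import math
from collections import Counter

# The four semantic groups named in the abstract.
GROUPS = ("time", "place", "person", "event")

# Hypothetical weights for merging the four per-group similarities;
# the abstract does not specify how (or whether) the scores are combined.
GROUP_WEIGHTS = {"time": 0.2, "place": 0.2, "person": 0.2, "event": 0.4}


def idf_table(docs, group):
    """Smoothed IDF, log((1+N)/(1+df)) + 1, for terms of one group."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.get(group, [])))
    default = math.log(1 + n) + 1.0  # IDF of a term no document contains
    table = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return table, default


def weight_vector(terms, idf, default):
    """TF-IDF vector (term -> weight) for the terms of one group."""
    tf = Counter(terms)
    return {t: c * idf.get(t, default) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def grouped_scores(query, docs):
    """Score each doc group by group: a place term only matches place terms."""
    idfs = {g: idf_table(docs, g) for g in GROUPS}
    results = []
    for doc in docs:
        per_group = {}
        for g in GROUPS:
            idf, default = idfs[g]
            q = weight_vector(query.get(g, []), idf, default)
            d = weight_vector(doc.get(g, []), idf, default)
            per_group[g] = cosine(q, d)
        total = sum(GROUP_WEIGHTS[g] * s for g, s in per_group.items())
        results.append((total, per_group))
    return results


if __name__ == "__main__":
    # Toy corpus: each news report is pre-split into the four groups.
    docs = [
        {"time": ["2011"], "place": ["chengdu"], "person": ["li"],
         "event": ["earthquake", "rescue"]},
        {"time": ["2010"], "place": ["beijing"], "person": ["wang"],
         "event": ["election"]},
    ]
    query = {"place": ["chengdu"], "event": ["rescue"]}
    for total, per_group in grouped_scores(query, docs):
        print(round(total, 3), per_group)
```

Keeping one vector space per group means a term can only contribute similarity against terms of the same semantic type, which is the intuition behind the grouping. The weighting scheme and the way group scores are merged are placeholders to be checked against the full paper.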
Source
Electronic Science and Technology (《电子科技》)
2011, No. 4, pp. 24-26 (3 pages)
Keywords
vector space model
semantic group
information retrieval
precision
recall