Abstract
To improve the retrieval performance of textual information retrieval systems, a modified vector space model (MVSM) is proposed that addresses the inherent limitations of the vector space model (VSM) commonly used in such systems. The model redefines the content of query index terms: combined phrases, each formed from a modifier and a head word, are introduced into both the query and the information representation of the traditional VSM, and the weights of these combined phrases, used as feature index terms, are recalculated. On this basis, a query expansion strategy based on a synonym thesaurus is applied to the query index terms. Experimental results show that using combined phrases as query index terms confines retrieval to a relatively narrow search space and increases precision. Expanding the query with synonyms extends retrieval to semantically related documents that do not necessarily contain the same terms as the query, and increases recall. Applying the MVSM in an information retrieval system can therefore improve retrieval performance in both precision and recall.
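The two ideas in the abstract, combined phrases as index terms and thesaurus-based query expansion, can be illustrated with a minimal sketch. This is not the paper's implementation: the phrase extraction (every adjacent word pair is treated as a modifier plus head word), the smoothed TF-IDF weighting, the `SYNONYMS` entries, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical synonym thesaurus (illustrative entries only).
SYNONYMS = {"retrieval": ["search"], "text": ["document"]}


def index_terms(tokens):
    """Return single words plus adjacent-word pairs as combined terms.

    Assumption: each adjacent pair stands in for a modifier + head-word
    phrase; the paper's actual phrase extraction is not reproduced here.
    """
    pairs = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + pairs


def tfidf_vectors(docs_terms):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF (assumed)."""
    n = len(docs_terms)
    df = Counter(t for terms in docs_terms for t in set(terms))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(terms).items()}
            for terms in docs_terms]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query_tokens, docs, thesaurus=None):
    """Return one cosine score per document, in document order."""
    docs_terms = [index_terms(d.lower().split()) for d in docs]
    vecs, idf = tfidf_vectors(docs_terms)
    q_terms = index_terms(list(query_tokens))
    if thesaurus:
        # Synonyms are added as single terms after phrase construction,
        # so they do not create spurious combined terms.
        q_terms += [s for t in query_tokens for s in thesaurus.get(t, [])]
    q_vec = {t: tf * idf[t] for t, tf in Counter(q_terms).items() if t in idf}
    return [cosine(q_vec, v) for v in vecs]
```

Calling `search(["text", "retrieval"], docs)` scores only documents that share words or the combined term `text_retrieval` with the query, while passing `SYNONYMS` as the third argument also gives a non-zero score to documents that contain only the synonyms. The trade-off mirrors the abstract: combined terms narrow the match, expansion widens it.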
Source
Journal of Harbin Institute of Technology
EI
CAS
CSCD
PKU Core
2008, No. 4, pp. 666-669 (4 pages)
Funding
Supported by JustSystems Corporation, Japan
Keywords
textual information retrieval
vector space model
synonym thesaurus
query expansion