Application of a modified vector space model in information retrieval (cited by 6)

Application of a modified vector space model in textual information retrieval systems
Abstract: To improve the retrieval performance of textual information retrieval systems, a modified vector space model (MVSM) is proposed that addresses the intrinsic limitations of the vector space model (VSM) commonly used in information retrieval systems. The model redefines the content of query index terms: combined terms, each formed from a modifier and a head word, are introduced into the representation of user queries and into the traditional VSM, and the weights of these combined terms as feature index terms are recalculated. On this basis, a query expansion strategy based on a synonymy thesaurus is applied to the query index terms. Experimental results show that using combined terms as query index terms confines retrieval to a relatively narrow search space and increases precision, while synonym expansion of the query retrieves more semantically related documents, including documents that do not contain the exact terms of the query, and increases recall. Applying the MVSM to an information retrieval system therefore improves retrieval performance in both precision and recall.
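The abstract does not give the paper's weighting formula or its rules for identifying modifier-head phrases, so the following is only a minimal Python sketch of the general idea under stated assumptions. Documents are assumed to be already segmented into tokens; combined terms are adjacent (modifier, head) pairs looked up in a hypothetical `PHRASES` set; weights use a standard TF-IDF scheme with a smoothed IDF rather than the authors' recalculated weights; and the synonymy thesaurus is a hypothetical toy dictionary whose synonyms enter the query at an assumed `EXPANSION_WEIGHT` of 0.5. Documents are ranked by cosine similarity, as in the classic VSM.

```python
import math
from collections import Counter

# Hypothetical modifier-head pairs that form combined terms (assumption:
# the paper's own phrase-extraction rules are not given in the abstract).
PHRASES = {("information", "retrieval"), ("vector", "model")}

# Hypothetical toy synonymy thesaurus mapping a term to its synonyms.
THESAURUS = {"retrieval": ["search"], "search": ["retrieval"]}

# Assumed down-weighting factor for synonyms added by query expansion.
EXPANSION_WEIGHT = 0.5


def index_terms(tokens):
    """Return single-word terms plus a combined term for each known
    adjacent (modifier, head) pair."""
    terms = list(tokens)
    for modifier, head in zip(tokens, tokens[1:]):
        if (modifier, head) in PHRASES:
            terms.append(f"{modifier} {head}")
    return terms


def idf_table(docs_terms):
    """Smoothed IDF, ln(N / df) + 1, so terms in every document keep a weight."""
    n = len(docs_terms)
    df = Counter(term for terms in docs_terms for term in set(terms))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(terms, idf):
    """Sparse TF-IDF vector; terms unseen in the collection get weight 0."""
    return {term: tf * idf.get(term, 0.0) for term, tf in Counter(terms).items()}


def expand_query(query_vec):
    """Add thesaurus synonyms of each query term at a reduced weight."""
    expanded = dict(query_vec)
    for term, weight in query_vec.items():
        for synonym in THESAURUS.get(term, []):
            expanded[synonym] = max(expanded.get(synonym, 0.0),
                                    weight * EXPANSION_WEIGHT)
    return expanded


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_tokens, docs_tokens, expand=True):
    """Rank documents (by index) by cosine similarity to the query."""
    docs_terms = [index_terms(doc) for doc in docs_tokens]
    idf = idf_table(docs_terms)
    doc_vecs = [tfidf(terms, idf) for terms in docs_terms]
    query_vec = tfidf(index_terms(query_tokens), idf)
    if expand:
        query_vec = expand_query(query_vec)
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "information retrieval with a vector model".split(),
        "retrieval of information from old archives".split(),
        "web search engines".split(),
    ]
    for doc_id, score in rank("information retrieval".split(), docs):
        print(doc_id, round(score, 3))
```

With these three toy documents, the combined term `information retrieval` ranks the first document above the second, which contains both words but not as a phrase, and the synonym `search` gives the third document a nonzero score that it does not receive with `expand=False`. This mirrors, on a toy scale, the precision and recall effects the abstract describes.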
Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2008, No. 4, pp. 666-669 (4 pages). Indexed in EI, CAS, CSCD and the Peking University core journal list.
Funding: Supported by 佳思腾株式会社, Japan.
Keywords: textual information retrieval; vector space model; synonymy thesaurus; query expansion

