
信息检索中一种基于词语—主题词相关度的语言模型 (cited by: 3)

A Term-Subject-Association-Based Language Model for Information Retrieval
Abstract: This paper proposes TSA-LM (Term-Subject Association Based Language Model), a language model for document retrieval built on term-subject associations. The basic idea is to split each document into two blocks: a subject block made up of the subject words that appear in a domain thesaurus, and a non-subject block made up of all remaining words. The query likelihood of each block is computed separately, and the two are combined to give the query likelihood of the document. For the non-subject block, terms are assumed to be independent and the classical language model is used. For the subject block, the association between query terms and subject words is introduced into the language model to estimate the block's query likelihood. This association is measured by a term-subject association degree, which is derived from two sources: the observed co-occurrence of terms and subject words within documents, and, at a macro level, the term-document-subject attachment relationship. Retrieval experiments on a public dataset show that TSA-LM effectively improves retrieval performance.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal (北大核心), 2007, No. 6, pp. 43-51 (9 pages)
Funding: National Natural Science Foundation of China (60496325, 60573092); National Key Technologies R&D Program project (2005BA112A02)
Keywords: computer application; Chinese information processing; language model; subject word; term-subject association; term-document-subject attachment; term-subject co-occurrence
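The two-block scoring idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything beyond the block split is an assumption made for illustration: the Jelinek-Mercer smoothing weight `lam`, the linear mixing weight `alpha`, the translation-style sum over subject words, and an association degree estimated from document-level co-occurrence only (the term-document-subject attachment signal is left out). All function names are hypothetical.

```python
import math
from collections import Counter


def split_blocks(tokens, subject_vocab):
    """Split a tokenized document into (subject block, non-subject block)."""
    subject = [t for t in tokens if t in subject_vocab]
    non_subject = [t for t in tokens if t not in subject_vocab]
    return subject, non_subject


def mle(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    total = len(tokens)
    return {w: c / total for w, c in Counter(tokens).items()} if total else {}


def cooccurrence_association(docs, subject_vocab):
    """assoc[t][s]: share of t's document-level co-occurrences that involve s.

    Assumption: a co-occurrence-only stand-in for the paper's association degree.
    """
    counts = {}
    for tokens in docs:
        present = set(tokens)
        subjects = present & subject_vocab
        for t in present:
            row = counts.setdefault(t, Counter())
            for s in subjects:
                row[s] += 1
    return {
        t: {s: c / sum(row.values()) for s, c in row.items()}
        for t, row in counts.items() if row
    }


def score(query, tokens, subject_vocab, assoc, collection, alpha=0.5, lam=0.5):
    """Log query likelihood of one document under the two-block sketch."""
    subject, non_subject = split_blocks(tokens, subject_vocab)
    p_subj, p_non = mle(subject), mle(non_subject)
    log_likelihood = 0.0
    for q in query:
        p_bg = collection.get(q, 1e-9)  # background (collection) probability
        # Non-subject block: classical unigram model, terms treated as independent.
        p_classic = (1 - lam) * p_non.get(q, 0.0) + lam * p_bg
        # Subject block: query term reached through associated subject words.
        p_assoc = sum(w * p_subj.get(s, 0.0) for s, w in assoc.get(q, {}).items())
        p_assoc = (1 - lam) * p_assoc + lam * p_bg
        # Assumed combination: linear mix of the two block estimates.
        log_likelihood += math.log(alpha * p_assoc + (1 - alpha) * p_classic)
    return log_likelihood


# Toy usage: "fever" and "fibrosis" play the role of thesaurus subject words.
subject_vocab = {"fever", "fibrosis"}
docs = [
    ["aspirin", "reduces", "fever", "fever"],
    ["patient", "has", "fever"],
    ["cystic", "fibrosis", "patient"],
]
assoc = cooccurrence_association(docs, subject_vocab)
collection = mle([t for d in docs for t in d])
ranked = sorted(
    docs[1:],
    key=lambda d: score(["aspirin"], d, subject_vocab, assoc, collection),
    reverse=True,
)
```

In this toy collection neither of the last two documents contains the query term "aspirin", yet the one containing the associated subject word "fever" ranks first. That is the kind of effect the subject-block estimate is meant to capture, under the assumptions stated above.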

