
A Study on the Effective Use of Syntactic Information in Document Retrieval (cited by 4)

English title: Effectiveness of Syntactic Relationship in Document Retrieval
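Abstract: Using term dependencies to improve the bag-of-words model has long been an active topic in text retrieval. Existing approaches to defining term dependencies fall into two main classes. Lexical-level dependencies are defined from statistical proximity information, and syntactic-level dependencies are defined from syntactic structure. Previous studies have shown that term dependencies can significantly improve retrieval performance over the bag-of-words model, but the two classes have not been compared systematically. It therefore remains an open question how syntactic information can be used effectively when term dependencies are applied to document and query representation, and which kinds of syntactic information are most useful for text retrieval. To address this, for document representation we compare term dependencies defined by proximity information with those defined by syntactic information; for query representation we compare term dependencies defined by syntactic information at different levels. To compare the effect of these term dependencies on retrieval performance systematically, we build on the language model and use smoothing as the guiding idea to propose a retrieval model into which both classes of term dependencies can easily be incorporated. Experiments on TREC collections show that for document representation, syntactic relations perform no differently, to a significant degree, from statistical proximity relations. For query representation, partial syntactic information based on noun and proper-noun phrases is more effective than other syntactic information.

English abstract: Term dependencies have been introduced to relax the term independence assumption, and they have improved retrieval precision dramatically. There are two kinds of term dependencies: one is defined by term proximity, and the other by syntactic dependencies. In this paper, we carry out a comparative study that re-examines these two kinds of term dependencies in the dependence language model framework, and we present a smoothing-based dependence language model. We study the effectiveness of syntactic dependencies in query representation and in document representation separately. The experimental results on TREC collections show that: 1) syntactic dependencies achieve better results than term proximity in document representation; 2) in query representation, concept-based partial syntactic dependencies are more effective than other syntactic dependencies.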
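The abstract does not give the model's equations. As a rough illustration only, the sketch below shows the two ways of defining term dependencies that the paper contrasts (pairs from term proximity versus pairs from parser head/dependent edges), and one generic way to fold such pairs into a smoothed query-likelihood score: a Dirichlet-smoothed unigram model and a Dirichlet-smoothed term-pair model combined linearly. The function names, the window size, the weights `lam` and `mu`, the add-0.5 collection estimate, and the hand-supplied parser edges are all assumptions made for illustration; this is not the authors' smoothing-based dependence model.

```python
import math
from collections import Counter


def proximity_pairs(tokens, window=2):
    """Lexical-level dependencies: unordered term pairs whose positions differ by less than `window`."""
    pairs = []
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            pairs.append(tuple(sorted((a, b))))
    return pairs


def syntactic_pairs(tokens, edges):
    """Syntactic-level dependencies: unordered head/dependent term pairs.

    `edges` is a list of (head_index, dependent_index) tuples, assumed to come
    from some dependency parser; in this sketch they are supplied by hand.
    """
    return [tuple(sorted((tokens[h], tokens[d]))) for h, d in edges]


def dirichlet_logprob(item, doc_counts, doc_len, coll_counts, coll_len, mu):
    """log p(item | d) under Dirichlet smoothing: (c(item, d) + mu * p(item | C)) / (|d| + mu)."""
    # Add-0.5 collection estimate so unseen items keep a nonzero probability (an assumption).
    p_coll = (coll_counts[item] + 0.5) / (coll_len + 1.0)
    return math.log((doc_counts[item] + mu * p_coll) / (doc_len + mu))


def score(query_terms, query_pairs, doc_terms, doc_pairs,
          coll_terms, coll_pairs, lam=0.8, mu=1000.0):
    """Linear mix of a smoothed unigram model and a smoothed term-pair model (illustrative weights)."""
    dt, dp = Counter(doc_terms), Counter(doc_pairs)
    ct, cp = Counter(coll_terms), Counter(coll_pairs)
    unigram = sum(dirichlet_logprob(t, dt, len(doc_terms), ct, len(coll_terms), mu)
                  for t in query_terms)
    dependency = sum(dirichlet_logprob(p, dp, len(doc_pairs), cp, len(coll_pairs), mu)
                     for p in query_pairs)
    return lam * unigram + (1 - lam) * dependency


if __name__ == "__main__":
    # Two toy documents with identical words; only doc_a has "information" next to "retrieval".
    doc_a = "information retrieval uses syntactic parse trees".split()
    doc_b = "information uses syntactic parse trees retrieval".split()
    pairs_a, pairs_b = proximity_pairs(doc_a), proximity_pairs(doc_b)
    coll_terms, coll_pairs = doc_a + doc_b, pairs_a + pairs_b
    query = ["information", "retrieval"]
    query_pairs = proximity_pairs(query)
    for name, terms, pairs in [("doc_a", doc_a, pairs_a), ("doc_b", doc_b, pairs_b)]:
        print(name, score(query, query_pairs, terms, pairs, coll_terms, coll_pairs))
```

Because the two toy documents contain the same words, their unigram parts tie, and only the pair component separates them, ranking `doc_a` higher. Replacing `proximity_pairs` with `syntactic_pairs` changes only how the pairs are defined, which is the contrast the paper examines.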
Source: Journal of Chinese Information Processing (《中文信息学报》), 2008, No. 4, pp. 66-74 (9 pages). Indexed in CSCD and the Peking University Core Journals list.
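Funding: National Key Basic Research Program of China (973 Program) (2004CB318109); National Natural Science Foundation of China (60603094); Beijing Municipal Science and Technology Program (D0106008040291).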
Keywords: computer application; Chinese information processing; information retrieval; term dependency; syntactic parsing; term proximity