
An XML Snippet Retrieval Strategy Based on an Element Weighting Model    Cited by: 5
Abstract: When a user submits a query to an XML search engine, the engine usually returns far more results than the user expects, and some of the returned documents or elements are inevitably irrelevant. For a document-centric XML collection, XML snippet retrieval extracts a snippet of only a few hundred characters from each XML document or element returned for the query. The snippet lets the user judge whether the underlying document or element is actually relevant to the query, and so decide whether it is worth reading further, which makes retrieving information from XML documents more efficient. This paper proposes an XML snippet retrieval strategy based on an element weighting model. The strategy first uses the element weighting model ATG (Average Topic Generalization) to assign weights to the tags or paths in the XML collection, and then applies these weights to the BM25 model, yielding the BM25NW retrieval model (named BM25EW in the paper's English abstract). After BM25NW retrieves and ranks the relevant XML elements, each element is split into fixed-length windows, and every window is scored to assess whether it is suitable as snippet content. Finally, the content of the higher-scoring windows is assembled into the snippet returned to the user, subject to keeping information redundancy low. Evaluation results on the INEX 2011 Snippet Retrieval Track show that the ATG-based XML snippet retrieval strategy is highly competitive and clearly outperforms the other participating systems.
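Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, No. 8, pp. 1729-1744 (16 pages).
Funding: National Natural Science Foundation of China (60803105, 61173146); National Social Science Foundation of China (12CTQ042); Science and Technology Landing Program of Jiangxi Higher Education Institutions (KJLD12022); Science and Technology Research Project of the Jiangxi Provincial Department of Education (Gan Jiao Ji Zi No. 11731).
Keywords: XML snippet retrieval; element weighting model; average topic generalization; window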
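The pipeline outlined in the abstract (element-weighted BM25 ranking, fixed-length window scoring, low-redundancy window selection) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names are hypothetical, the element weight is simply multiplied into a standard Okapi BM25 score, a window is scored by counting query-term occurrences, and redundancy is measured as vocabulary overlap with the windows already chosen. It is not the paper's BM25NW model or its actual window-scoring function.

```python
import math
from collections import Counter


def bm25_weighted(query_terms, element_tokens, element_weight,
                  doc_freq, n_elements, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one element, scaled by a per-element weight.

    Assumption: the weight (e.g. one produced by an ATG-like model) is
    multiplied into the score; the paper may combine it differently.
    """
    tf = Counter(element_tokens)
    length = len(element_tokens)
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1 + (n_elements - df + 0.5) / (df + 0.5))
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * length / avg_len))
        score += idf * norm
    return element_weight * score


def split_windows(tokens, size):
    """Cut a token list into consecutive windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def score_window(window, query_terms):
    """Stand-in window score: number of query-term occurrences."""
    wanted = set(query_terms)
    return sum(1 for token in window if token in wanted)


def select_snippet(windows, query_terms, max_tokens, max_overlap=0.5):
    """Greedily keep high-scoring windows while limiting redundancy.

    A window is skipped if it would exceed the length budget or if more
    than `max_overlap` of its vocabulary is already in the snippet.
    """
    ranked = sorted(windows,
                    key=lambda w: score_window(w, query_terms),
                    reverse=True)
    chosen, seen, used = [], set(), 0
    for window in ranked:
        if used + len(window) > max_tokens:
            continue
        vocab = set(window)
        overlap = len(vocab & seen) / len(vocab) if vocab else 1.0
        if chosen and overlap > max_overlap:
            continue
        chosen.append(window)
        seen |= vocab
        used += len(window)
    return chosen
```

In this sketch, `bm25_weighted` would rank candidate elements, and the top-ranked element's tokens would then be passed through `split_windows` and `select_snippet` to build a snippet within a fixed length budget.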

