
Method to Query Structured Document Based on Content Weight
Abstract: Structured documents consist of logical components such as titles, sections, subsections, and paragraphs. This paper proposes a new approximate (similarity) search method that takes advantage of the natural hierarchical structure of text documents, and implements it in a question answering system. The main task of such a system is to locate the best-matching answer in the underlying structured document collection. The retrieval technique lets users retrieve document components at varying granularity, from words and phrases to paragraphs or sections. The method is evaluated on the Microsoft Encarta encyclopedia document set. Experimental results show that it produces more accurate and shorter answers than traditional document retrieval and, at the same time, provides more related context information about fuzzy questions, so that users can understand the answers better.
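The abstract does not give the paper's weighting formula, so the following is only a minimal sketch of the general idea it describes: score every component of a document tree against a query and return the best one at whatever granularity it occurs, together with its ancestor path as context. The `Node` class, the `score` function (query-term coverage damped by component length), and the sample document are all hypothetical stand-ins, not the authors' method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One logical component: a document, section, or paragraph (hypothetical model)."""
    label: str
    text: str = ""  # the component's own text (a title or a paragraph body)
    children: list["Node"] = field(default_factory=list)


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]


def subtree_tokens(node):
    """All tokens in a component, including those of its descendants."""
    tokens = tokenize(node.text)
    for child in node.children:
        tokens.extend(subtree_tokens(child))
    return tokens


def score(node, query_terms):
    """Assumed weight, not the paper's: term coverage divided by a log length penalty."""
    tokens = subtree_tokens(node)
    if not tokens:
        return 0.0
    coverage = len(query_terms & set(tokens)) / len(query_terms)
    return coverage / (1 + math.log(len(tokens)))


def walk(node, path=()):
    """Yield every component with the labels of its ancestors (its context)."""
    here = path + (node.label,)
    yield node, here
    for child in node.children:
        yield from walk(child, here)


def best_component(root, query):
    """Return (score, node, path) for the highest-scoring component at any level."""
    terms = set(tokenize(query))
    if not terms:
        raise ValueError("query has no terms")
    return max(((score(n, terms), n, p) for n, p in walk(root)), key=lambda t: t[0])


# Invented sample document, for illustration only.
doc = Node("Europe", "Europe", [
    Node("France", "France", [
        Node("p1", "Paris is the capital of France."),
        Node("p2", "French cuisine is famous worldwide."),
    ]),
    Node("Germany", "Germany", [
        Node("p3", "Berlin is the capital of Germany."),
    ]),
])

best_score, best_node, context_path = best_component(doc, "capital of France")
print(best_node.label, context_path, round(best_score, 3))
```

Because the length penalty favors short components that still cover the query, a single paragraph can outrank its enclosing section, while the returned path records where the answer sits in the hierarchy.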
Source: 《辽宁工学院学报》 (Journal of Liaoning Institute of Technology, Natural Science Edition), 2004, No. 6, pp. 18-21 (4 pages)
Keywords: structured document; content-based; text document; document content; weight; similarity; context; answer; retrieval method; encyclopedia; document database; information retrieval; passage retrieval

