A Method to Query Document Database by Content and Structure (article in English) Cited by: 9
Abstract Documents have a logical structure: titles, sections, paragraphs and similar concepts form a document's inherent logic. Different users have different needs when searching documents, and how a retrieval system can return meaningful information has long been a central research question. By combining document structure and content, a new method for computing similarity is proposed for retrieving structured documents. The method supports retrieval of document content at multiple granularities, from words and phrases up to paragraphs or sections. A question answering system was implemented on top of this method, with Microsoft's Encarta encyclopedia as the test set; experimental comparison with traditional methods shows that the passages retrieved by this method are more reasonable and more effective. Structured documents are made up of a few logical components, such as titles, sections, subsections and paragraphs. The components of each structured document can be represented by an ordered tree model, which can also be viewed as a hierarchical concept relationship. To meet the user's requirements for more precise and concentrated search results, retrieval techniques should allow the user to retrieve document components of varying granularity. This paper presents a method to query a document database by content and structure. The key idea is to construct a more comprehensive similarity function by taking advantage of the inherent hierarchical structure of documents. This work combines information retrieval techniques, semi-structured data query and proximate search for documents. The proposed method is evaluated on the Encarta encyclopedia document set, and the experimental results show that it can provide more accurate and focused answers than traditional document retrieval methods.
Source Journal of Software (《软件学报》) EI CSCD Peking University Core 2003, Issue 5, pp. 976-983, 8 pages
Funding This work was performed while the first author was a visiting student at Microsoft Research Asia.
Keywords document database; structure query; structured document; similarity computation; information retrieval; passage retrieval