Method to Query Structured Document Based on Content Weight

Abstract: Structured documents consist of logical components such as titles, sections, subsections, and paragraphs. Taking advantage of the natural hierarchical structure of text documents, a new similarity-based (approximate) search method is proposed and implemented in a question answering system. The main task of the system is to locate the best-matching answer in the underlying collection of structured documents. The technique lets users retrieve document components at varying granularity, from words and phrases up to paragraphs or sections. The method is evaluated on Microsoft's Encarta encyclopedia. Experimental results show that it produces more accurate and shorter answers than traditional document retrieval, while also providing more context about fuzzy questions so that users can better understand the answer.
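The abstract does not give the weighting formula, so the following is only a minimal Python sketch of the general idea of multi-granularity retrieval over a document tree, not the authors' method. Everything in it is an assumption for illustration: the hypothetical `Component` tree, the whitespace tokenizer, the per-term weights in `term_weights`, and the selection rule "largest covered query-term weight, then the shortest component". It shows how one scoring pass can return a paragraph for a narrow question and a whole section when the matching terms are spread across several paragraphs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """One logical unit of a structured document (hypothetical model)."""
    kind: str                      # e.g. "title", "section", "paragraph"
    text: str = ""
    children: list[Component] = field(default_factory=list)


def words(text: str) -> list[str]:
    # Assumed naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def subtree_stats(node: Component) -> tuple[set[str], int]:
    """Return (distinct words in the subtree, total word count of the subtree)."""
    vocab = set(words(node.text))
    size = len(words(node.text))
    for child in node.children:
        child_vocab, child_size = subtree_stats(child)
        vocab |= child_vocab
        size += child_size
    return vocab, size


def iter_components(node: Component):
    # Pre-order traversal: every component at every granularity is a candidate.
    yield node
    for child in node.children:
        yield from iter_components(child)


def best_component(root: Component, term_weights: dict[str, float]):
    """Pick the component whose subtree covers the largest total query-term
    weight; among ties, prefer the component with the fewest words."""
    best, best_key = None, None
    for node in iter_components(root):
        vocab, size = subtree_stats(node)
        covered = sum(w for term, w in term_weights.items() if term in vocab)
        key = (covered, -size)  # assumed rule: more weight first, then shorter
        if best_key is None or key > best_key:
            best, best_key = node, key
    return best, best_key[0]


# Hypothetical toy document, not taken from the Encarta test set.
doc = Component("article", "", [
    Component("title", "Mount Everest"),
    Component("section", "Geography", [
        Component("paragraph", "Mount Everest is the highest mountain on Earth"),
        Component("paragraph", "It lies on the border of Nepal and China"),
    ]),
])

node, covered = best_component(doc, {"highest": 2.0, "mountain": 1.0})
print(node.kind, covered)  # paragraph 3.0
```

With the query terms `highest` and `mountain`, the first paragraph covers the full weight with the fewest words, so it is returned; a query on `highest` and `nepal` needs both paragraphs and therefore returns the enclosing section. A real system would replace the toy weights with weights derived from document content.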

Authors: Fan Yiyan, Zhu Liqun, Guo Guoqiang

Affiliation: Department of Computer Science and Technology, Hunan University of Arts and Science

Source: Journal of Liaoning Institute of Technology (Natural Science Edition), 2004, No. 6, pp. 18-21 (4 pages)

Keywords: structured document; content-based; text document; document content; weight; similarity; context; answer; retrieval method; encyclopedia; document database; information retrieval; passage retrieval

Classification numbers: T-652 [General Industrial Technology]; TP311 [Automation and Computer Technology: Computer Software and Theory]
