Abstract
Structured documents consist of logical components such as titles, sections, subsections, and paragraphs. This paper proposes a new approximate (similarity) search method that takes advantage of the natural hierarchical structure of text documents, and implements it in a question answering system. The main task of such a system is to locate the best-matching answer to a user's question in the underlying collection of structured documents. The retrieval technique lets users retrieve document components at varying granularity, from words and phrases up to paragraphs or sections. The method is evaluated on the Microsoft Encarta encyclopedia document set. Experimental results show that it produces more accurate and shorter answers than traditional document retrieval, while also providing more related context information about fuzzy questions, so that users can understand the answer better.
Source
Journal of Liaoning Institute of Technology (Natural Science Edition) (《辽宁工学院学报》)
2004, No. 6, pp. 18-21 (4 pages)