3 articles found
Automatic Question Answering from Web Documents (cited by: 4)
1
Authors: LI Xin, HU Dawei, LI Huan, HAO Tianyong, CHEN Enhon, LIU Wenyin. Wuhan University Journal of Natural Sciences (indexed in: CAS), 2007, Issue 5, pp. 875-880 (6 pages)
A passage retrieval strategy for web-based question answering (QA) systems is proposed. It first analyzes the question using semantic patterns to obtain its syntactic and semantic information and then forms initial queries. These queries are used to retrieve documents from the World Wide Web (WWW) through the Google search engine. The queries are then rewritten, using the relations between keywords in the question, to form passage retrieval queries that improve precision. Experimental results on the question set of the TREC-2003 passage task show that the system performs well on factoid questions.
Keywords: question answering (QA); passage retrieval; semantic pattern
Hierarchical Subtopic Segmentation of Web Document
2
Authors: ZHANG Yun-tao, GONG Ling, WANG Yong-cheng. Wuhan University Journal of Natural Sciences (indexed in: EI, CAS), 2006, Issue 1, pp. 47-50 (4 pages)
The paper proposes a novel method for subtopic segmentation of Web documents. Subtopic segmentation can make retrieval results more effective. The proposed method segments subtopics hierarchically and identifies the boundary of each subtopic. Based on a term frequency matrix, the method measures the similarity between adjacent blocks, such as paragraphs and passages. In a real-world sample experiment, the macro-averaged precision and recall reach 73.4% and 82.5%, and the micro-averaged precision and recall reach 72.9% and 83.1%. Moreover, the method works equally well for other Asian languages such as Japanese and Korean, as well as for Western languages.
Keywords: subtopic segmentation; Web document; passage retrieval; discourse
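The abstract above describes measuring similarity between adjacent blocks from term frequencies. A minimal sketch of that general idea follows, assuming cosine similarity and a fixed threshold; neither is confirmed by the abstract, the function names are hypothetical, and the sketch is flat rather than hierarchical.

```python
import math
import re
from collections import Counter


def term_frequencies(block):
    """Count lowercase word tokens in one block (e.g., a paragraph)."""
    return Counter(re.findall(r"[a-z0-9]+", block.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_boundaries(blocks, threshold=0.1):
    """Return each index i where blocks[i] and blocks[i + 1] look dissimilar.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    tfs = [term_frequencies(b) for b in blocks]
    return [i for i in range(len(tfs) - 1) if cosine(tfs[i], tfs[i + 1]) < threshold]


blocks = [
    "Cats purr and cats sleep.",
    "Cats chase mice; mice fear cats.",
    "Stock markets fell as bond yields rose.",
    "Bond yields and stock prices moved again.",
]
boundaries = find_boundaries(blocks)
print(boundaries)
```

Here a boundary is reported between the second and third blocks, where the vocabulary changes completely. Word tokenization with a regular expression only suits space-delimited text; Japanese or Korean input would need a language-specific tokenizer.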
Extracting Variable-Depth Logical Document Hierarchy from Long Documents: Method, Evaluation, and Application
3
Authors: 曹荣禹, 曹逸轩, 周干斌, 罗平. Journal of Computer Science & Technology (indexed in: SCIE, EI, CSCD), 2022, Issue 3, pp. 699-718 (20 pages)
In this paper, we study the problem of extracting a variable-depth "logical document hierarchy" from long documents, namely organizing the recognized "physical document objects" into hierarchical structures. The discovery of the logical document hierarchy is a vital step in supporting many downstream applications (e.g., passage-based retrieval and high-quality information extraction). However, long documents, containing hundreds or even thousands of pages and a variable-depth hierarchy, challenge the existing methods. To address these challenges, we develop a framework, namely Hierarchy Extraction from Long Document (HELD), where we "sequentially" insert each physical object at the proper position on the current tree. Determining whether each possible position is proper or not can be formulated as a binary classification problem. To further improve its effectiveness and efficiency, we study the design variants in HELD, including traversal orders of the insertion positions, heading extraction explicitly or implicitly, tolerance to insertion errors in predecessor steps, and so on. As for evaluations, we find that previous studies ignore the error that the depth of a node is correct while its path to the root is wrong. Since such mistakes may seriously worsen downstream applications, a new measure is developed for a more careful evaluation. Empirical experiments on thousands of long documents from the Chinese financial market, the English financial market, and English scientific publications show that the HELD model with the "root-to-leaf" traversal order and explicit heading extraction is the best choice to achieve the tradeoff between effectiveness and efficiency, with accuracies of 0.9726, 0.7291, and 0.9578 on the Chinese financial, English financial, and arXiv datasets, respectively. Finally, we show that the logical document hierarchy can be employed to significantly improve the performance of the downstream passage retrieval task. In summary, we conduct a systematic study on this task in terms of methods, evaluations, and applications.
Keywords: logical document hierarchy; long documents; passage retrieval
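The sequential insertion described in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `Node` type, the `level` feature, the rule-based `is_proper` function standing in for the learned binary classifier, and the choice of the tree's rightmost path as the set of candidate positions. It is not the HELD implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int  # toy feature: 0 = root, 1..n = heading depth, 99 = body text
    children: list = field(default_factory=list)


def is_proper(parent, obj):
    """Hypothetical stand-in for the binary classifier over (position, object)."""
    last = parent.children[-1] if parent.children else None
    return parent.level < obj.level and (last is None or last.level >= obj.level)


def rightmost_path(root):
    """Nodes from the root down to the last leaf (assumed candidate parents)."""
    path = [root]
    while path[-1].children:
        path.append(path[-1].children[-1])
    return path


def insert(root, obj):
    """Visit candidates in root-to-leaf order and attach at the first proper one."""
    for parent in rightmost_path(root):
        if is_proper(parent, obj):
            parent.children.append(obj)
            return parent
    root.children.append(obj)  # fallback if no position is accepted
    return root


root = Node("ROOT", 0)
objects = [
    Node("1 Intro", 1), Node("Intro text", 99), Node("2 Methods", 1),
    Node("2.1 Data", 2), Node("Data text", 99), Node("3 Results", 1),
]
for obj in objects:
    insert(root, obj)

top_level = [child.text for child in root.children]
print(top_level)
```

Restricting candidates to the rightmost path keeps the objects in reading order, since each new object can only become the last child of a node on that path. A learned classifier would replace `is_proper` and score each candidate from features of the object and its context.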