With the rapid development of the Internet, multi documents summarization is becoming a very hot research topic. In order to generate a summarization that can effectively characterize the original information from doc...With the rapid development of the Internet, multi documents summarization is becoming a very hot research topic. In order to generate a summarization that can effectively characterize the original information from documents, this paper proposes a multi documents summarization approach based on the physical features and logical structure of the document set. This method firstly clusterssimilar sentences into several Logical Topics (LTs), and then orders these topics according to their physical features of multi documents. After that, sentences used for the summarization are extracted from these LTs, and finally the summarization is generated via certain sorting algorithms. Our experiments show that the information coverage rate of our method is 8.83% higher than those methods based solely on logical structures, and 14.31% higher than Top-N method.展开更多
This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level an...This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts, and then uses a variable-based framework to integrate and organize extracted information across dissertation abstracts. This framework focuses more on research concepts and their research relationships found in sociology dissertation abstracts and has a hierarchical structure. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy for supporting the summarization process. An example is provided to show how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts.展开更多
文摘With the rapid development of the Internet, multi documents summarization is becoming a very hot research topic. In order to generate a summarization that can effectively characterize the original information from documents, this paper proposes a multi documents summarization approach based on the physical features and logical structure of the document set. This method firstly clusterssimilar sentences into several Logical Topics (LTs), and then orders these topics according to their physical features of multi documents. After that, sentences used for the summarization are extracted from these LTs, and finally the summarization is generated via certain sorting algorithms. Our experiments show that the information coverage rate of our method is 8.83% higher than those methods based solely on logical structures, and 14.31% higher than Top-N method.
文摘This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts, and then uses a variable-based framework to integrate and organize extracted information across dissertation abstracts. This framework focuses more on research concepts and their research relationships found in sociology dissertation abstracts and has a hierarchical structure. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy for supporting the summarization process. An example is provided to show how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts.