Funding: Supported by the National Natural Science Foundation of China (62276058, 61902057, 41774063), the Fundamental Research Funds for the Central Universities (N2217003), and the Joint Fund of the Science & Technology Department of Liaoning Province and the State Key Laboratory of Robotics, China (2020-KF-12-11).
Abstract: Complaint reports carry a wide variety of subjective information expressed by citizens. A key challenge in summarizing complaint reports is ensuring the factual consistency of the generated summary. This paper therefore proposes a simple, weakly supervised framework that takes factual consistency into account and generates summaries of city-based complaint reports without pre-labeled sentences or words. The framework also considers the importance of entities in complaint reports to keep the summary factually consistent. Experimental results on customer review datasets (Yelp and Amazon) and a complaint report dataset (complaint reports from Shenyang, China) show that the proposed framework outperforms state-of-the-art approaches in ROUGE scores and in human evaluation. These results demonstrate the effectiveness of the approach for handling complaint reports.
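The abstract reports ROUGE scores without saying how they are computed. As a rough illustration only, the sketch below computes ROUGE-1 (clipped unigram overlap) between a candidate summary and a reference. The function name `rouge_1`, the lowercasing, and the whitespace tokenization are assumptions made for this example; they are not taken from the paper, and published evaluations normally rely on an established ROUGE toolkit.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1 precision, recall, and F1 using naive whitespace tokens.

    Assumption: lowercasing plus str.split() stands in for real tokenization.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per token,
    # which is the "clipped" unigram match count.
    overlap = sum((cand_counts & ref_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the road is broken", "the road near school is broken")
print(scores)
```

In this toy pair every candidate word appears in the reference, so precision is 1.0, while recall is lower because the reference contains two words the candidate omits.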
Abstract: In today’s digital era, text may appear in the form of images. This research addresses the problem by recognizing such text with a support vector machine (SVM). Much work has been done on handwritten character recognition for English, but very little for the under-resourced Hindi language. A method is developed for identifying Hindi characters that uses morphology, edge detection, histograms of oriented gradients (HOG), and SVM classes for summary creation. SVM rank is used to extract essential phrases for the summary based on paragraph position, phrase position, numerical data, inverted commas, sentence length, and keyword features. The primary goal of the SVM optimization function is to reduce the number of features by eliminating unnecessary and redundant ones; the second goal is to maintain or improve the performance of the classification system. The experiments used news articles from various genres, such as Bollywood, politics, and sports. The proposed method achieves an accuracy of 96.97% for Hindi character recognition, which compares well with baseline approaches, and the system-generated summaries are compared with human summaries. The evaluation shows a precision of 72% at a compression ratio of 50% and a precision of 60% at a compression ratio of 25%, which is a decent result in comparison to state-of-the-art methods.
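The feature list and compression ratios in this abstract can be made concrete with a small sketch. The code below is not the authors' SVM rank model: it swaps in a hand-weighted linear score over similar features (position, numerical data, inverted commas, sentence length, keywords) and keeps the top-scoring sentences. The names `sentence_features`, `summarize`, and `WEIGHTS`, the weight values, and the reading of "compression ratio" as the fraction of sentences kept are all assumptions made for illustration.

```python
import re

# Hypothetical hand-picked weights; a learned ranker would estimate these.
WEIGHTS = {
    "position": 0.3,
    "has_number": 0.2,
    "has_quote": 0.1,
    "length": 0.1,
    "keywords": 0.3,
}


def sentence_features(sentence, index, total, keywords):
    """Map one sentence to feature values in the range [0, 1]."""
    words = sentence.split()
    hits = sum(w.lower().strip('.,"“”') in keywords for w in words)
    return {
        "position": 1.0 - index / max(total, 1),   # earlier sentences score higher
        "has_number": 1.0 if re.search(r"\d", sentence) else 0.0,
        "has_quote": 1.0 if re.search(r'["“”]', sentence) else 0.0,
        "length": min(len(words) / 20.0, 1.0),     # capped length score
        "keywords": hits / max(len(words), 1),
    }


def summarize(sentences, keywords, compression_ratio=0.5):
    """Keep roughly compression_ratio of the sentences, in original order."""
    total = len(sentences)
    scored = []
    for i, sentence in enumerate(sentences):
        feats = sentence_features(sentence, i, total, keywords)
        score = sum(WEIGHTS[name] * value for name, value in feats.items())
        scored.append((score, i))
    keep = max(1, round(total * compression_ratio))
    chosen = sorted(i for _, i in sorted(scored, reverse=True)[:keep])
    return [sentences[i] for i in chosen]


article = [
    "The minister announced 3 new schemes.",
    "Crowds gathered early.",
    'He said "growth is our priority" at the rally.',
    "The event ended at night.",
]
print(summarize(article, {"minister", "schemes", "growth"}, 0.5))
```

Under this reading, a compression ratio of 50% keeps two of the four sentences and 25% keeps one; precision would then be measured against the sentences a human summarizer selected.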
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2016YFB1000902 and the National Natural Science Foundation of China under Grant Nos. 61232015, 61472412, and 61621003.
Abstract: Automatic text summarization (ATS) has achieved impressive performance thanks to recent advances in deep learning (DL) and the availability of large-scale corpora. The key points in ATS are estimating the salience of information and generating coherent results. Recently, a variety of DL-based approaches have been developed to better address these two aspects. However, a comprehensive literature review of DL-based ATS approaches is still lacking. This paper aims to comprehensively review the significant DL-based approaches proposed in the literature with respect to generic ATS tasks and to provide a walk-through of their evolution. We first give an overview of ATS and DL, and compare the datasets commonly used for model training, validation, and evaluation. We then summarize single-document summarization approaches, followed by an overview of multi-document summarization approaches. We further analyze the performance of popular ATS models on common datasets. Various popular approaches can be employed for different ATS tasks. Finally, we propose potential research directions in this fast-growing field. We hope this exploration can provide new insights for future research on DL-based ATS.