The rapid expansion of online content and big data has precipitated an urgent need for efficient summarization techniques to swiftly comprehend vast textual documents without compromising their original integrity.Curr...The rapid expansion of online content and big data has precipitated an urgent need for efficient summarization techniques to swiftly comprehend vast textual documents without compromising their original integrity.Current approaches in Extractive Text Summarization(ETS)leverage the modeling of inter-sentence relationships,a task of paramount importance in producing coherent summaries.This study introduces an innovative model that integrates Graph Attention Networks(GATs)with Transformer-based Bidirectional Encoder Representa-tions from Transformers(BERT)and Latent Dirichlet Allocation(LDA),further enhanced by Term Frequency-Inverse Document Frequency(TF-IDF)values,to improve sentence selection by capturing comprehensive topical information.Our approach constructs a graph with nodes representing sentences,words,and topics,thereby elevating the interconnectivity and enabling a more refined understanding of text structures.This model is stretched to Multi-Document Summarization(MDS)from Single-Document Summarization,offering significant improvements over existing models such as THGS-GMM and Topic-GraphSum,as demonstrated by empirical evaluations on benchmark news datasets like Cable News Network(CNN)/Daily Mail(DM)and Multi-News.The results consistently demonstrate superior performance,showcasing the model’s robustness in handling complex summarization tasks across single and multi-document contexts.This research not only advances the integration of BERT and LDA within a GATs but also emphasizes our model’s capacity to effectively manage global information and adapt to diverse summarization challenges.展开更多
With the wider growth of web-based documents,the necessity of automatic document clustering and text summarization is increased.Here,document summarization that is extracting the essential task with appropriate inform...With the wider growth of web-based documents,the necessity of automatic document clustering and text summarization is increased.Here,document summarization that is extracting the essential task with appropriate information,removal of unnecessary data and providing the data in a cohesive and coherent manner is determined to be a most confronting task.In this research,a novel intelligent model for document clustering is designed with graph model and Fuzzy based association rule generation(gFAR).Initially,the graph model is used to map the relationship among the data(multi-source)followed by the establishment of document clustering with the generation of association rule using the fuzzy concept.This method shows benefit in redundancy elimination by mapping the relevant document using graph model and reduces the time consumption and improves the accuracy using the association rule generation with fuzzy.This framework is provided in an interpretable way for document clustering.It iteratively reduces the error rate during relationship mapping among the data(clusters)with the assistance of weighted document content.Also,this model represents the significance of data features with class discrimination.It is also helpful in measuring the significance of the features during the data clustering process.The simulation is done with MATLAB 2016b environment and evaluated with the empirical standards like Relative Risk Patterns(RRP),ROUGE score,and Discrimination Information Measure(DMI)respectively.Here,DailyMail and DUC 2004 dataset is used to extract the empirical results.The proposed gFAR model gives better trade-off while compared with various prevailing approaches.展开更多
Cross-document relation extraction(RE),as an extension of information extraction,requires integrating information from multiple documents retrieved from open domains with a large number of irrelevant or confusing nois...Cross-document relation extraction(RE),as an extension of information extraction,requires integrating information from multiple documents retrieved from open domains with a large number of irrelevant or confusing noisy texts.Previous studies focus on the attention mechanism to construct the connection between different text features through semantic similarity.However,similarity-based methods cannot distinguish valid information from highly similar retrieved documents well.How to design an effective algorithm to implement aggregated reasoning in confusing information with similar features still remains an open issue.To address this problem,we design a novel local-toglobal causal reasoning(LGCR)network for cross-document RE,which enables efficient distinguishing,filtering and global reasoning on complex information from a causal perspective.Specifically,we propose a local causal estimation algorithm to estimate the causal effect,which is the first trial to use the causal reasoning independent of feature similarity to distinguish between confusing and valid information in cross-document RE.Furthermore,based on the causal effect,we propose a causality guided global reasoning algorithm to filter the confusing information and achieve global reasoning.Experimental results under the closed and the open settings of the large-scale dataset Cod RED demonstrate our LGCR network significantly outperforms the state-ofthe-art methods and validate the effectiveness of causal reasoning in confusing information processing.展开更多
文摘The rapid expansion of online content and big data has precipitated an urgent need for efficient summarization techniques to swiftly comprehend vast textual documents without compromising their original integrity.Current approaches in Extractive Text Summarization(ETS)leverage the modeling of inter-sentence relationships,a task of paramount importance in producing coherent summaries.This study introduces an innovative model that integrates Graph Attention Networks(GATs)with Transformer-based Bidirectional Encoder Representa-tions from Transformers(BERT)and Latent Dirichlet Allocation(LDA),further enhanced by Term Frequency-Inverse Document Frequency(TF-IDF)values,to improve sentence selection by capturing comprehensive topical information.Our approach constructs a graph with nodes representing sentences,words,and topics,thereby elevating the interconnectivity and enabling a more refined understanding of text structures.This model is stretched to Multi-Document Summarization(MDS)from Single-Document Summarization,offering significant improvements over existing models such as THGS-GMM and Topic-GraphSum,as demonstrated by empirical evaluations on benchmark news datasets like Cable News Network(CNN)/Daily Mail(DM)and Multi-News.The results consistently demonstrate superior performance,showcasing the model’s robustness in handling complex summarization tasks across single and multi-document contexts.This research not only advances the integration of BERT and LDA within a GATs but also emphasizes our model’s capacity to effectively manage global information and adapt to diverse summarization challenges.
文摘With the wider growth of web-based documents,the necessity of automatic document clustering and text summarization is increased.Here,document summarization that is extracting the essential task with appropriate information,removal of unnecessary data and providing the data in a cohesive and coherent manner is determined to be a most confronting task.In this research,a novel intelligent model for document clustering is designed with graph model and Fuzzy based association rule generation(gFAR).Initially,the graph model is used to map the relationship among the data(multi-source)followed by the establishment of document clustering with the generation of association rule using the fuzzy concept.This method shows benefit in redundancy elimination by mapping the relevant document using graph model and reduces the time consumption and improves the accuracy using the association rule generation with fuzzy.This framework is provided in an interpretable way for document clustering.It iteratively reduces the error rate during relationship mapping among the data(clusters)with the assistance of weighted document content.Also,this model represents the significance of data features with class discrimination.It is also helpful in measuring the significance of the features during the data clustering process.The simulation is done with MATLAB 2016b environment and evaluated with the empirical standards like Relative Risk Patterns(RRP),ROUGE score,and Discrimination Information Measure(DMI)respectively.Here,DailyMail and DUC 2004 dataset is used to extract the empirical results.The proposed gFAR model gives better trade-off while compared with various prevailing approaches.
基金supported in part by the National Key Research and Development Program of China(2022ZD0116405)the Strategic Priority Research Program of the Chinese Academy of Sciences(XDA27030300)the Key Research Program of the Chinese Academy of Sciences(ZDBS-SSW-JSC006)。
文摘Cross-document relation extraction(RE),as an extension of information extraction,requires integrating information from multiple documents retrieved from open domains with a large number of irrelevant or confusing noisy texts.Previous studies focus on the attention mechanism to construct the connection between different text features through semantic similarity.However,similarity-based methods cannot distinguish valid information from highly similar retrieved documents well.How to design an effective algorithm to implement aggregated reasoning in confusing information with similar features still remains an open issue.To address this problem,we design a novel local-toglobal causal reasoning(LGCR)network for cross-document RE,which enables efficient distinguishing,filtering and global reasoning on complex information from a causal perspective.Specifically,we propose a local causal estimation algorithm to estimate the causal effect,which is the first trial to use the causal reasoning independent of feature similarity to distinguish between confusing and valid information in cross-document RE.Furthermore,based on the causal effect,we propose a causality guided global reasoning algorithm to filter the confusing information and achieve global reasoning.Experimental results under the closed and the open settings of the large-scale dataset Cod RED demonstrate our LGCR network significantly outperforms the state-ofthe-art methods and validate the effectiveness of causal reasoning in confusing information processing.