Journal Articles
2 articles found
1. Unsupervised Graph-Based Tibetan Multi-Document Summarization
Authors: Xiaodong Yan, Yiqin Wang, Wei Song, Xiaobing Zhao, A.Run, Yang Yanxing. Computers, Materials & Continua (SCIE, EI), 2022, No. 10, pp. 1769-1781 (13 pages)
Text summarization creates a subset that represents the most important or relevant information in the original content, which effectively reduces information redundancy. Recently, neural network methods have achieved good results in text summarization for both Chinese and English, but research on text summarization in low-resource languages, especially Tibetan, is still at an exploratory stage. Moreover, no large-scale annotated corpus exists for text summarization in such languages, and this lack of data severely limits the development of low-resource text summarization. In this setting, unsupervised learning approaches are more appealing for low-resource languages because they do not require labeled data. In this paper, we propose an unsupervised graph-based Tibetan multi-document summarization method, which divides a large number of Tibetan news documents into topics and extracts a summary for each topic. Summaries obtained with traditional graph-based methods are highly redundant, and their division of documents into topics is not fine-grained enough. For topic division, we adopt a two-level clustering method that converts the original documents into document-level and sentence-level graphs; we then take both linguistic and deep representations into account and integrate an external corpus into the graph to obtain sentence semantic clusters. This addresses the shortcomings of the traditional K-Means clustering method and yields a finer-grained clustering of documents. We then model the sentence clusters as graphs and finally re-score the sentence nodes based on topic semantic information and the impact of topic features on sentences, so that summaries with higher topic relevance are extracted. To promote the development of Tibetan text summarization and to meet researchers' need for high-quality Tibetan summarization datasets, this paper manually constructs a Tibetan summarization dataset and carries out experiments on it. The experimental results show that our method effectively improves summary quality and is competitive with previous unsupervised methods.
Keywords: multi-document summarization; text clustering; topic feature fusion; graphic model
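The abstract positions the method against traditional graph-based extractive summarization. As a reference point only, here is a minimal sketch of that generic baseline (TextRank-style weighted PageRank over a sentence-similarity graph), not the authors' method: their two-level clustering, external-corpus representations, and topic-aware re-scoring are not reproduced. It assumes sentences are already segmented and tokenized (Tibetan would need its own segmenter) and uses bag-of-words cosine similarity; the function names, damping factor 0.85, and toy tokens are illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def textrank(sentences: list[list[str]], damping: float = 0.85,
             iters: int = 50) -> list[float]:
    """Score pre-tokenized sentences with weighted PageRank on a similarity graph."""
    vecs = [Counter(s) for s in sentences]
    n = len(vecs)
    # Undirected weighted edges: pairwise cosine similarity, no self-loops.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence receives score from its neighbours in proportion to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences: list[list[str]], k: int = 2) -> list[int]:
    """Return indices of the k top-scoring sentences, in original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    # Hypothetical toy tokens standing in for one segmented topic cluster.
    cluster = [
        ["tibetan", "news", "summary", "graph"],
        ["graph", "summary", "method", "tibetan"],
        ["news", "graph", "tibetan", "cluster"],
        ["weather", "today", "sunny"],
    ]
    print(summarize(cluster, k=2))
```

A sentence sharing no words with the rest of its cluster receives only the teleport term (1 - damping) / n and is never selected ahead of connected sentences; this is the redundancy-prone baseline that topic-aware re-scoring is meant to improve on.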
2. Topic-Feature Lattices Construction and Visualization for Dynamic Topic Number (cited by: 1)
Authors: Kai WANG, Fuzhi WANG. Journal of Systems Science and Information (CSCD), 2021, No. 5, pp. 558-574 (17 pages)
Topic recognition with a dynamic topic number can update hyperparameters dynamically and obtain the probability distribution of dynamic topics along the time dimension, which helps in understanding and tracking streaming text data. However, current topic recognition models tend to assume a fixed number of topics K and lack multi-granularity analysis of topic knowledge, so they cannot capture in depth how topics change over a time series. By introducing a novel approach built on the Infinite Latent Dirichlet Allocation (ILDA) model, we construct a topic feature lattice under a dynamic topic number. In the model, documents, topics, and vocabulary are jointly modeled to generate two probability distribution matrices: document-topic and topic-feature-word. The association intensity between each topic and its feature words is then computed to establish the topic formal context matrix. Finally, topic features are induced according to formal concept analysis (FCA) theory. The topic feature lattice under dynamic topic number (TFL_DTN) model is validated on a real dataset by comparison with mainstream methods. Experiments show that the model better matches practical needs and achieves better results in semi-automatic modeling for topic visualization analysis.
Keywords: dynamic topic number; infinite latent Dirichlet allocation (ILDA); formal concept analysis; topic feature lattice; topic feature lattice under dynamic topic number (TFL_DTN) model
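The last two steps of this pipeline (a topic formal context matrix, then FCA) can be illustrated with a small sketch. It assumes a topic-word probability matrix is already available (ILDA inference is not shown) and replaces the paper's association-intensity measure with a hypothetical rule: word w is an attribute of topic t when P(w | t) >= tau, with tau = 0.1 chosen arbitrarily. The function names and toy matrix are illustrative. Concepts are enumerated by naive closure under intersection, which is adequate for a handful of topics but is not an efficient FCA algorithm such as NextClosure.

```python
def formal_context(topic_word, vocab, tau=0.1):
    """Binary context: topic t has word w as an attribute iff P(w | t) >= tau.

    The threshold rule is a hypothetical stand-in for the paper's
    association-intensity computation.
    """
    return {
        f"T{t}": frozenset(w for w, p in zip(vocab, row) if p >= tau)
        for t, row in enumerate(topic_word)
    }


def concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context."""
    all_attrs = frozenset().union(*context.values())
    # Concept intents are exactly the intersections of object intents
    # (the empty intersection being the full attribute set).
    intents = {all_attrs}
    for obj_intent in context.values():
        intents |= {obj_intent & i for i in intents}
    result = [
        (frozenset(o for o, attrs in context.items() if intent <= attrs), intent)
        for intent in intents
    ]
    # Most general concepts (largest extent) first.
    return sorted(result, key=lambda c: (-len(c[0]), sorted(c[1])))


def hasse_edges(cs):
    """Cover pairs (a, b): extent(a) is a proper subset of extent(b), nothing between."""
    return [
        (a, b) for a in cs for b in cs
        if a[0] < b[0] and not any(a[0] < c[0] < b[0] for c in cs)
    ]


if __name__ == "__main__":
    vocab = ["snow", "plateau", "yak", "market", "price"]
    # Hypothetical topic-word probabilities (each row sums to 1).
    topic_word = [
        [0.40, 0.30, 0.20, 0.05, 0.05],
        [0.05, 0.30, 0.30, 0.30, 0.05],
        [0.02, 0.03, 0.05, 0.50, 0.40],
    ]
    cs = concepts(formal_context(topic_word, vocab))
    for extent, intent in cs:
        print(sorted(extent), sorted(intent))
    print(len(hasse_edges(cs)), "cover edges")
```

Each concept pairs a set of topics with the feature words they all share, and the cover pairs from `hasse_edges` are the edges a line diagram of the lattice would draw, which is one way such a structure could be visualized.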