
Multi-Document Summarization Algorithm Based on Significance Topic of LDA (基于LDA重要主题的多文档自动摘要算法)

Cited by: 11
Abstract: This paper proposes a multi-document summarization algorithm based on the significant topics of an LDA (latent Dirichlet allocation) model. It differs from existing topic-model-based multi-document summarization algorithms in two ways. First, when computing the similarity between sentence topics and document topics, it introduces and defines the notion of topic significance, divides the topics built by the LDA model into significant and insignificant topics, and weights each sentence mainly by the similarity between the sentence's topics and the document's significant topics. Second, it computes sentence weights from a feature vector that combines statistical features (term frequency, sentence position, sentence length, and so on) with LDA features, so it keeps the strengths of traditional statistical features while also using the topic information from the LDA model. Experiments on the DUC2002 standard data set show that the algorithm produces good summaries and achieves better performance than the other state-of-the-art algorithms it is compared with.
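The two ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the rule that a topic is "significant" when its document-level probability is at least the mean, the use of cosine similarity restricted to significant topics, the linear combination of features, the equal default weights, and all function names are hypothetical and are not the authors' definitions.

```python
import math


def significant_topics(doc_topic, threshold=None):
    """Return the indices of topics treated as significant.

    Assumption: a topic is significant when its probability in the
    document-level topic distribution is at least the mean probability.
    The paper's actual definition is not stated in the abstract.
    """
    if threshold is None:
        threshold = sum(doc_topic) / len(doc_topic)
    return [k for k, p in enumerate(doc_topic) if p >= threshold]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_similarity(sent_topic, doc_topic, keep):
    """Compare sentence and document topic distributions on significant topics only."""
    return cosine([sent_topic[k] for k in keep], [doc_topic[k] for k in keep])


def sentence_weight(tf, position, length, lda, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine statistical features and the LDA feature into one score.

    Assumption: a weighted linear sum with equal weights; the paper only
    says the features form a vector used to compute the sentence weight.
    """
    return sum(w * f for w, f in zip(weights, (tf, position, length, lda)))


# Toy topic distributions over four topics (made-up numbers).
doc_topic = [0.5, 0.3, 0.1, 0.1]
keep = significant_topics(doc_topic)

on_topic = topic_similarity([0.6, 0.2, 0.1, 0.1], doc_topic, keep)
off_topic = topic_similarity([0.0, 0.0, 0.5, 0.5], doc_topic, keep)

# Same statistical features, different LDA feature.
score_on = sentence_weight(0.8, 1.0, 0.5, on_topic)
score_off = sentence_weight(0.8, 1.0, 0.5, off_topic)
```

In this toy setup a sentence whose probability mass sits on insignificant topics gets an LDA feature of zero, so it ranks below an otherwise identical sentence that matches the document's significant topics. A summarizer built on this idea would score every sentence and select the top-ranked ones.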
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 2, pp. 242-248 (7 pages)
Funding: National Natural Science Foundation of China; Dalian Science and Technology Fund
Keywords: multi-document summarization; topic model; significance topic
