Automatic Text Summarization Research Based on Topic Model and Information Entropy (times cited: 7)
Abstract: This paper presents a method for automatic summarization of Chinese documents based on the LDA model and information entropy. The LDA model performs shallow semantic analysis on a document, yielding the document's topic distribution and the word distribution under each topic. By analyzing these topics, we obtain the topic that best expresses the document's central idea, together with that topic's word distribution. In addition, the paper proposes a new information-entropy-based measure of sentence importance and applies it to key-sentence extraction: the occurrence of each sentence in the document is treated as a random variable, and key sentences are selected by modeling each random variable and measuring its information entropy. Experimental results show that summaries extracted with the topic model and information entropy effectively pick out the central sentences of a document.
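The abstract names the two stages (pick the dominant LDA topic, then rank sentences by an entropy measure) but does not give the paper's formulas. The following is a minimal Python sketch under explicit assumptions, not the authors' method: an LDA model has already been trained with some external library and produced the document's topic distribution `doc_topic` and per-topic word distributions `topic_word`; sentences are already segmented and tokenized; the "central" topic is simply the highest-weight topic; and the sentence score (the sum of -p·log2(p) over the sentence's distinct words, with p taken from the dominant topic) is a hypothetical stand-in for the paper's entropy measure. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dominant_topic(doc_topic: Sequence[float]) -> int:
    """Index of the topic with the largest weight in the document's topic distribution."""
    return max(range(len(doc_topic)), key=lambda k: doc_topic[k])


def sentence_entropy_score(sentence: Sequence[str], word_probs: Dict[str, float]) -> float:
    """Hypothetical score (assumption, not the paper's formula): the sum of
    -p * log2(p) over the sentence's distinct words, where p is the word's
    probability under the dominant topic. Words absent from the topic add nothing."""
    score = 0.0
    for word in set(sentence):  # distinct words, so repetition is not rewarded
        p = word_probs.get(word, 0.0)
        if p > 0.0:
            score -= p * math.log2(p)  # log2(p) <= 0, so each term adds a non-negative amount
    return score


def extract_key_sentences(
    sentences: List[List[str]],
    doc_topic: Sequence[float],
    topic_word: List[Dict[str, float]],
    n: int = 1,
) -> List[Tuple[int, float]]:
    """Return (sentence index, score) for the n highest-scoring sentences,
    listed in their original document order."""
    k = dominant_topic(doc_topic)
    scored = [(i, sentence_entropy_score(s, topic_word[k])) for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda item: item[1], reverse=True)[:n]
    return sorted(top)  # tuples sort by index first, restoring document order


if __name__ == "__main__":
    # Toy LDA output for one document (made-up numbers, not data from the paper).
    doc_topic = [0.2, 0.7, 0.1]
    topic_word = [
        {"stock": 0.5, "market": 0.5},
        {"summary": 0.4, "document": 0.3, "sentence": 0.2, "topic": 0.1},
        {"weather": 1.0},
    ]
    sentences = [["document", "summary", "sentence"], ["weather", "is", "fine"], ["topic", "model"]]
    print(extract_key_sentences(sentences, doc_topic, topic_word, n=1))
```

In the toy run, topic 1 dominates, so the first sentence, which covers three of that topic's high-probability words, is selected. Because the score is a sum, longer sentences tend to score higher; a real system might normalize by sentence length or use the paper's own random-variable model instead.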
Source: Computer Science (计算机科学), CSCD, Peking University Core Journal; 2014, Issue B11, pp. 298-300, 332 (4 pages)
Keywords: Summarization; LDA model; Topic; Information entropy