
A Study of Chinese Text Summarization Based on Thematic Area Discovery
Abstract: Automatic summarization is an important research topic in natural language processing. This paper proposes a method for Chinese automatic summarization based on thematic area discovery. The distinguishing feature of the method is that the generated summary covers as many of the text's themes as possible while markedly reducing its own redundancy, so it effectively balances these two competing goals. Latent thematic areas are discovered by adaptively clustering the text's paragraphs, using the K-medoids clustering algorithm together with a cluster-analysis method based on a newly defined objective function, and the discovered areas are then applied to summarization. In addition, a new evaluation measure based on representation entropy is used to assess the redundancy of the summary. Experimental results show that the method is feasible and effective, and it represents a meaningful exploration in research on Chinese automatic summarization.
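The abstract names the pipeline but not its formulas, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes paragraphs are already word-segmented (space-separated tokens) and represented as term-frequency vectors compared by cosine distance. It clusters them with a plain alternating K-medoids, picks the number of clusters by the mean silhouette width (a well-known criterion swapped in for the paper's unspecified self-defined objective function), and returns one medoid paragraph per cluster as the summary. Redundancy is scored with representation entropy in the form defined by Mitra, Murthy and Pal (2002): H = -sum_j p_j log p_j, where p_j = lambda_j / sum_k lambda_k are the normalized eigenvalues of a matrix. Applying it to the cosine-similarity matrix of the summary units, rather than to a covariance matrix, is a further assumption. All function names (`summarize`, `k_medoids`, `silhouette`, `representation_entropy`, `jacobi_eigenvalues`) are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity of two term-frequency Counters, clamped at 0."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (na * nb))


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed n x n distance matrix."""
    n = len(dist)
    # Deterministic init: the most central point, then repeatedly the farthest one.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new = []
        for c in range(k):  # new medoid: member with least total in-cluster distance
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            new.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new == medoids:
            break
        medoids = new
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


def silhouette(dist, labels, k):
    """Mean silhouette width; a stand-in for the paper's own objective function."""
    n, sizes, total = len(dist), Counter(labels), 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton clusters score 0 by convention
        a = sum(dist[i][j] for j in own) / len(own)
        b = min((sum(dist[i][j] for j in range(n) if labels[j] == c) / sizes[c]
                 for c in range(k) if c != labels[i] and sizes[c]), default=a)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a, n = [row[:] for row in a], len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for r in range(n):  # A <- A P (rotate columns p, q)
                    a[r][p], a[r][q] = c * a[r][p] - s * a[r][q], s * a[r][p] + c * a[r][q]
                for r in range(n):  # A <- P^T A (rotate rows p, q)
                    a[p][r], a[q][r] = c * a[p][r] - s * a[q][r], s * a[p][r] + c * a[q][r]
    return [a[i][i] for i in range(n)]


def representation_entropy(vectors):
    """Entropy of the normalized eigenvalues of the units' cosine-similarity matrix."""
    gram = [[1.0 - cosine_distance(u, v) for v in vectors] for u in vectors]
    eig = [max(x, 0.0) for x in jacobi_eigenvalues(gram)]  # clip round-off negatives
    total = sum(eig)
    if total == 0:
        return 0.0
    return -sum(x / total * math.log(x / total) for x in eig if x > 0)


def summarize(paragraphs, max_k=None):
    """Indices (document order) of one medoid paragraph per thematic area.
    Assumes paragraphs are already word-segmented into space-separated tokens."""
    vecs = [Counter(p.split()) for p in paragraphs]
    n = len(vecs)
    if n < 3:
        return list(range(n))
    dist = [[0.0 if i == j else cosine_distance(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]
    best_score, best_medoids = -2.0, None
    for k in range(2, max(2, min(max_k or n - 1, n - 1)) + 1):
        medoids, labels = k_medoids(dist, k)
        score = silhouette(dist, labels, k)
        if score > best_score:  # silhouette lies in [-1, 1]
            best_score, best_medoids = score, medoids
    return sorted(best_medoids)
```

With the four toy paragraphs "cat dog pet", "dog cat pet animal", "stock market price" and "market price trade stock", `summarize` picks indices [0, 2], one paragraph per theme. The representation entropy of those two term vectors is ln 2 ≈ 0.693, the maximum for two units because they share no terms, while two identical units would score 0. Under this reading, a higher entropy indicates a less redundant summary.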
Source: Computer Science (《计算机科学》), 2005, No. 1, pp. 177-181 (5 pages); indexed in CSCD and the Peking University Core Journals list.
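Funding: Tenth Five-Year Plan applied research project of the China National Language Commission (ZDI105-43B); Natural Science Foundation of Hubei Province (2001ABB012).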
Keywords: thematic area discovery; Chinese automatic summarization; clustering analysis; representation entropy; text retrieval

