A New Method for Calculating Similarity Between Sentences and Its Application to Automatic Text Summarization (cited by: 34)
Abstract: This paper proposes a new measure of sentence similarity and applies it to automatic text summarization. Its novelty is that the similarity computation considers not only the unigrams in a sentence but also its bigrams and trigrams, and combines the resulting similarity scores by regression. Experiments show that the similarity measure is effective and that the final summaries are better than those produced without it. The paper also proposes a new extractive summarization algorithm that uses inter-sentence similarity together with sentence weights, so that redundancy is removed as sentences are extracted. Evaluation results from DUC 2003 and DUC 2004 (Document Understanding Conference 2003, 2004) confirm the effectiveness of the method. Our system ranked second among the systems evaluated in DUC 2004.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2005, No. 2, pp. 93-99 (7 pages)
Funding: National Natural Science Foundation of China (60103014); key research project of the Science and Technology Commission of Shanghai Municipality (035005028)
Keywords: computer application; Chinese information processing; automatic text summarization; vector model; similarity calculation
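
The abstract gives only the outline of the method, so the following is a minimal sketch of the two ideas it names rather than the authors' implementation. It assumes whitespace tokenization, cosine similarity over n-gram count vectors, and a linear combination of the unigram, bigram and trigram scores. The coefficients in `COEFFS` and the threshold `max_sim` are hypothetical placeholders (the paper fits its combination by regression, and its values are not given here), and the per-sentence weights are assumed to be supplied by the caller.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return the multiset of n-grams (as tuples) found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical coefficients for the unigram, bigram and trigram scores.
# The paper obtains its combination by regression; these values are made up.
COEFFS = (0.5, 0.3, 0.2)


def sentence_similarity(s1, s2, coeffs=COEFFS):
    """Weighted sum of unigram, bigram and trigram cosine similarities."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    return sum(w * cosine(ngrams(t1, n), ngrams(t2, n))
               for n, w in zip((1, 2, 3), coeffs))


def extract_summary(sentences, weights, k, max_sim=0.5):
    """Greedily pick up to k sentences by descending weight, skipping any
    sentence whose similarity to an already chosen one reaches max_sim."""
    order = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        if all(sentence_similarity(sentences[i], sentences[j]) < max_sim
               for j in chosen):
            chosen.append(i)
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the mat today",
        "stocks fell sharply on monday",
    ]
    print(extract_summary(docs, weights=[0.9, 0.8, 0.5], k=2))
```

In this sketch the second sentence is skipped because it nearly duplicates the first, which is how redundancy is reduced during extraction. A fitted regression model would replace the fixed `COEFFS`.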
