
Algorithm Implementation and Improvement of the Hedge Trimmer Sentence Compression Technique (cited by: 1)
Abstract: Compression technology aims to simulate the human ability to summarize text and extract information. Sentence compression automatically generates short sentences that retain the core content of the original sentence while remaining grammatical and semantically coherent. This paper analyzes Hedge Trimmer, a syntax-based (parse-based) technique for English sentence compression, discusses the underlying compression theory, and explores the compression process with an algorithm implementation in a C-like language. It proposes that a good compressed sentence should meet at least three criteria: first, it retains the core content of the original sentence; second, it is grammatical; third, its compressed length is reasonable. For the evaluation, 624 original sentences and their corresponding human-compressed versions were selected from the DUC 2003 corpus and compared with the compressed sentences generated automatically by the Hedge Trimmer algorithm. Five situations in which the compression results were unsatisfactory were identified; their causes are analyzed and improvement strategies are proposed. Finally, a comparison by example of the compressed sentences generated by the improved algorithm and by the original algorithm shows that the improved algorithm produces better compressed sentences. The improved Hedge Trimmer sentence compression algorithm merits wider adoption in English sentence compression.
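The abstract does not reproduce the paper's C-like implementation or its five improvements. As a rough illustration only, the sketch below follows the general parse-and-trim idea of the original Hedge Trimmer (Dorr, Zajic and Schwartz, reference 8 below): pick the lowest leftmost S that has an NP and a VP child, drop determiners, then remove rightmost PP/SBAR constituents until the sentence fits a length limit. The `Node` class, all function names, the hand-built toy parse, and the `max_words` default are assumptions made for this example, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    label: str                                   # constituent or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on leaf nodes

    def leaves(self) -> List[str]:
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.leaves()]


def lowest_leftmost_s(node: Node) -> Optional[Node]:
    """Depth-first, left-to-right search for an S with NP and VP children."""
    for child in node.children:
        found = lowest_leftmost_s(child)
        if found is not None:
            return found
    labels = {c.label for c in node.children}
    if node.label == "S" and {"NP", "VP"} <= labels:
        return node
    return None


def drop_determiners(node: Node) -> None:
    """Remove the determiners a/an/the everywhere below node (in place)."""
    node.children = [
        c for c in node.children
        if not (c.label == "DT" and (c.word or "").lower() in {"a", "an", "the"})
    ]
    for c in node.children:
        drop_determiners(c)


def drop_rightmost(node: Node, labels: Set[str]) -> bool:
    """Delete the rightmost, deepest constituent whose label is in labels."""
    for i in range(len(node.children) - 1, -1, -1):
        child = node.children[i]
        if drop_rightmost(child, labels):
            return True
        if child.label in labels:
            del node.children[i]
            return True
    return False


def trim(root: Node, max_words: int = 10) -> str:
    """Simplified parse-and-trim; mutates the tree. max_words is assumed."""
    s = lowest_leftmost_s(root) or root
    drop_determiners(s)
    while len(s.leaves()) > max_words and drop_rightmost(s, {"PP", "SBAR"}):
        pass
    return " ".join(s.leaves())


def leaf(tag: str, word: str) -> Node:
    return Node(tag, word=word)


def toy_parse() -> Node:
    """Hand-built parse of a toy sentence (not produced by a real parser):
    'Rebels agree to talks with the government officials said Tuesday'."""
    inner_pp = Node("PP", [leaf("IN", "with"),
                           Node("NP", [leaf("DT", "the"), leaf("NN", "government")])])
    outer_pp = Node("PP", [leaf("TO", "to"),
                           Node("NP", [leaf("NNS", "talks"), inner_pp])])
    inner_s = Node("S", [Node("NP", [leaf("NNS", "Rebels")]),
                         Node("VP", [leaf("VBP", "agree"), outer_pp])])
    return Node("S", [inner_s,
                      Node("NP", [leaf("NNS", "officials")]),
                      Node("VP", [leaf("VBD", "said"),
                                  Node("NP", [leaf("NNP", "Tuesday")])])])


if __name__ == "__main__":
    print(trim(toy_parse(), max_words=10))  # Rebels agree to talks with government
    print(trim(toy_parse(), max_words=4))   # Rebels agree to talks
```

This sketch only checks length. The abstract's other two criteria, retaining core content and staying grammatical, would need separate checks, and the real algorithm includes further trimming rules that are omitted here.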
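Author: JING Xiuli (景秀丽)
Source: Journal of Shenyang Normal University (Natural Science Edition) 《沈阳师范大学学报(自然科学版)》, CAS, 2012, No. 4, pp. 519-524 (6 pages)
Funding: National Natural Science Foundation of China (71002015); Scientific Research Project for Higher Education Institutions, Education Department of Liaoning Province (2009B066); Liaoning Higher Education Society "Twelfth Five-Year Plan" Higher Education Research Project (GHYB110231)
Keywords: sentence compression; Hedge Trimmer algorithm; evaluation; improvement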

References (15)

  • 1 CHEN Jinguang, HE Tingting, LI Fang, et al. Chinese sentence trimming based on probability and syntactic analysis [C]// Proceedings of the 5th National Youth Symposium on Computational Linguistics, 2010.
  • 2 VANDEGHINSTE V, PAN Yi. Sentence compression for automated subtitling: A hybrid approach [C]// Proceedings of the ACL Workshop on Text Summarization, 2004: 89-95.
  • 3 CORSTON-OLIVER S. Text compaction for display on very small screens [C]// Proceedings of the NAACL Workshop on Automatic Summarization, 2001: 89-98.
  • 4 FILIPPOVA K, STRUBE M. Dependency tree based sentence compression [C]// Proceedings of the Fifth International Natural Language Generation Conference, 2008: 25-32.
  • 5 CLARKE J, LAPATA M. Global inference for sentence compression: An integer linear programming approach [J]. Journal of Artificial Intelligence Research, 2008, 31(1): 399-429.
  • 6 KNIGHT K, MARCU D. Statistics-based summarization - step one: Sentence compression [C]// The 17th National Conference on Artificial Intelligence (AAAI-2000), 2000: 703-710.
  • 7 KNIGHT K, MARCU D. Summarization beyond sentence extraction: A probabilistic approach to sentence compression [J]. Artificial Intelligence, 2002, 139(1): 91-107.
  • 8 DORR B, ZAJIC D, SCHWARTZ R. Hedge Trimmer: A parse-and-trim approach to headline generation [C]// Proceedings of the HLT-NAACL 2003 Workshop, 2003: 1-8.
  • 9 JING Xiuli, ZHENG Xuewei. A sentence compression method based on the Noisy-Channel Model [J]. 电大理工, 2005(2): 39-41. Cited by: 2
  • 10 JING Hongyan. Sentence reduction for automatic text summarization [C]// Proceedings of ANLP 2000, 2000.

