
Sentence Compression Based on Structured Learning
Abstract: With the rapid growth of information in recent years, sentence compression, an important component of automatic summarization, has attracted increasing attention from researchers. However, research on sentence compression is still at an early stage: compression quality remains unsatisfactory, and there are no unified automatic evaluation metrics. Working within the simple framework of shortening a sentence by deleting words or constituents, this paper adopts a structured learning approach in which feature weights are learned through large-margin training. It also proposes two new automatic evaluation metrics (N-Gram and BLEU) for assessing sentence compression. Experimental results show that the structured learning approach preserves the main information of the source sentence while maintaining a good compression ratio, and that the two proposed metrics effectively reflect the quality of sentence compression.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core (北大核心), 2013, No. 2, pp. 10-16, 64 (8 pages in total)
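Funding: National Natural Science Foundation of China (60970056); Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (10KJB520016)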
Keywords: sentence compression; structured learning; automatic evaluation

