
PV-DM Model-Based Multi-Document Summarisation (cited by: 2)
Abstract Current word-vector-based multi-document summarisation methods ignore the order of words within a sentence, so different sentences can be mapped to the same vector, and the summaries they generate from small-scale training data are highly redundant. To address these problems, we propose a multi-document summarisation method based on the PV-DM (Distributed Memory Model of Paragraph Vectors) model. The method first constructs a monotone submodular objective function. It then trains a PV-DM model to obtain sentence vectors, uses them to compute the semantic similarity between sentences, and evaluates the monotone submodular objective on that basis. Finally, it applies an optimisation algorithm to extract sentences and form the summary. Experiments on the standard Opinosis dataset show that the method outperforms current mainstream multi-document summarisation methods.
Source Computer Applications and Software (《计算机应用与软件》), CSCD, 2016, No. 10, pp. 251-255, 278 (6 pages)
Funding National Social Science Fund of China (14BXW028)
Keywords semantic similarity; PV-DM (Distributed Memory Model of Paragraph Vectors) model; sentence vector; multi-document summarisation; monotone submodular function
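
The pipeline outlined in the abstract (sentence vectors, pairwise semantic similarity, a monotone submodular objective, and an optimisation step that extracts sentences) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not give the objective function or the optimiser, so the sketch substitutes a facility-location coverage objective and a plain greedy selection. The toy two-dimensional vectors stand in for PV-DM sentence vectors, and every function name here is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors):
    """Pairwise similarities, clipped at 0 so the objective stays monotone."""
    n = len(vectors)
    return [[max(0.0, cosine(vectors[i], vectors[j])) for j in range(n)]
            for i in range(n)]


def coverage(sim, selected):
    """Assumed objective (facility location): every sentence is credited
    with its similarity to the closest sentence already in the summary."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_summary(vectors, budget):
    """Greedily add the sentence with the largest marginal gain until
    `budget` sentences are chosen or no sentence improves the objective."""
    sim = similarity_matrix(vectors)
    selected, current = [], 0.0
    while len(selected) < budget:
        best_gain, best_j = 0.0, None
        for j in range(len(vectors)):
            if j in selected:
                continue
            gain = coverage(sim, selected + [j]) - current
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:
            break
        selected.append(best_j)
        current += best_gain
    return selected


if __name__ == "__main__":
    # Toy stand-ins for PV-DM sentence vectors: two near-duplicate pairs.
    toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(greedy_summary(toy_vectors, budget=2))
```

Because a second sentence from an already covered cluster adds little marginal gain, the greedy step tends to pick one sentence per cluster, which is how an objective of this kind discourages redundant summaries.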

