PV-DM MODEL-BASED MULTI-DOCUMENT SUMMARISATION (基于PV-DM模型的多文档摘要方法)

Cited by: 2

Abstract: Current word vector-based multi-document summarisation methods ignore the order of words within a sentence, so different sentences can be mapped to the same vector, and the summaries they generate from small-scale training data are highly redundant. To address these problems, we propose a multi-document summarisation method based on the PV-DM (Distributed Memory Model of Paragraph Vectors) model. The method first constructs a monotone submodular objective function. It then trains a PV-DM model to obtain sentence vectors, uses them to compute the semantic similarity between sentences, and from those similarities evaluates the monotone submodular objective. Finally, it applies an optimisation algorithm to extract sentences and form the summary. Experimental results on the standard Opinosis dataset show that the method outperforms current mainstream multi-document summarisation methods.
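The three steps named in the abstract (objective, sentence similarity, optimised extraction) can be illustrated with a minimal sketch. The abstract does not give the exact objective or the optimisation algorithm, so everything here is an assumption: the objective is a facility-location coverage function (a standard monotone submodular choice), the optimiser is plain greedy selection under a sentence-count budget, and the toy 2-d vectors stand in for sentence vectors that a trained PV-DM model would supply (for example gensim's `Doc2Vec` with `dm=1`). The names `cosine`, `coverage`, and `greedy_summary` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, clipped to [0, 1] so the objective stays monotone."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(0.0, dot / norm) if norm else 0.0

def coverage(selected, sim):
    """Facility-location objective: for every sentence, the similarity to its
    closest selected sentence, summed over all sentences."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)

def greedy_summary(vectors, budget):
    """Greedily pick up to `budget` sentence indices by largest marginal gain."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    selected = []
    while len(selected) < budget:
        current = coverage(selected, sim)
        best_gain, best_idx = 0.0, None
        for k in range(n):
            if k in selected:
                continue
            gain = coverage(selected + [k], sim) - current
            if gain > best_gain:
                best_gain, best_idx = gain, k
        if best_idx is None:  # no remaining sentence improves coverage
            break
        selected.append(best_idx)
    return selected

# Toy vectors standing in for PV-DM sentence vectors (hypothetical data).
toy_vectors = [
    [1.0, 0.0],  # sentence 0
    [0.9, 0.1],  # sentence 1, a near-duplicate of sentence 0
    [0.0, 1.0],  # sentence 2, a different topic
]
summary = greedy_summary(toy_vectors, budget=2)
```

On the toy data the greedy step picks only one of the two near-duplicate sentences before moving to the different topic, which is the redundancy-reducing behaviour a submodular objective is meant to give. A real system would replace the sentence-count budget with a length budget and could use a different submodular objective.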

Authors: 刘欣, 王波, 毛二松

Affiliation: PLA Information Engineering University (解放军信息工程大学)

Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2016, No. 10, pp. 251-255, 278 (6 pages in total)

Funding: National Social Science Fund of China (14BXW028)

Keywords: semantic similarity; PV-DM (Distributed Memory Model of Paragraph Vectors) model; sentence vector; multi-document summarisation; monotone submodular function

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
