Abstract
To address two weaknesses of traditional multi-document extractive summarization, namely the failure to exploit semantic information shared between documents and the excessive redundancy in the resulting summaries, a Khmer multi-document extractive summarization method based on hierarchical maximal marginal relevance (MMR) is proposed. First, the Khmer documents are fed into a trained deep learning model, which extracts a summary for each single document. Then, the single-document summaries are merged iteratively in a hierarchical, waterfall-like manner, with an improved MMR algorithm used to select summary sentences, yielding the final multi-document summary. Experimental results show that, compared with other methods, the Khmer multi-document summaries obtained by combining the deep learning model with the hierarchical MMR algorithm improve the R1, R2, R3 and RL values by 4.31%, 5.33%, 6.45% and 4.26%, respectively. The method effectively improves the quality of Khmer multi-document summaries while preserving the diversity and distinctness of the summary sentences.
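The waterfall-style merging described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's improved MMR variant or its deep learning extractor, so everything below is an assumption made for illustration: the standard MMR score is used in place of the improved one, sentences are compared with bag-of-words cosine similarity, the centroid of the candidate pool stands in for the query, and the names `cosine`, `mmr_select`, `waterfall_merge` and the parameters `k` and `lam` are hypothetical. Whitespace tokenization is also a simplification, since Khmer text is not whitespace-delimited and would need a word segmenter first.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k, lam=0.5):
    """Pick up to k sentences with standard MMR (not the paper's improved variant).

    Relevance is similarity to the centroid of all candidates (an assumption);
    redundancy is the highest similarity to any sentence already selected.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    centroid = sum(vecs, Counter())
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * cosine(vecs[i], centroid) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]


def waterfall_merge(single_doc_summaries, k, lam=0.5):
    """Fold single-document summaries together one at a time, re-running MMR after each merge."""
    merged = []
    for summary in single_doc_summaries:
        merged = mmr_select(merged + summary, k, lam)
    return merged


if __name__ == "__main__":
    summaries = [
        ["the cat sat on the mat", "dogs bark loudly"],
        ["the cat sat on the mat", "birds fly south"],
    ]
    print(waterfall_merge(summaries, k=2))
```

In this sketch `lam` balances relevance against redundancy: values closer to 1 favor sentences similar to the pool as a whole, while lower values penalize sentences that repeat what has already been selected. Re-running the selection after each merge keeps the working summary at no more than `k` sentences throughout.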
Authors
ZENG Zhaolin (曾昭霖)
YAN Xin (严馨)
YU Bingbing (余兵兵)
ZHOU Feng (周枫)
XU Guangyi (徐广义)
Affiliations: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Yunnan Nantian Electronic Information Industry Company Limited, Kunming, Yunnan 650040, China
Source
Journal of Hebei University of Science and Technology (《河北科技大学学报》)
CAS
2020, Issue 6, pp. 508-517 (10 pages)
Funding
National Natural Science Foundation of China (61562049, 61462055).
Keywords
multi-document summarization
text input
semantic information
maximal marginal relevance (MMR)
deep learning
excessive redundancy
extractive summarization
diversity
natural language processing
Khmer
waterfall method