Abstract
A central difficulty in summary extraction is describing an entire document concisely without losing key information. Supervised models usually require large training corpora, which limits their practical use. Subset selection algorithms are an effective approach to unsupervised automatic document summarization; in such models, summary extraction is formulated as finding the optimum of an objective function. However, optimizing the subset selection objective is an NP-hard problem, and existing methods generally solve it with greedy algorithms. This paper therefore proposes a new unsupervised summary extraction framework based on a genetic algorithm, which also takes into account the importance of the first and last sentences of paragraphs in Chinese text. Experimental results indicate that the proposed method achieves good extraction performance.
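The idea of solving subset selection with a genetic algorithm, rather than greedily, can be illustrated with a minimal Python sketch. The abstract does not give the paper's objective function or genetic operators, so everything specific here is an assumption made for illustration: the coverage-minus-redundancy objective, the Jaccard word-overlap similarity, the `POSITION_BONUS` value, and the function name `ga_summarize`. Whitespace tokenization is used for simplicity; Chinese text would need a word segmenter first.

```python
import random
from typing import List, Set, Tuple

POSITION_BONUS = 0.3  # assumed reward for paragraph-first / paragraph-last sentences


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two token sets (assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def ga_summarize(paragraphs: List[List[str]], k: int, pop_size: int = 30,
                 generations: int = 60, mutation_rate: float = 0.2,
                 seed: int = 0) -> List[str]:
    """Pick k sentences by evolving index subsets; returns them in document order."""
    rng = random.Random(seed)
    sentences = [s for p in paragraphs for s in p]
    n = len(sentences)
    k = min(k, n)
    if k <= 0:
        return []
    tokens = [set(s.lower().split()) for s in sentences]
    sims = [[jaccard(a, b) for b in tokens] for a in tokens]
    # Bonus for the first and last sentence of each paragraph.
    bonus = [POSITION_BONUS if pos in (0, len(p) - 1) else 0.0
             for p in paragraphs for pos in range(len(p))]

    def score(ind: Tuple[int, ...]) -> float:
        # Coverage: how well each sentence is represented by its closest pick.
        coverage = sum(max(sims[i][j] for j in ind) for i in range(n))
        # Redundancy: pairwise similarity among the picked sentences.
        redundancy = sum(sims[a][b] for x, a in enumerate(ind) for b in ind[x + 1:])
        return coverage - 0.5 * redundancy + sum(bonus[j] for j in ind)

    # Each individual is a sorted tuple of k distinct sentence indices.
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Crossover: draw k indices from the union of both parents.
            child = set(rng.sample(sorted(set(a) | set(b)), k))
            # Mutation: swap one chosen index for a random unchosen one.
            if k < n and rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([i for i in range(n) if i not in child]))
            children.append(tuple(sorted(child)))
        population = elite + children
    best = max(population, key=score)
    return [sentences[i] for i in best]
```

Calling `ga_summarize(paragraphs, k=3)` with `paragraphs` given as a list of paragraphs, each a list of sentence strings, returns three sentences in their original order. Representing an individual as a fixed-size index set keeps every candidate feasible, so no penalty term for over-long summaries is needed; a greedy baseline would instead add one sentence at a time and could not revisit earlier choices.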
Authors
WANG Tao (王涛)
FAN Xiaobo (范晓波)
XU Xiaobo (胥小波)
Affiliations: Institute of Science and Technology Information of Sichuan, Chengdu, Sichuan 610000, China; China Electronic Technology Cyber Security Co., Ltd., Chengdu, Sichuan 610000, China
Source
Communications Technology (《通信技术》)
2021, No. 5, pp. 1120-1125 (6 pages)
Keywords
summarization
genetic algorithm
subset selection
NP problem