Abstract
深层神经网络在文档摘要方面取得了很好的效果,其优势只有在大数据集下才能显示出来。为了解决在使用深度学习做柬语单文档抽取式摘要时语料标注不足的问题,提出一种将主动学习和深度学习相结合的方法。利用主动学习抽样策略选择出定量的文档,通过专家标注,结合深度学习中编码器解码器模型进行训练模型抽取得到摘要。实验结果表明,在训练语料显著标注不足的情况下,该方法能够有效地提升柬语单文档摘要的质量。
Deep neural networks have achieved strong results in document summarization, but their advantages emerge only with large datasets. To address the shortage of annotated corpora when applying deep learning to Khmer single-document extractive summarization, a method combining active learning with deep learning is proposed. An active learning sampling strategy selects a fixed number of documents, which are annotated by experts and then used to train an encoder-decoder model that extracts the summary. Experimental results show that, even when annotated training data are markedly insufficient, the method effectively improves the quality of Khmer single-document summaries.
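The select, annotate, and retrain cycle described in the abstract can be illustrated with a minimal pool-based active learning sketch. The abstract does not name the sampling strategy, so entropy-based uncertainty sampling is used here purely as an assumption. The callables `predict`, `label`, and `train` are hypothetical placeholders for the encoder-decoder model's per-sentence inclusion probabilities, the expert annotators, and model retraining; none of these names come from the paper.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a single include/exclude decision with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def document_uncertainty(sentence_probs):
    """Mean per-sentence entropy; higher means the model is less sure about the document."""
    if not sentence_probs:
        return 0.0
    return sum(binary_entropy(p) for p in sentence_probs) / len(sentence_probs)


def select_batch(pool, predict, k):
    """Return the k document ids whose predicted sentence probabilities are most uncertain.

    Assumption: uncertainty sampling; the source only says 'an active learning
    sampling strategy' without specifying which one.
    """
    ranked = sorted(pool, key=lambda doc_id: document_uncertainty(predict(doc_id)),
                    reverse=True)
    return ranked[:k]


def active_learning_loop(pool, predict, label, train, k, rounds):
    """Repeatedly pick k uncertain documents, have them labeled, and retrain.

    predict(doc_id) -> list of probabilities, one per sentence (hypothetical model hook)
    label(doc_id)   -> expert annotation for the document (hypothetical)
    train(labeled)  -> retrains the model on all labeled documents (hypothetical)
    """
    remaining = list(pool)
    labeled = {}
    for _ in range(rounds):
        if not remaining:
            break
        for doc_id in select_batch(remaining, predict, k):
            labeled[doc_id] = label(doc_id)
            remaining.remove(doc_id)
        train(labeled)
    return labeled
```

In this sketch `k` plays the role of the fixed number of documents sent to the experts in each round. A real system would replace the three placeholder callables with the actual summarization model and annotation workflow.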
Authors
余兵兵
严馨
周枫
徐广义
莫源源
Yu Bingbing; Yan Xin; Zhou Feng; Xu Guangyi; Mo Yuanyuan (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; Yunnan Provincial Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650040, Yunnan, China; Yunnan Nantian Electronic Information Industry Co., Ltd., Kunming 650040, Yunnan, China; School of Southeast & South Asia Languages and Culture, Yunnan Minzu University, Kunming 650500, Yunnan, China; Institute of Language Studies, Shanghai Normal University, Shanghai 200234, China)
Source
《计算机应用与软件》
Peking University Core Journal (北大核心)
2021, No. 4, pp. 165-170, 189 (7 pages in total)
Computer Applications and Software
Funding
National Natural Science Foundation of China (61562049, 61462055).
Keywords
柬语
主动学习
单文档摘要
深度学习
Khmer
Active learning
Single-document summarization
Deep learning