Abstract
BERT pretrained language models have achieved breakthroughs in a range of natural language processing tasks; this paper explores their application to Chinese text summarization. We discuss the relationship between the information-theoretic framework of text summarization and the ROUGE score, and analyze the information characteristics of Chinese word-level and character-level granularity representations from an information-theoretic perspective. Based on the information-compression nature of text summarization, we propose a hybrid word-character feature model for Chinese text summarization that uses the whole word masking (wwm) Chinese pretrained language model BERT_wwm as the encoder to extract word-level features, and a multi-layer Transformer as the decoder to generate summaries at character granularity. We evaluated BERT_base_Chinese, BERT_wwm_Chinese, BERT_wwm_ext_Chinese and RoBERTa_wwm_ext_Chinese as word-level feature encoders on the LCSTS dataset, with ROUGE as the evaluation metric. The results show that the RoBERTa_wwm_ext_Chinese+Transformer encoder-decoder achieves ROUGE-1, ROUGE-2 and ROUGE-L F1 scores of 44.60, 32.33 and 41.37, outperforming the HWC+Transformer method.
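The architecture described above maps naturally onto a standard encoder-decoder. The following is a minimal PyTorch sketch, not the authors' code: it assumes the HuggingFace checkpoint hfl/chinese-roberta-wwm-ext as a stand-in for RoBERTa_wwm_ext_Chinese, and the decoder depth (6 layers) and head count (8) are illustrative assumptions, since the abstract only states "multi-layer Transformer".

import torch
import torch.nn as nn
from transformers import BertModel

class HybridWordCharSummarizer(nn.Module):
    def __init__(self, pretrained="hfl/chinese-roberta-wwm-ext", num_layers=6):
        super().__init__()
        # Whole-word-masking BERT encoder: extracts word-level features.
        self.encoder = BertModel.from_pretrained(pretrained)
        d = self.encoder.config.hidden_size
        layer = nn.TransformerDecoderLayer(d_model=d, nhead=8, batch_first=True)
        # Multi-layer Transformer decoder: generates the summary character by character.
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.lm_head = nn.Linear(d, self.encoder.config.vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        # Reuse the BERT embedding table for decoder inputs (an assumption,
        # not stated in the abstract).
        tgt = self.encoder.embeddings(tgt_ids)
        L = tgt_ids.size(1)
        # Additive causal mask: -inf above the diagonal blocks future positions.
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.decoder(tgt, memory, tgt_mask=causal,
                         memory_key_padding_mask=(src_mask == 0))
        return self.lm_head(h)  # per-step logits over the character vocabulary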
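For the evaluation metric, Chinese summarization is typically scored with character-level ROUGE. Below is a minimal sketch of character-level ROUGE-N F1; the official ROUGE toolkit differs in preprocessing details, and ROUGE-L uses longest common subsequence rather than n-gram overlap.

from collections import Counter

def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    # Character n-grams, the usual granularity for Chinese ROUGE.
    cand = [candidate[i:i + n] for i in range(len(candidate) - n + 1)]
    ref = [reference[i:i + n] for i in range(len(reference) - n + 1)]
    # Clipped n-gram matches: each reference n-gram counts at most once per occurrence.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if not cand or not ref or overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

print(rouge_n_f1("今天天气很好", "今天天气不错", n=2))  # -> 0.6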
Authors
Lao Nanxin; Wang Banghai (School of Computer, Guangdong University of Technology, Guangzhou 510006, Guangdong, China)
Source
Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal (北大核心)
2022, No. 6, pp. 258-264, 296 (8 pages in total)
Funding
National Natural Science Foundation of China (61672007).
Keywords
Chinese text summarization
Information theory
BERT language model
Hybrid word-character feature