Chinese Grammatical Error Correction Based on Grammatical Knowledge Enhancement (基于语法知识增强的中文语法纠错)
Abstract Grammatical error correction aims to determine whether a natural language text contains grammatical errors and to correct them. With the rapid development of pre-trained language models, methods based on such models have been widely applied to Chinese Grammatical Error Correction (CGEC). However, existing pre-trained language models lack the grammatical knowledge specific to error correction, which limits correction performance. To address this problem, this paper proposes a CGEC model built on a pre-trained model enhanced with a grammatical knowledge graph. First, structured knowledge encoding maps the structured knowledge in the grammatical knowledge graph into word-entity embeddings. Next, a dedicated pre-training masking strategy jointly learns the context and the grammatical knowledge between words in order to predict characters and words. Finally, the pre-trained model is fine-tuned for CGEC through an error detection network and an error correction network. Together, these steps extract grammatical knowledge more fully and help the model better capture the grammatical relations between words in a sentence. Experiments on the NLPCC 2018 test dataset show that grammatical knowledge enhancement improves the model's F0.5 score by 4.83 percentage points, and that the proposed model's F0.5 score is 8.85 percentage points higher than that of the top-ranked model in the NLPCC 2018 shared task, which supports the effectiveness of the grammatical-knowledge-graph-based pre-trained model for CGEC.
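The abstract reports its results as F0.5 scores without defining the metric. As a reading aid, the following is a minimal sketch of the standard F-beta formula with beta = 0.5, which weights precision more heavily than recall. The function name `f_beta` and the precision and recall values in the usage line are illustrative assumptions, not figures from the paper, and the paper's own evaluation tooling is not shown here.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall.

    Illustrative helper only (hypothetical name, not the paper's scorer).
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Made-up values for illustration: precision 0.40, recall 0.20.
example_score = f_beta(0.40, 0.20)
print(f"F0.5 = {example_score:.4f}")
```

A gain stated in percentage points is an absolute difference between two such scores expressed in percent. For example, a move from 30.00 to 34.83 would be a gain of 4.83 points, not a 4.83% relative improvement.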
Authors DENG Qian (邓倩); CHEN Shu (陈曙); YE Junmin (叶俊民) (School of Computer Science, Central China Normal University, Wuhan 430079, China)
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心), 2023, No. 11, pp. 77-84 (8 pages)
Funding Post-funded Project of the National Social Science Fund of China (20FTQB020).
Keywords grammatical error correction; pre-trained language model; heterogeneous knowledge encoding; knowledge graph; deep learning