Abstract
In Chinese spelling correction, the distance and order information of characters in a text are important features, which makes position encoding crucial. Traditional position encoding cannot distinguish the forward and backward relationships between characters, and two-stage correction schemes suffer from error propagation. To address these problems, a Chinese spelling correction method that fuses position encodings under multi-task learning is proposed: the fused position encoding provides the model with better positional information, and the multi-task learning mechanism alleviates error propagation and improves the model's generalization ability. Experiments on public datasets show a stable improvement in F1 score, which verifies the effectiveness of the proposed method.
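One way to read the claim about traditional position encoding is sketched below. The abstract does not say which encoding it means, so this example assumes the fixed sinusoidal encoding used in the original Transformer; the function names `sinusoidal_encoding` and `similarity` are hypothetical, and the paper's own fused encoding is not reproduced here. The sketch shows that the dot product between two sinusoidal position vectors depends only on the absolute distance between the positions, so a neighbor three characters to the left looks the same as one three characters to the right.

```python
import math


def sinusoidal_encoding(position, dim=16, base=10000.0):
    """Fixed sinusoidal encoding for one position (assumed 'traditional' scheme)."""
    vec = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        vec.append(math.sin(position * freq))
        vec.append(math.cos(position * freq))
    return vec


def similarity(p, q, dim=16):
    """Dot product between the encodings of positions p and q."""
    a = sinusoidal_encoding(p, dim)
    b = sinusoidal_encoding(q, dim)
    return sum(x * y for x, y in zip(a, b))


center = 10
left = similarity(center, center - 3)   # character 3 steps before
right = similarity(center, center + 3)  # character 3 steps after

# Each sin/cos pair contributes cos(freq * (p - q)), which is even in (p - q),
# so the two scores agree up to floating-point error: distance is visible,
# direction is not.
print(abs(left - right) < 1e-9)
```

Under this assumption, order information has to come from somewhere else, for example a relative or directional term added to the encoding, which is consistent with the motivation the abstract gives for fusing position encodings.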
Authors
ZHAO Jian-hui (赵建辉)
LIN Chuan (林川)
REN Li-na (任丽娜)
HUANG Rui-zhang (黄瑞章)
Affiliations: Text Computing and Cognitive Intelligence Engineering Research Center of Ministry of Education, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; Department of Information Engineering, Guizhou Light Industry Technical College, Guiyang 550025, China; State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2024, Issue 9, pp. 2844-2851 (8 pages)
Funding
National Natural Science Foundation of China (62066007)
Guizhou Provincial Science and Technology Support Program (黔科合支撑【2022】一般277)
Keywords
Chinese spelling error correction
distance information
order information
position encoding
error propagation
fusion position encoding
multi-task learning