Abstract
Two improvements to Chinese spelling correction are proposed. First, a Gaussian-distributed bias matrix is added to the Transformer attention mechanism, which increases the model's focus on local text and strengthens the extraction of information from erroneous characters and words and their surrounding context. Second, the ON_LSTM (Ordered Neurons LSTM) model is used to extract syntactic information from the distinctive grammatical structure that erroneous text exhibits. Experimental results show that both methods improve accuracy and recall, and that the model combining the two achieves the highest F1 score.
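The first idea, adding a Gaussian bias to attention, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so the code below rests on assumptions: the bias is centered on each query position, the width `sigma` is a fixed hyperparameter, and the bias is added to the scaled dot-product scores before the softmax. The function names `gaussian_bias` and `local_attention` are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_bias(seq_len, sigma=1.0):
    """Bias matrix G with G[i, j] = -(j - i)^2 / (2 * sigma^2).

    Assumption: the Gaussian is centered on the query position i and
    sigma is fixed; the paper may predict either quantity instead.
    """
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]  # dist[i, j] = j - i
    return -(dist ** 2) / (2.0 * sigma ** 2)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_attention(q, k, v, sigma=1.0):
    """Single-head scaled dot-product attention with an additive Gaussian bias.

    q, k, v: arrays of shape (seq_len, d). The bias is zero on the diagonal
    and increasingly negative for distant positions, so each token attends
    more strongly to its neighbours.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    weights = softmax(scores + gaussian_bias(q.shape[0], sigma))
    return weights @ v, weights


# Toy usage: six token vectors of dimension eight (random stand-ins).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
out, w = local_attention(x, x, x, sigma=1.0)
```

A smaller `sigma` narrows the window around each character, while a large `sigma` makes the bias nearly flat and recovers ordinary attention.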
Authors
段建勇
袁阳
王昊
DUAN Jianyong; YUAN Yang; WANG Hao (School of Information Science and Technology, North China University of Technology, Beijing 100043)
Source
《北京大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journals
2021, No. 1, pp. 61-67 (7 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
Supported by the National Natural Science Foundation of China (61972003, 61672040).