Abstract
To address the problem that the same Chinese word expresses different relations in a sentence depending on its part of speech, a Chinese Grammatical Error Correction (CGEC) model based on a Transformer fused with part-of-speech features is proposed. The model incorporates linguistic knowledge as auxiliary information into the CGEC task. First, without changing the length of the sentence sequence, part-of-speech vectors are concatenated with the original word embeddings in different ways, yielding three embedding schemes: full-difference, word-difference, and part-of-speech-difference word embeddings. Then, the new embeddings are combined with the Transformer model to correct grammatical errors in erroneous sentences. Experimental results show that all three schemes improve F0.5 to varying degrees, and that full-difference word embedding performs best: compared with the Transformer model, F0.5 increases by 2.73 percentage points and BLEU (Bilingual Evaluation Understudy) by 6.27 percentage points; compared with a CGEC model based on a Transformer-enhanced architecture, F0.5 increases by 1.88 percentage points. When extracting part-of-speech features, the proposed model can focus on the grammatical differences between the source and target sentences and thus better capture the grammatical features of sentences.
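The idea of attaching a part-of-speech vector to each word vector without lengthening the sequence, together with the F0.5 metric quoted above, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the table sizes, the random embedding tables, and the names `embed_with_pos` and `f_beta` are assumptions made here, and the abstract does not specify how the full-difference, word-difference, and part-of-speech-difference variants differ, so none of them is reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
VOCAB_SIZE, NUM_POS_TAGS = 1000, 30
WORD_DIM, POS_DIM = 16, 4

# Randomly initialised lookup tables standing in for learned embeddings.
word_table = rng.normal(size=(VOCAB_SIZE, WORD_DIM))
pos_table = rng.normal(size=(NUM_POS_TAGS, POS_DIM))


def embed_with_pos(token_ids, pos_ids):
    """Concatenate each token's word vector with its POS vector.

    The concatenation is along the feature axis, so the sequence
    length stays the same and only the embedding width grows.
    """
    token_ids = np.asarray(token_ids)
    pos_ids = np.asarray(pos_ids)
    if token_ids.shape != pos_ids.shape:
        raise ValueError("each token needs exactly one POS tag")
    return np.concatenate([word_table[token_ids], pos_table[pos_ids]], axis=-1)


def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta=0.5 weights precision more than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up token and tag ids for a four-token sentence.
tokens = [12, 7, 256, 3]
tags = [1, 5, 5, 2]
features = embed_with_pos(tokens, tags)
print(features.shape)  # (4, 20): still 4 positions, 16 + 4 features each
```

Concatenating along the last axis is what keeps the number of positions fixed, so the combined vectors could be fed to a Transformer encoder whose model width matches the widened embedding.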
Authors
SHANG Haiyi (尚海怡)
HUANG Jifeng (黄继风)
CHEN Haiguang (陈海光)
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2022, Issue S02, pp. 25-30 (6 pages)
Funding
Shanghai Local Capacity Building Project (19070502900).