Research on Semantic Level Chinese Text's Automatic Proofreading Technology Based on Deep Learning

Abstract: Chinese text proofreading technology has achieved good results at the word and grammar levels, but there is not yet a mature method at the semantic level. To achieve semantic-level automatic proofreading of Chinese text, this study introduces deep learning into automatic proofreading. First, to meet the needs of semantic-level proofreading, semantic error samples are added to existing publicly available Chinese proofreading test sets, and data augmentation is used to enlarge the pool of semantic errors so that they make up more than 50% of both the training set and the test set. Second, for the typical types of semantic error, corresponding semantic knowledge sets are built, including an idiom knowledge set, a classical poetry knowledge set, a dynastic chronology knowledge set covering the major events of historical figures, a knowledge set of honorific and humble expressions, and a geographical knowledge set, among others. On the basis of these knowledge sets, the dataset is trained with the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model. Finally, once pre-training has produced a preliminary model, key parameters are fine-tuned to arrive at the final automatic proofreading model.
Authors: ZHANG Fu-rong (张芙蓉); LUO Zhi-juan (罗志娟) (Changsha Aeronautical Vocational and Technical College, Changsha, Hunan 410124)
Source: Journal of Changsha Aeronautical Vocational and Technical College, 2022, No. 3, pp. 33-37 (5 pages)
Funding: Interim research result of the Hunan Provincial Natural Science Foundation project "A Deep Learning-Based Semantic-Level Chinese Automatic Proofreading Method" (No. 2020JJ7085).
Keywords: deep learning; automatic proofreading; semantics; knowledge base; Chinese text
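
The abstract does not describe how the knowledge sets or the augmentation are implemented. As a minimal sketch under stated assumptions, the Python below shows how one knowledge set (a hypothetical three-entry figure-to-dynasty table, `FIGURE_DYNASTY`) could drive both data augmentation, by emitting one correct and one corrupted sentence per entry so that semantic errors make up 50% of the samples, and a simple lookup check. The names `augment`, `check`, `make_sentence`, `Sample` and the sentence template are illustrative only, and `check` is a rule-based stand-in, not the authors' BERT-based model.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# HYPOTHETICAL toy knowledge set (figure -> dynasty). The paper's real
# chronology knowledge set is much larger and its format is not given.
FIGURE_DYNASTY = {
    "李白": "唐",    # Li Bai: Tang dynasty
    "苏轼": "宋",    # Su Shi: Song dynasty
    "司马迁": "汉",  # Sima Qian: Han dynasty
}
DYNASTIES = sorted(set(FIGURE_DYNASTY.values()))


@dataclass
class Sample:
    text: str
    has_semantic_error: bool


def make_sentence(name: str, dynasty: str) -> str:
    # Illustrative template "<dynasty>代的<name>", e.g. "唐代的李白"
    # ("Li Bai of the Tang dynasty").
    return f"{dynasty}代的{name}"


def augment(seed: int = 0) -> list[Sample]:
    """Emit one correct and one corrupted sentence per figure, so semantic
    errors make up exactly 50% of the generated samples."""
    rng = random.Random(seed)
    samples: list[Sample] = []
    for name, dynasty in FIGURE_DYNASTY.items():
        samples.append(Sample(make_sentence(name, dynasty), False))
        # Corrupt the sample by swapping in a different dynasty.
        wrong = rng.choice([d for d in DYNASTIES if d != dynasty])
        samples.append(Sample(make_sentence(name, wrong), True))
    return samples


def check(text: str) -> bool:
    """Rule-based stand-in (not the paper's BERT model): flag text that pairs
    a known figure with a dynasty other than the one in the knowledge set."""
    for name, dynasty in FIGURE_DYNASTY.items():
        if name in text:
            if any(f"{d}代" in text for d in DYNASTIES if d != dynasty):
                return True
    return False


if __name__ == "__main__":
    for s in augment():
        print(s.text, s.has_semantic_error, check(s.text))
```

Running the module prints each generated sentence with its label and the checker's verdict. In a pipeline like the one the abstract describes, labeled samples of this kind would serve as training data for the BERT-based model rather than being judged by a fixed lookup, which cannot generalize beyond the entries in its table.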