
Research on an Automatic Proofreading Method for Word Errors in OCR-Recognized Text (cited by: 12)
Abstract: To address the problem of proofreading word errors in text produced by OCR, this paper proposes an automatic proofreading method that combines a single feature viewed from multiple angles. The method combines context-adjacent words with a window-moving method to compute a confidence score for the character and word strings in a sentence; a confidence-based criterion then judges whether each string is correct and reports the positions of suspected errors. A statistical language model, combined with the same multi-angle feature, is then used to suggest corrections at those positions. The method was tested on OCR output from the paper case files of a procuratorate; the test set contains 236 errors in total. The results show that the method detects errors in the text effectively: the recall of error detection reaches 88.56% and the accuracy of error correction reaches 79%, indicating that the method can effectively proofread Chinese OCR text automatically.
Authors: HAO Ya-nan; QIAO Gang-zhu; TAN Ying (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)
Source: Computer Simulation (《计算机仿真》), Peking University Core Journal, 2020, Issue 9, pp. 333-337 (5 pages)
Funding: Key Project of the Shanxi Province Key Research and Development Program (201703D111011).
Keywords: Window movement method; Spelling check; Spelling correction; Language model
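
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the authors' method. It assumes an add-one-smoothed bigram model as the "adjacent word" evidence, averages window scores to get a per-token confidence, and ranks candidate replacements by their fit with the left and right neighbors. All function names, the threshold, the toy corpus, and the candidate list are hypothetical; the paper's actual confidence formula and its "same feature, multiple angles" combination are not given in this record.

```python
from collections import Counter


def train_bigrams(corpus):
    """Count unigrams and adjacent-token bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pair_score(left, right, unigrams, bigrams):
    """Add-one-smoothed estimate of P(right | left). Assumed scoring, not the paper's."""
    vocab = len(unigrams) or 1
    return (bigrams[(left, right)] + 1) / (unigrams[left] + vocab)


def token_confidences(tokens, unigrams, bigrams, window=2):
    """Slide a window over the sentence; a token's confidence is the mean
    score of all windows that cover it."""
    if window < 2:
        raise ValueError("window must cover at least two tokens")
    n = len(tokens)
    if n < 2:
        return [1.0] * n  # no neighbor to compare against
    pairs = [pair_score(tokens[i], tokens[i + 1], unigrams, bigrams)
             for i in range(n - 1)]
    width = min(window, n)
    sums, counts = [0.0] * n, [0] * n
    for start in range(n - width + 1):
        inside = pairs[start:start + width - 1]  # adjacent pairs in this window
        score = sum(inside) / len(inside)
        for i in range(start, start + width):
            sums[i] += score
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def suspect_positions(confidences, threshold):
    """Indices whose confidence falls below a (hypothetical) threshold."""
    return [i for i, c in enumerate(confidences) if c < threshold]


def rank_candidates(tokens, pos, candidates, unigrams, bigrams):
    """Order candidate replacements for tokens[pos] by left and right context fit."""
    def context_score(word):
        score = 1.0
        if pos > 0:
            score *= pair_score(tokens[pos - 1], word, unigrams, bigrams)
        if pos + 1 < len(tokens):
            score *= pair_score(word, tokens[pos + 1], unigrams, bigrams)
        return score
    return sorted(candidates, key=context_score, reverse=True)


# Toy, made-up training data; "公拆" stands in for an OCR misreading of "公诉".
corpus = [
    ["检察院", "提起", "公诉"],
    ["检察院", "审查", "案件"],
    ["法院", "审理", "案件"],
    ["检察院", "提起", "公诉"],
]
unigrams, bigrams = train_bigrams(corpus)
sentence = ["检察院", "提起", "公拆"]
confidences = token_confidences(sentence, unigrams, bigrams, window=2)
suspects = suspect_positions(confidences, threshold=0.15)
ranked = rank_candidates(sentence, 2, ["公诉", "公拆", "案件"], unigrams, bigrams)
```

On this toy input the misread token receives the lowest confidence and the in-vocabulary candidate ranks first. A real system would need a large domain corpus, a tuned threshold, and a principled source of candidates such as a visually-similar-character confusion set.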