Abstract
Handwritten text recognition is mainly applied in text input technology and plays a key role in the development of human-computer interaction. To address the inability of most online input methods to recognize mixed Chinese and English handwriting, an online mixed Chinese and English handwritten text recognition method is proposed. Text strokes are merged according to rules based on horizontal relative position, vertical overlap rate, and area overlap rate, and connected (cursive) strokes are split, yielding a series of character segments. These segments are then classified as Chinese or English using geometric features (stroke count, aspect ratio, center deviation, and smoothness) together with recognition confidence. On this basis, guided by the classification results and combined with path evaluation from a natural-language model and a dynamic programming search algorithm, the candidate Chinese and English character segments are merged separately to obtain the Chinese and English character sequences to be recognized, which are fed into the Chinese and English recognition models of a Convolutional Neural Network (CNN), respectively, to produce the handwritten text recognition result. Experimental results show that the recognition accuracy for online handwritten mixed Chinese and English text reaches 93.67%; the method segments not only online handwritten Chinese text lines but also online handwritten mixed Chinese and English text lines containing connected-stroke characters.
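The stroke-merging step can be illustrated with a minimal sketch. The abstract names the rules (horizontal relative position, vertical overlap rate, area overlap rate) but gives neither their exact definitions nor their thresholds, so everything below is an assumption made for illustration: strokes are reduced to bounding boxes, `overlap_1d` and `area_overlap` are hypothetical helper definitions, and `x_thresh` and `area_thresh` are made-up values. It is not the authors' implementation.

```python
from typing import List, Tuple

# A stroke is reduced to its bounding box: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def overlap_1d(a_min: float, a_max: float, b_min: float, b_max: float) -> float:
    """Overlap of two intervals, relative to the shorter one (assumed definition)."""
    inter = min(a_max, b_max) - max(a_min, b_min)
    if inter < 0:
        return 0.0  # the intervals are disjoint
    shorter = min(a_max - a_min, b_max - b_min)
    if shorter == 0:
        return 1.0  # zero-width interval (e.g. a vertical bar) touching the other
    return inter / shorter


def area_overlap(a: Box, b: Box) -> float:
    """Intersection area relative to the smaller box (assumed definition)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return iw * ih / smaller if smaller > 0 else 0.0


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_strokes(boxes: List[Box], x_thresh: float = 0.5,
                  area_thresh: float = 0.3) -> List[Box]:
    """Greedily merge strokes, given in writing order, into character segments.

    A stroke joins the most recent segment when their x-extents or their
    areas overlap strongly enough; the thresholds are illustrative only.
    """
    segments: List[Box] = []
    for box in boxes:
        if segments:
            last = segments[-1]
            if (overlap_1d(last[0], last[2], box[0], box[2]) >= x_thresh
                    or area_overlap(last, box) >= area_thresh):
                segments[-1] = union(last, box)
                continue
        segments.append(box)
    return segments


# Five strokes: a horizontal and a vertical bar, a box with an inner stroke,
# and one isolated stroke.
strokes = [(0, 0, 10, 2), (4, 0, 6, 10), (20, 0, 30, 10), (22, 3, 28, 7), (40, 0, 50, 10)]
print(merge_strokes(strokes))
```

In this toy input the first two strokes and the next two strokes each collapse into one segment, while the last stroke stays on its own. The connected-stroke splitting and the Chinese/English classification described in the abstract are outside the scope of this sketch.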
Authors
付鹏斌
刘鹏辉
杨惠荣
董澳静
FU Pengbin; LIU Penghui; YANG Huirong; DONG Aojing (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals)
2022, No. 3, pp. 253-262 (10 pages)
Computer Engineering
Funding
National Natural Science Foundation of China (61772048)
Beijing Natural Science Foundation (4153058)
Keywords
online handwriting recognition
mixed Chinese and English handwriting
Chinese and English classification
text line segmentation
path evaluation