Funding: This work is supported by the National Natural Science Foundation of China (61872231, 61701297).
Abstract: The past decade has seen rapid development of deep-learning-based text detection. However, current methods for Chinese character detection and recognition still perform poorly, and the accuracy of text-box segmentation in natural scenes remains unimpressive. There are two main reasons: the complexity of natural scenes and the large number of distinct Chinese characters. To address these problems, we propose a lightweight neural network architecture named CTSF. It consists of two modules. The first, CDSE, is a text detection network that combines CTPN with the image feature extraction modules of PVANet. The second, SPPCNN-SF, is a recognition network based on spatial pyramid pooling and the fusion of Chinese character skeleton features. Together they perform text detection and recognition, respectively. Our model performs much better than the original model on ICDAR2011 and ICDAR2013 (F-measures of 85% and 88%) and trains faster. In addition, our method achieves high accuracy on three Chinese datasets: 95.12%, 95.56%, and 96.01%.
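The abstract does not describe the pooling layer of SPPCNN-SF in detail. As a rough illustration of the general spatial pyramid pooling idea that the recognition network is said to build on, the sketch below max-pools a 2D feature map over grids of several sizes and concatenates the results, so that inputs of different sizes yield vectors of the same length. The function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, and the single-channel list-of-lists input are all assumptions made for illustration and are not taken from the paper.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    Hypothetical helper: for each level n, the map is split into an
    n x n grid and the maximum of each cell is kept. The output length
    is sum(n * n for n in levels), independent of the input size.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end keeps every
            # cell non-empty, even when n exceeds the map size.
            top = (i * height) // n
            bottom = ((i + 1) * height + n - 1) // n
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(top, bottom)
                        for c in range(left, right)
                    )
                )
    return pooled


# Two inputs of different sizes produce vectors of the same length.
wide = [[r * 7 + c for c in range(7)] for r in range(5)]   # 5 x 7 map
small = [[r * 3 + c for c in range(3)] for r in range(3)]  # 3 x 3 map
print(len(spatial_pyramid_pool(wide)), len(spatial_pyramid_pool(small)))  # 21 21
```

A fixed-length output of this kind is what lets a fully connected classifier follow convolutional layers without forcing every character image to be resized to one resolution first. Whether SPPCNN-SF uses these particular levels or max pooling is not stated in the abstract.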