Abstract
To address uneven illumination in document images and handwritten characters that lie close to, or even touch, printed characters, this paper proposes a scheme for extracting characters and discriminating handwritten from printed text. First, a double-threshold binarization method based on toggle mapping (TM) is proposed to extract characters from unevenly illuminated document images. The whole image is then divided into grid cells of equal size, and an edge feature matrix is extracted from the neighborhood of each cell. Because adjacent cells have similar features, a classification framework based on discriminative random fields (DRF) is used to label each cell as handwritten or printed. Post-processing with text-line information then yields a finer, more meaningful classification at the connected component (CC) level. Experiments on the authors' database of envelope postcode-region images show that the scheme effectively extracts and discriminates touching handwritten and printed text in unevenly illuminated document images. In addition, experiments on the IMA database show that the proposed edge feature matrix matches or exceeds the performance of features proposed in earlier literature for discriminating handwritten from printed text.
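The abstract does not spell out the binarization rule, so the following is only a minimal sketch of one commonly described toggle-mapping scheme with two thresholds, not the authors' implementation. The function names, the square window radius `r`, the minimum-contrast threshold `c_min`, and the proximity ratio `p` are all assumptions made for illustration. A pixel is marked as ink only if its neighborhood has enough contrast and its gray value lies closer to the local erosion (minimum) than to the local dilation (maximum).

```python
def local_min_max(img, r):
    """Grayscale erosion and dilation with a (2r+1) x (2r+1) square window."""
    h, w = len(img), len(img[0])
    ero = [[0] * w for _ in range(h)]
    dil = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            ero[y][x] = min(window)
            dil[y][x] = max(window)
    return ero, dil


def toggle_binarize(img, r=1, c_min=40, p=0.8):
    """Return a mask with 1 for dark ink and 0 for background.

    c_min and p are the two (assumed) thresholds:
      c_min - minimum local contrast (dilation - erosion) to consider a pixel
      p     - a pixel is ink when (value - erosion) < p * (dilation - value)
    """
    ero, dil = local_min_max(img, r)
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi, v = ero[y][x], dil[y][x], img[y][x]
            if hi - lo < c_min:
                continue  # flat neighborhood: treat as background
            if v - lo < p * (hi - v):
                out[y][x] = 1  # closer to the local minimum: ink
    return out


# Synthetic example: background brightens from left to right,
# with two dark strokes 70 gray levels below the local background.
row = [100, 115, 60, 145, 160, 175, 120, 205, 220]
img = [row[:] for _ in range(3)]
mask = toggle_binarize(img)
```

In this synthetic image the right-hand stroke (gray value 120) is brighter than the left-hand background (gray value 100), so no single global threshold separates ink from paper, whereas the local rule recovers both strokes.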
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2012, No. 12, pp. 4634-4638 (5 pages)
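Funding
Science and Technology Fund Project of Nanping City, Fujian Province (Z2010Z10(5))
Nantong University Natural Science Foundation Project (11Z070)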
Keywords
handwritten and printed text discrimination
image binarization
toggle mapping
edge feature matrix
discriminative random field