Abstract
This paper presents an algorithm that automatically detects and extracts text in digital document images. First, the image is Gabor-filtered at several orientations and scales, and the filtered images are fused into a single image that reflects the layout of the document. Candidate text regions are then extracted directly from this fused image. Finally, two screening criteria, one based on geometric properties and one on high-frequency content, are applied to eliminate non-text regions. Experiments were performed on representative document images of different types, containing text in different languages and fonts. The results show that the algorithm gives satisfactory results on document images from a wide variety of sources.
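The first stage described above (multi-orientation, multi-scale Gabor filtering followed by fusion) can be illustrated with a minimal NumPy sketch. The abstract does not give the filter parameters or the fusion rule, so everything here is an assumption made for illustration: the function names, the wavelengths, the number of orientations, the Gaussian width, fusion by pixel-wise maximum of absolute responses, and the simple relative threshold used to mark candidate regions. The geometric and high-frequency screening steps are not shown.

```python
import numpy as np

def gabor_kernel(wavelength, theta, sigma, size):
    """Real (cosine-phase) Gabor kernel; parameter values are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction in which the carrier oscillates.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    # Subtract the mean so that flat regions give a zero response.
    return kernel - kernel.mean()

def filter_fft(image, kernel):
    """Circular convolution via the FFT (the image wraps at its borders)."""
    kh, kw = kernel.shape
    padded = np.zeros_like(image, dtype=float)
    padded[:kh, :kw] = kernel
    # Shift the kernel so that its centre sits at index (0, 0).
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def fused_gabor_response(image, wavelengths=(4.0, 8.0), n_orientations=4):
    """Fuse the filter-bank output by pixel-wise maximum (assumed rule)."""
    image = np.asarray(image, dtype=float)
    fused = np.zeros_like(image)
    for wavelength in wavelengths:
        size = int(4 * wavelength) | 1  # odd kernel size, about 4 wavelengths
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(wavelength, theta, 0.5 * wavelength, size)
            response = filter_fft(image, kernel)
            fused = np.maximum(fused, np.abs(response))
    return fused

def candidate_text_mask(fused, fraction=0.5):
    """Mark pixels whose fused response exceeds a fraction of the maximum."""
    return fused > fraction * fused.max()
```

On a synthetic image containing a block of fine stripes (a crude stand-in for a text line) on a blank background, the fused response is large inside the block and close to zero far from it, which is the property the candidate-region extraction relies on. The image must be at least as large as the largest kernel (33 x 33 pixels with the default wavelengths).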
Source
《电子学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2006, No. B12, pp. 2387-2390 (4 pages)
Acta Electronica Sinica