Abstract
This paper proposes a method for automatically extracting text regions from color images. First, a statistical color model is applied to greatly reduce the size of the image's color space. Second, unsupervised graph-theoretical clustering is performed on the reduced color histogram, decomposing the original image into multiple binary images, each corresponding to one of the resulting clusters. Connected component analysis is then applied to each binary image to locate candidate text regions, and a text identification step discards the non-text regions. Finally, the text regions found in the individual binary images are integrated to form the detection result for the original color image. For specific applications, the located text regions can, after further processing, be fed to existing optical character recognition (OCR) systems. Experimental results demonstrate the effectiveness of the proposed approach.
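The stages named in the abstract (color reduction, decomposition into binary images, connected component analysis, region filtering, integration) can be illustrated with a minimal sketch. It is not the paper's implementation: uniform color quantization stands in for the statistical color model, grouping pixels by quantized color stands in for the graph-theoretical clustering, and the size and aspect-ratio rules stand in for the text identification step, none of which are specified in the abstract. All function names and thresholds are hypothetical.

```python
from collections import deque


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel to a coarse color bin.
    Assumption: a stand-in for the paper's statistical color model."""
    step = 256 // levels
    return tuple(c // step for c in pixel)


def decompose(image, levels=4):
    """Split an image (rows of RGB tuples) into one binary mask per color bin.
    Assumption: a stand-in for graph-theoretical clustering."""
    h, w = len(image), len(image[0])
    masks = {}
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            key = quantize(pixel, levels)
            if key not in masks:
                masks[key] = [[0] * w for _ in range(h)]
            masks[key][y][x] = 1
    return masks


def connected_components(mask):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def looks_like_text(box, image_area, min_size=2, max_aspect=10.0, max_area_frac=0.5):
    """Hypothetical filter: reject tiny, very elongated, or image-sized boxes."""
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0 + 1, y1 - y0 + 1
    if bw < min_size or bh < min_size:
        return False
    if bw * bh > max_area_frac * image_area:
        return False
    return max(bw / bh, bh / bw) <= max_aspect


def extract_text_regions(image, levels=4):
    """Run every mask through component analysis and merge the surviving boxes."""
    area = len(image) * len(image[0])
    regions = []
    for mask in decompose(image, levels).values():
        regions.extend(b for b in connected_components(mask) if looks_like_text(b, area))
    return regions
```

Calling `extract_text_regions` on a list of rows of RGB tuples returns the bounding boxes that survive the filter across all color masks. A fuller version would presumably replace each stand-in with the corresponding technique from the paper.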
Source
《微电子学与计算机》
CSCD
PKU Core Journal (北大核心)
2003, Issue 8, pp. 89-93, 132 (6 pages in total)
Microelectronics & Computer
Keywords
Color image
Text extraction
Graph theory
Clustering
Image decomposition
Image processing
Character recognition
Computer
Text extraction, Statistical color model, Graph-theoretical clustering, Connected component analysis, OCR