Abstract
This paper presents a new binarization method for document images based on the Curvelet transform, designed to eliminate the effect of local high-brightness regions on the quality of the binarized image. First, the Curvelet transform is applied to the original document image, which is degraded by local high-brightness regions, to obtain its set of Curvelet coefficients in the curvelet domain. Next, the Curvelet coefficients are enhanced nonlinearly according to the image features that each coefficient represents, so as to optimize the histogram distribution of the document image. The enhanced coefficient set is then inverse-transformed to obtain a spatial-domain image with an optimized histogram, and the Otsu method is applied to binarize it. The method was applied to document images containing band-shaped and spot-shaped local high-brightness regions, and the resulting binary images were passed to ABBYY FineReader 10 for OCR. Experimental results show that the character recognition accuracy on images binarized by the proposed method reaches up to 94.81%, which is better than that of four other typical binarization methods.
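The final step of the pipeline, Otsu thresholding, can be illustrated with a minimal pure-Python sketch. It covers only that step: the Curvelet transform, the nonlinear coefficient enhancement, and the inverse transform are not shown, since the abstract does not specify the enhancement function. The names `otsu_threshold` and `binarize` are hypothetical and are not taken from the paper.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes the between-class
    variance when pixels are split into {<= t} and {> t}."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(image, threshold):
    """Map pixels above the threshold to 1 (background), others to 0 (text)."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Toy 8-bit image: dark text pixels (20) on a bright background (200).
image = [[20, 200, 200],
         [200, 20, 200]]
flat = [p for row in image for p in row]
binary = binarize(image, otsu_threshold(flat))
```

Because Otsu's method picks a single global threshold from the histogram, a local high-brightness region can distort the two-class split; this is the weakness that the histogram optimization in the curvelet domain is meant to address before this step is applied.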
Source
《光电工程》
CAS
CSCD
PKU Core (Peking University Core Journals)
2012, Issue 11, pp. 75-80 (6 pages)
Opto-Electronic Engineering
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61102110)