Abstract
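Image binarization separates text from the background and is a key preprocessing step for subsequent recognition tasks. Low-quality text images often exhibit various degradations, such as uneven illumination, large variations in stroke gray level, and bleed-through from the reverse side, which make binarization very difficult. Recently proposed binarization methods based on Laplacian energy achieve good results on degraded documents, but they tend to lose thin, long, weak strokes. To address this, an improved Laplacian energy method is proposed that exploits the strong double-edge response of strokes to strengthen the Laplacian operator response in stroke regions, so that thin, long, weak strokes are preserved. Tests on the DIBCO2013 dataset show that the proposed method handles the binarization of thin, long, weak strokes well.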
The main function of a document image binarization algorithm is to extract text from the image background. Binarization is a key preprocessing step in automatic document processing systems. Extracting text from badly degraded document images is very challenging because of poor illumination, bleed-through, and the high inter- and intra-image variation between the document background and the foreground text. A recent algorithm based on Laplacian energy performs well on degraded document images, but its main drawback is that thin, long, weak strokes cannot be handled properly. In this paper, a modified Laplacian energy is proposed, based on the observation that strokes have a relatively strong double-edge response. Thin, long, weak strokes in degraded document images can be segmented properly by combining the Laplacian with the double-edge response of the image intensity. Experiments on the DIBCO2013 dataset show that the proposed method outperforms other techniques at extracting thin, long, weak strokes.
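The idea of boosting the Laplacian where a double-edge response is present can be illustrated on a single scanline. The following is a minimal sketch, not the paper's formulation: it assumes dark text on a bright background, and the function names, the `max_width` and `alpha` parameters, the min-based double-edge measure, and the multiplicative boost are all hypothetical choices made for illustration.

```python
def laplacian_1d(row):
    """Discrete second derivative; border samples are replicated."""
    n = len(row)
    return [row[max(i - 1, 0)] - 2 * row[i] + row[min(i + 1, n - 1)]
            for i in range(n)]


def double_edge_response(row, max_width):
    """Illustrative double-edge measure (an assumption, not the paper's).

    A dark stroke pixel has a falling edge just to its left and a rising
    edge just to its right. The response is the weaker of the two strongest
    edges found within max_width samples on each side, so a lone step edge
    (only one side present) scores zero.
    """
    n = len(row)
    grad = [row[i + 1] - row[i] for i in range(n - 1)]
    out = []
    for i in range(n):
        fall = max((-grad[j] for j in range(max(i - max_width, 0), i)),
                   default=0)
        rise = max((grad[j] for j in range(i, min(i + max_width, n - 1))),
                   default=0)
        out.append(max(0, min(fall, rise)))
    return out


def enhanced_laplacian(row, max_width=3, alpha=1.0):
    """Scale the Laplacian by (1 + alpha * normalized double-edge response)."""
    lap = laplacian_1d(row)
    de = double_edge_response(row, max_width)
    peak = max(de) or 1  # avoid division by zero on flat input
    return [l * (1 + alpha * d / peak) for l, d in zip(lap, de)]


# A weak one-pixel stroke at index 3 and a strong step edge between 7 and 8.
row = [200, 200, 200, 170, 200, 200, 200, 200, 120, 120, 120, 120, 120]
lap = laplacian_1d(row)
enh = enhanced_laplacian(row)
print(lap[3], lap[8])  # plain Laplacian: stroke vs. step edge
print(enh[3], enh[8])  # after the double-edge boost
```

In this toy scanline the plain Laplacian responds more strongly to the step edge than to the weak stroke, whereas the boosted response favors the stroke because only the stroke has edges on both sides.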
Source
《计算机仿真》
CSCD
Peking University Core Journals
2015, Issue 9, pp. 276-280 (5 pages)
Computer Simulation