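Abstract
Document image binarization is an important step in document analysis and recognition. A binarization algorithm is proposed for low-quality handwritten document images. The algorithm first applies phase-preserving denoising to the document and computes a background inpainting mask. It then estimates the document background with an image inpainting algorithm and a morphological closing operation, and raises the document contrast with a background compensation algorithm. Next, it constructs a Laplacian energy from the background-compensated image, and finally it obtains the binarization result with a graph-cut algorithm. Experimental results show that the constructed Laplacian energy distinguishes text from background fairly accurately, and that the proposed algorithm outperforms comparable algorithms on the DIBCO2018 dataset.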
Document image binarization can protect the original document and better present its contents to the public. A new binarization algorithm for low-quality handwritten documents is proposed. First, phase-preserving denoising is performed on the document image and an inpainting mask is calculated. Then the background is estimated with an image inpainting procedure and a morphological closing operation, and the contrast is improved with a background compensation algorithm. Finally, the Laplacian energy is constructed from the background-compensated image, and the graph-cut algorithm is adopted to obtain the final binarization result. Experiments on the DIBCO2018 dataset show that the Laplacian energy constructed by the proposed algorithm distinguishes text from background more accurately, and that its binarization results are better than those of state-of-the-art techniques.
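The middle of the pipeline described above (background estimation by morphological closing, background compensation, and the Laplacian of the compensated image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the window size `k`, and the threshold `tau` are hypothetical, the phase-preserving denoising and inpainting steps are omitted, and the graph-cut minimization is replaced by a plain threshold on the Laplacian, which only works for thin strokes. The input is assumed to be a grayscale image with values in [0, 1] and dark ink on light paper.

```python
import numpy as np

def _window_reduce(img, k, reducer):
    """Apply `reducer` (np.max or np.min) over a k x k window, k odd."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    # Stack every shifted view of the window, then reduce across the stack.
    views = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return reducer(np.stack(views), axis=0)

def estimate_background(gray, k=5):
    """Grayscale closing (dilation, then erosion) removes thin dark strokes."""
    dilated = _window_reduce(gray, k, np.max)
    return _window_reduce(dilated, k, np.min)

def compensate_background(gray, background, eps=1e-6):
    """Divide by the background so paper maps to 1 and ink stays below 1."""
    return np.minimum(gray / np.maximum(background, eps), 1.0)

def laplacian(img):
    """4-neighbour discrete Laplacian with replicated borders."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def stroke_seed_mask(lap, tau=0.1):
    """Stand-in for the graph cut (an assumption): dark valleys have a
    clearly positive Laplacian, so threshold it to mark likely text pixels."""
    return lap > tau
```

A usage example on a synthetic page with one thin vertical stroke:

```python
gray = np.full((9, 9), 0.8)   # dim paper
gray[:, 4] = 0.2              # one-pixel-wide ink stroke
background = estimate_background(gray, k=5)
compensated = compensate_background(gray, background)
mask = stroke_seed_mask(laplacian(compensated))
```

Inside a thick stroke the Laplacian is close to zero, so a threshold alone leaves the interior unlabeled. That gap is what a smoothness term and a graph-cut solver would be expected to fill in an energy-based formulation.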
Authors
FENG Yan (冯炎)
CHEN Ru-zhen (陈汝真)
FENG Yan; CHEN Ru-zhen (School of Information Science and Technology, Tibet University, Lhasa 850000, China)
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2020, Issue 26, pp. 10835-10839 (5 pages)
Funding
National Natural Science Foundation of China (61661047).
Keywords
document image
binarization
Laplacian energy
graph-cut algorithm