Abstract
With the wide adoption of document imaging systems, automatic processing of document images has become an active research topic. This paper studies several key techniques in an automatic recognition system for form (tabular) documents. First, for layout analysis, a document classification method based on frame line detection is proposed. Second, methods tailored to the characteristics of form document images are presented for recognition field extraction, frame line removal, and segmentation of handwritten character strings. Third, for handwritten digit recognition, a combined recognition method based on shape context features and gradient features is designed. Finally, the system is applied to recognizing the numeric (courtesy) amounts on bank bills. Experiments on real form-type bills demonstrate the effectiveness of the system, whose recognition rate reaches a practical level.
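The abstract names frame line detection and removal but does not describe the algorithm. As a rough illustration only, the sketch below uses a generic run-length heuristic: a horizontal run of ink pixels longer than a threshold is treated as a frame line and erased. This is an assumption for illustration, not the authors' method; the function names `find_horizontal_runs` and `remove_runs`, the `min_length` parameter, and the toy image are all hypothetical.

```python
def find_horizontal_runs(image, min_length):
    """Return (row, start_col, end_col) for each horizontal ink run
    at least min_length pixels long; end_col is inclusive."""
    runs = []
    for r, row in enumerate(image):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the right edge is closed.
        for c, pixel in enumerate(list(row) + [0]):
            if pixel and start is None:
                start = c
            elif not pixel and start is not None:
                if c - start >= min_length:
                    runs.append((r, start, c - 1))
                start = None
    return runs


def remove_runs(image, runs):
    """Return a copy of image with the given runs set to background (0)."""
    cleaned = [list(row) for row in image]
    for r, start, end in runs:
        for c in range(start, end + 1):
            cleaned[r][c] = 0
    return cleaned


# Toy 5x8 binary image (hypothetical data): 1 = ink, 0 = background.
# Row 2 is a frame line; column 3 holds a short vertical stroke crossing it.
form = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
lines = find_horizontal_runs(form, min_length=6)
cleaned = remove_runs(form, lines)
```

Vertical frame lines could be found the same way on the transposed image. Note that this naive removal also erases the pixel where the stroke in column 3 crosses the line, so a practical system would need an additional step to repair strokes that overlap frame lines.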
Source
Journal of System Simulation (《系统仿真学报》)
CAS
CSCD
PKU Core Journals (北大核心)
2009, Issue 10, pp. 2916-2920 (5 pages)
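Funding
National Natural Science Foundation of China (60632050, 60503026)
National High Technology Research and Development Program of China (863 Program) (2006AA01Z119)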
Keywords
tabular document
frame line detection
frame line removal
document image analysis
handwritten digit recognition