Abstract
This paper discusses a preprocessing method for table-form document images in document image processing. The frame lines of a table-form document often touch or overlap the characters inside them, which degrades the recognition module of an OCR system and therefore the performance of the system as a whole. To address this, the paper first proposes a general basic model for removing line noise from document images and gives a concrete algorithm for removing line noise under that model. Building on this model, it then proposes several extensions for situations encountered in specific applications. Experiments are presented to show the effectiveness of these methods.
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journals
1999, No. 8, pp. 992-995 (4 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China
International Cooperative Research Project