Abstract
Page segmentation is an essential component of an optical character recognition (OCR) system and a necessary step in converting printed documents into electronic form. This paper studies a page segmentation method based on connected regions. First, a dynamic clustering method is used to select the smearing threshold. The document image is then smeared to merge text into large connected regions. Finally, the columns of the document are separated according to the positional relationships among these regions. Experiments show that the method segments both simple rectangular layouts and complex layouts, such as non-Manhattan layouts, well.
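The abstract names three stages (threshold selection by dynamic clustering, smearing, connected-region analysis) without giving their details. The following is a minimal sketch of such a pipeline in Python, under stated assumptions that are not taken from the paper. The "dynamic clustering" is stood in for by an iterative 1-D two-class (2-means) split of white-run lengths into intra-block and inter-block gaps. Smearing is assumed to be the classic run-length smearing (RLSA) applied horizontally and then vertically. Regions are found by 4-connected component labeling. All function names are hypothetical. The paper's final step, which assigns regions to columns by their positional relationships, is not reproduced, so the sketch stops at region bounding boxes.

```python
from collections import deque


def transpose(img):
    return [list(col) for col in zip(*img)]


def gap_lengths(img):
    """Lengths of white runs bounded by black pixels on both sides, per row."""
    gaps = []
    for row in img:
        last_black = None
        for x, v in enumerate(row):
            if v:
                if last_black is not None and x - last_black > 1:
                    gaps.append(x - last_black - 1)
                last_black = x
    return gaps


def two_class_threshold(values, max_iter=100):
    """Assumed stand-in for the paper's dynamic clustering: iterative 1-D
    2-means splits gaps into 'small' (inside a block) and 'large' (between
    blocks); the largest small gap is returned as the smearing threshold."""
    if not values:
        return 0
    lo, hi = float(min(values)), float(max(values))
    if lo == hi:
        return int(lo)  # only one gap size: treat it as intra-block spacing
    for _ in range(max_iter):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large) if large else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # cluster centres are stable
        lo, hi = new_lo, new_hi
    return max(v for v in values if abs(v - lo) <= abs(v - hi))


def smear(img, threshold):
    """Run-length smearing along rows: fill white runs of length <= threshold
    that lie between two black pixels (1 = black, 0 = white)."""
    out = [row[:] for row in img]
    for y, row in enumerate(img):
        last_black = None
        for x, v in enumerate(row):
            if v:
                gap = x - last_black - 1 if last_black is not None else 0
                if 0 < gap <= threshold:
                    for k in range(last_black + 1, x):
                        out[y][k] = 1
                last_black = x
    return out


def components(img):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected black regions."""
    if not img:
        return []
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(x, y)])
                x0 = x1 = x
                y0 = y1 = y
                while queue:  # breadth-first flood fill of one region
                    cx, cy = queue.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes


def segment(img):
    """Smear horizontally, then vertically, each with its own clustered
    threshold, and return the bounding boxes of the merged regions."""
    img = smear(img, two_class_threshold(gap_lengths(img)))
    cols = transpose(img)
    img = transpose(smear(cols, two_class_threshold(gap_lengths(cols))))
    return components(img)


# Two "text columns" with 1-pixel character gaps, a 3-pixel gutter,
# and a 1-pixel blank row between text lines.
page = [
    [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1],
    [0] * 13,
    [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1],
]
print(segment(page))  # [(0, 0, 4, 2), (8, 0, 12, 2)]
```

In this toy page the clustering separates the 1-pixel character gaps from the 3-pixel gutter, so smearing fills only the former and each column becomes one connected region. On real scans the two-class assumption can fail (for example, when word and column gaps are close in size), which is one reason the paper adds rules based on region positions.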
Source
Journal of Nanjing University of Science and Technology (《南京理工大学学报》)
Indexed in: EI, CAS, CSCD, PKU Core Journals (Peking University)
2003, No. 1, pp. 16-19 (4 pages)