Abstract
Layout analysis has become one of the key technologies for improving the efficiency of Chinese character recognition systems. To address the relatively complex layouts of Chinese documents, this paper proposes a layout analysis method that gives priority to non-text regions. The method extracts all connected components from the document image and clusters them by size, which separates text components from non-text components and thereby segments the layout. Experimental results show that the method can analyze relatively regular Chinese layouts, with high efficiency and good adaptability.
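The abstract does not give implementation details, so the following is only a minimal sketch of the general idea (extract connected components, then cluster them by size). It assumes a binarized image stored as a list of rows with 1 for ink, 8-connectivity, "size" measured as the longer side of the bounding box, and a one-dimensional two-means clustering step. The function names `connected_components`, `box_size`, and `split_by_size` are hypothetical and are not taken from the paper.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def box_size(box):
    """Assumed size measure: the longer side of the bounding box, in pixels."""
    top, left, bottom, right = box
    return max(bottom - top + 1, right - left + 1)


def split_by_size(boxes, iterations=20):
    """Assumed clustering step: 1-D two-means on box size.

    Returns (text_boxes, non_text_boxes); the small-size cluster is treated
    as text and the large-size cluster as non-text.
    """
    if not boxes:
        return [], []
    sizes = [box_size(b) for b in boxes]
    lo, hi = float(min(sizes)), float(max(sizes))
    if lo == hi:
        # Only one size present: nothing to separate.
        return list(boxes), []
    for _ in range(iterations):
        small = [s for s in sizes if abs(s - lo) <= abs(s - hi)]
        large = [s for s in sizes if abs(s - lo) > abs(s - hi)]
        lo, hi = sum(small) / len(small), sum(large) / len(large)
    text = [b for b in boxes if abs(box_size(b) - lo) <= abs(box_size(b) - hi)]
    non_text = [b for b in boxes if abs(box_size(b) - lo) > abs(box_size(b) - hi)]
    return text, non_text
```

In a non-text-first scheme of this kind, the large-size cluster would be removed from the page first, leaving the small components to be grouped into text lines and blocks. A real system would likely add further features (aspect ratio, pixel density) before deciding which components are text.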
Source
《科技资讯》 (Science & Technology Information)
2010, No. 12, pp. 225-226 (2 pages)
Keywords
layout analysis
connected components
Chinese layout
priority extraction