
Research on Segmentation of Historical Chinese Books (古代汉字文献切分研究)
Cited by: 8
Abstract: This paper proposes column (text line) segmentation and character segmentation methods tailored to the characteristics of historical Chinese documents. The column segmentation method analyzes the stroke projection of the document directly, using a recursive segmentation algorithm based on hierarchical projection filtering and variable-length gap thresholds. It extracts columns accurately even when the spacing between columns is small, when columns touch the ruling lines, or when the document is moderately skewed, and it works particularly well on short columns. The character segmentation method has two steps: a coarse segmentation determines the approximate cut positions, and a fine segmentation based on connected-component analysis and touching-point detection refines them. This method recovers complete individual characters even from columns that contain many touching and overlapping characters. Experimental results show that the proposed methods perform well when segmenting historical Chinese documents.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2013, No. 2, pp. 29-33, 38 (6 pages)
Funding: National Natural Science Foundation of China (No.6097507)
Keywords: document image processing; document segmentation; Chinese character segmentation; digitization of ancient books
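
The column segmentation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no parameters or filtering rules, so the function names (`vertical_projection`, `find_runs`, `split_columns`), the thresholds (`level`, `min_gap`, `max_width`), and the rule of raising the projection level by one and shrinking the gap threshold on each recursion are all assumptions made for illustration. The sketch treats the page as a binary image and splits it into vertical columns from its projection profile.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]   # binary image: 1 = ink, 0 = background
Run = Tuple[int, int]     # half-open pixel range [start, end)


def vertical_projection(img: Image) -> List[int]:
    """Count ink pixels in each pixel column of the page."""
    return [sum(col) for col in zip(*img)]


def find_runs(profile: List[int], lo: int, hi: int,
              level: int, min_gap: int) -> List[Run]:
    """Find runs in profile[lo:hi] whose value exceeds `level`,
    merging runs separated by fewer than `min_gap` pixels."""
    runs: List[Run] = []
    start: Optional[int] = None
    for x in range(lo, hi):
        if profile[x] > level:
            if start is None:
                start = x
        elif start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, hi))

    merged: List[Run] = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)   # gap too small: same column
        else:
            merged.append((s, e))
    return merged


def split_columns(profile: List[int], lo: int, hi: int, level: int = 0,
                  min_gap: int = 2, max_width: int = 10,
                  max_level: Optional[int] = None) -> List[Run]:
    """Recursively split profile[lo:hi] into column ranges.

    A run wider than `max_width` (assumed to hold several touching
    columns) is re-split with a higher projection level and a smaller
    gap threshold. Both update rules are illustrative assumptions.
    """
    if max_level is None:
        max_level = max(profile[lo:hi], default=0)
    result: List[Run] = []
    for s, e in find_runs(profile, lo, hi, level, min_gap):
        if e - s > max_width and level < max_level:
            sub = split_columns(profile, s, e, level + 1,
                                max(1, min_gap - 1), max_width, max_level)
            result.extend(sub if sub else [(s, e)])  # keep run if unsplittable
        else:
            result.append((s, e))
    return result


# Hypothetical profile: two columns joined by a faint ruling line, then a third.
profile = [0, 0, 5, 6, 5, 1, 1, 5, 6, 5, 0, 0, 0, 4, 4, 4, 0]
columns = split_columns(profile, 0, len(profile), max_width=5)
```

Raising the level trims the low-density edges of each run, so a fuller implementation would probably widen the resulting ranges back to the surrounding gaps. It would also need the skew and short-column handling that the abstract mentions but does not specify.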