Characters Segmentation Method of Historical Documents Mixed in Korean and Chinese
Abstract To address the difficulty of segmenting characters in mixed Korean and Chinese script during the digitization of Korean historical documents, this paper proposes a character segmentation algorithm for images of such documents. Because the separator lines between columns may be discontinuous, and columns may be skewed or touch one another, a column segmentation method based on connected-component projection is proposed. Characters are then segmented by deleting, merging, and splitting connected components. A multi-step segmentation procedure handles images in which character sizes vary and horizontal and vertical layouts are mixed. Touching characters are split with an improved drop-fall algorithm. Experimental results show that the proposed algorithm performs well on Korean historical document images with mixed Korean and Chinese script, varying character sizes, and complex layouts. On the dataset, the column segmentation accuracy is 97.69% and the character segmentation accuracy is 87.79%.
Authors LIU Xingchen (刘星辰); JIN Xiaofeng (金小峰) (Intelligent Information Processing Laboratory, Department of Computer Science & Technology, Yanbian University, Yanji, Jilin 133002, China)
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journals, 2020, No. 11, pp. 135-141 (7 pages)
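Funding Jilin Provincial Department of Education "13th Five-Year" Science and Technology Project (No. JJKH20191126KJ); Yanbian University World-Class Discipline Construction Cultivation Project (No. 18YLPY14).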
Keywords ancient books digitalization; Korean historical documents; column segmentation; character segmentation
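
The abstract names connected-component projection as the basis for column segmentation but does not give the rules. The sketch below is only one plausible reading, not the authors' implementation: it labels 8-connected components in a binary image, deletes components smaller than a threshold (for example, fragments of a broken separator line), projects the remaining bounding boxes onto the x-axis, and cuts columns at empty gaps. The function names `connected_boxes` and `column_spans` and the parameters `min_area` and `min_gap` are hypothetical. Skew correction and splitting of touching columns are not handled.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def connected_boxes(img: Sequence[Sequence[int]], min_area: int = 1) -> List[Box]:
    """Return bounding boxes of 8-connected foreground components.

    Components with fewer than `min_area` pixels are dropped
    (assumed here to be noise or separator-line fragments).
    """
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1, area = x, y, x + 1, y + 1, 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx + 1), max(y1, cy + 1)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if area >= min_area:
                boxes.append((x0, y0, x1, y1))
    return boxes


def column_spans(boxes: Sequence[Box], width: int, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Project box heights onto the x-axis and return (start, end) column spans.

    Spans separated by a gap narrower than `min_gap` pixels are merged.
    """
    profile = [0] * width
    for x0, y0, x1, y1 in boxes:
        for x in range(max(0, x0), min(width, x1)):
            profile[x] += y1 - y0

    # Collect maximal runs of non-zero projection values.
    spans: List[Tuple[int, int]] = []
    start = None
    for x, value in enumerate(profile):
        if value > 0 and start is None:
            start = x
        elif value == 0 and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))

    # Merge runs whose separating gap is too narrow to be a column boundary.
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```

Projecting component bounding boxes rather than raw pixels means a small component can be deleted before it contributes to the profile, which is one way a broken separator line could be kept from closing the gap between two columns.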