
Automatic segmentation algorithm for text lines of Dongba hieroglyphs document image (东巴象形文字文档图像的文本行自动分割算法研究)
Abstract Deep learning techniques, typified by convolutional neural networks (CNN), perform very well in image classification and recognition. However, no standard, public dataset exists for Dongba hieroglyphs, so existing deep learning algorithms cannot be borrowed or applied directly. To build an authoritative and effective Dongba character library quickly, the primary task at present is to analyze the layout structure of published Dongba documents and to extract text lines and individual Dongba characters from them. Based on the structural features of Dongba hieroglyphic document images, this paper therefore proposes an automatic text-line segmentation algorithm for Dongba document images. The algorithm first uses a density- and distance-based k-means clustering algorithm (d-K-means) to determine the number of text-line classes and the classification criteria; it then corrects segmentation errors through secondary processing of the text blocks, which improves accuracy. While making full use of the structural features of Dongba documents, the algorithm retains the advantage of machine-learning models of being objective and free from the influence of subjective experience. Experiments show that the algorithm can be applied to text-line segmentation of Dongba document images, offline handwritten Chinese characters, and Dongba scriptures, as well as to the segmentation of individual Dongba and Chinese characters within text lines. It is simple to implement, accurate, and adaptable, and thus lays a foundation for building the Dongba character library.
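The abstract does not spell out how d-K-means combines density and distance, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each text block is reduced to the vertical coordinate of its center, picks seed centers in a density-peaks style (a point becomes a seed when its nearest denser neighbor lies farther away than `radius`), and then refines the seeds with ordinary one-dimensional k-means. The function names, the `radius` parameter, and the sample coordinates are all hypothetical; the secondary processing of text blocks is not shown.

```python
def density_distance_seeds(ys, radius):
    """Assumed seed rule: keep points whose nearest denser neighbor is farther than `radius`."""
    n = len(ys)
    # Local density: number of other points within `radius` of each point.
    rho = [sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) <= radius)
           for i in range(n)]
    seeds = []
    for i in range(n):
        # Distances to all points ranked denser (ties broken by lower index).
        denser = [abs(ys[i] - ys[j]) for j in range(n)
                  if (rho[j], -j) > (rho[i], -i)]
        delta = min(denser) if denser else float("inf")
        if delta > radius:
            seeds.append(ys[i])
    return sorted(seeds)


def kmeans_1d(ys, centers, max_iter=100):
    """Plain Lloyd iterations on 1-D data, starting from the given centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: abs(y - centers[c]))
                  for y in ys]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [y for y, lab in zip(ys, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def segment_text_lines(block_centers_y, radius):
    """The number of seeds fixes the number of text lines; k-means assigns the blocks."""
    seeds = density_distance_seeds(block_centers_y, radius)
    return kmeans_1d(block_centers_y, seeds)


# Hypothetical vertical centers of text blocks on one page (pixels).
ys = [10, 12, 11, 13, 50, 52, 51, 90, 91, 89, 92]
labels, centers = segment_text_lines(ys, radius=8)
print(labels)   # line index assigned to each block
print(centers)  # estimated vertical center of each text line
```

In this sketch the number of clusters is not supplied by the user; it falls out of the seed selection, which mirrors the abstract's statement that the clustering step determines the number of text-line classes. The choice of `radius` is the main tuning decision and would need to reflect the typical block height in the scanned page.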
Authors KANG Hou-liang (康厚良); YANG Yu-ting (杨玉婷) (Sports Department, Suzhou Vocational University, Suzhou, Jiangsu 215000, China; School of Computer Engineering, Suzhou Vocational University, Suzhou, Jiangsu 215000, China)
Source Journal of Graphics (《图学学报》), CSCD, Peking University Core Journal, 2022, No. 5, pp. 865-874 (10 pages)
Funding Suzhou Vocational University Research Start-up Fund for Introduced Talents (201905000034).
Keywords Dongba hieroglyph; Dongba document analysis; text line segmentation; projection segmentation; d-K-means
