Layout analysis of document images based on multifeature fusion

Cited by: 7


Abstract

Objective Document image layout analysis aims to segment a page into regions according to their content and to identify each region quickly. Because each type of region is handled differently in later stages, the layout must be analyzed first to facilitate subsequent processing. Traditional layout analysis methods generally rely on complex rules. They locate regions first and classify them afterwards, so they cannot locate and classify layout regions simultaneously, and different kinds of document images need their own specific strategies, which limits versatility. Compared with the feature representations of traditional methods, deep learning models have powerful representation and modeling capabilities and adapt better to complex object detection tasks. Object-level detection networks have been proposed, including proposal-based networks such as Faster region-based convolutional neural networks (Faster R-CNN) and region-based fully convolutional networks (R-FCN), and proposal-free networks such as the single shot multibox detector (SSD) and you only look once (YOLO). Pixel-level networks, such as fully convolutional networks (FCN) and the DeepLab series, have also enabled breakthroughs in detection tasks. Both object-level and pixel-level techniques have been applied to document layout analysis, and mainstream deep learning methods can locate and classify layout regions simultaneously. However, most current deep learning methods require complex preprocessing, such as color coding, image binarization, and simple rules, which makes the model structure complex. The document image also loses considerable information during such preprocessing, which affects recognition accuracy. In addition, because document image data are scarce, general-purpose deep learning models struggle to perform well on layout analysis. To address these problems, this paper proposes a deep learning method based on a multi-feature fusion convolutional neural network.

Method First, features are extracted from the input image in parallel by convolutional layers composed of kernels with different sizes, and the resulting feature maps are fused. The parallel feature extraction has three layers, which contain 3, 4, and 3 convolution kernels, respectively. The first layer uses large kernels of 11×11, 9×9, and 7×7 to enlarge the receptive field and retain additional feature information. The second layer uses four kernels of 7×7, 5×5, 3×3, and 1×1 to strengthen feature extraction while preserving coarse extraction. The third layer uses three kernels of 5×5, 3×3, and 1×1 to extract further detail. The feature fusion module consists of a convolutional layer and a 1×1 convolution kernel; after fusion, a further convolutional layer extracts features again. Next, the cascaded and parallel atrous spatial pyramid pooling (ASPP) strategy of DeepLab V3 is adopted. ASPP consists of four convolution kernels: a standard 1×1 kernel and three 3×3 atrous kernels with dilation rates of 6, 12, and 18. When the extent of the dilated kernel approaches the size of the feature map, the 3×3 atrous kernel loses the ability to capture whole-image information and degenerates into a 1×1 kernel; image-level features are therefore added. ASPP expands the receptive field of the convolution kernel without reducing resolution and retains as much feature map information as possible. Finally, the image is restored by bilinear interpolation, completing the localization and recognition of the document layout objects, namely figures, tables, and formulas. The experiments run on Ubuntu 18.04 with the TensorFlow framework and an NVIDIA 1080 GPU with 16 GB memory. The data come from the ICDAR 2017 POD document layout object detection dataset, with 1600 training images and 812 test images. Input images are uniformly downscaled to 513×513 pixels during training to reduce the model training parameters.

Result Mean intersection over union (mIOU) and pixel accuracy (PA) are used as evaluation criteria. Experiments on the ICDAR 2017 POD document layout object detection dataset show that the proposed algorithm achieves 87.26% mIOU and 98.10% PA. Compared with FCN, it improves mIOU and PA by about 14.66% and 2.22%, respectively, and the proposed feature fusion module contributes improvements of 1.45% and 0.22% in mIOU and PA, respectively.

Conclusion The proposed algorithm locates and recognizes multiple types of document layout objects within a single network framework. It requires no complex image preprocessing for training, and its model structure is simple. The experimental data show that the algorithm can efficiently identify the background, figures, tables, and formulas, achieves good recognition results with little training data, and outperforms the FCN and DeepLab V3 methods.
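The two evaluation criteria named in the abstract, mIOU and PA, can be sketched as follows. This is a minimal illustration of the standard definitions computed from a per-pixel confusion matrix, not the authors' evaluation code. The function names and the class indexing (0 = background, 1 = figure, 2 = table, 3 = formula) are assumptions made for the example.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[Sequence[int]],
                     pred: Sequence[Sequence[int]],
                     num_classes: int) -> List[List[int]]:
    """Entry [t][p] counts pixels whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix: List[List[int]]) -> float:
    """PA: correctly labelled pixels divided by all pixels."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def mean_iou(matrix: List[List[int]]) -> float:
    """mIOU: per-class intersection over union, averaged over the classes present."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]                               # intersection
        fn = sum(matrix[c]) - tp                        # true class c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, true class otherwise
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both label maps
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy 2x4 label maps; class indices are assumed:
# 0 = background, 1 = figure, 2 = table, 3 = formula.
truth = [[0, 0, 1, 1],
         [2, 2, 3, 3]]
pred = [[0, 0, 1, 0],
        [2, 2, 3, 3]]

m = confusion_matrix(truth, pred, num_classes=4)
print(f"PA   = {pixel_accuracy(m):.4f}")
print(f"mIOU = {mean_iou(m):.4f}")
```

In this toy case a single figure pixel is mislabelled as background, so PA stays high (7 of 8 pixels) while mIOU drops further, because the error lowers the IOU of both the figure and the background class. This sensitivity to small classes is one reason mIOU is commonly reported alongside PA.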

Authors: Ying Zilu; Zhao Yihong; Xuan Chen; Deng Wenbo (School of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China)

Affiliation: School of Intelligent Manufacturing, Wuyi University

Source: Journal of Image and Graphics (中国图象图形学报), CSCD, PKU Core; 2020, No. 2, pp. 311-320 (10 pages)

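Funding: National Natural Science Foundation of China (61771347); Guangdong Province Characteristic Innovation Project (2017KTSCX181); Guangdong Province Young Innovative Talents Project (2017KQNCX206); Jiangmen Science and Technology Plan Project (Jiangke [2017] No. 268); Wuyi University Youth Fund Project (2015zk11).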

Keywords: document image processing; layout analysis; object detection; deep learning; semantic segmentation

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]

