Abstract
To address the high computational cost of existing table structure detection methods and their low recognition accuracy, an improved table structure recognition method is proposed. The method optimizes the structure and location alignment network. Residual connections are added to the deeper layers of PPLCNet, a lightweight CPU-oriented convolutional neural network, to strengthen the network's learning capacity. A convolutional block attention module (CBAM) is introduced between feature extraction and feature fusion, improving the model's ability to localize target objects along both the channel and spatial dimensions. In the head, convolutional layers replace fully connected layers so that weights are shared, reducing the model's computational cost. In addition, a Smooth L1 loss function is adopted to regress the four vertex coordinates of each table, mitigating the effect of image distortion on model performance. To evaluate the algorithm, experiments are conducted on the PubTabNet dataset. The results show that the proposed method reaches an accuracy (Acc) of 71.58% and a tree-edit-distance-based similarity (TEDS) of 94.47%. Compared with the model before improvement, accuracy is increased by 2.76% and TEDS by 0.79%, giving better overall performance.
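The vertex-coordinate regression described in the abstract relies on the Smooth L1 loss, which is quadratic for small errors and linear for large ones, so a few badly distorted vertices do not dominate the gradient. A minimal sketch follows; the function name, the `beta` threshold, and the sample coordinates are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Element-wise Smooth L1 loss.

    Quadratic (0.5 * d^2 / beta) when |d| < beta, linear (|d| - 0.5 * beta)
    otherwise, which makes vertex regression robust to outlier errors.
    """
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

# Hypothetical predicted vs. ground-truth (x, y) pairs for a table's
# four corner vertices, flattened to an 8-element vector.
pred = np.array([10.0, 12.0, 200.0, 11.0, 201.0, 150.0, 9.0, 149.0])
gt   = np.array([10.0, 10.0, 200.0, 10.0, 200.0, 150.0, 10.0, 150.0])
loss = smooth_l1(pred, gt).mean()
```

Averaging over the eight coordinates yields a single scalar loss for the vertex-regression branch.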
Authors
Chen Yu; Jiang Sanxin (College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China)
Source
Foreign Electronic Measurement Technology (《国外电子测量技术》)
Peking University Core Journal (北大核心)
2023, No. 12, pp. 57-62 (6 pages)
Keywords
deep learning
table structure recognition
attention mechanisms
residual network