
DPENet: Lightweight Document Pose Estimation Network
Abstract Existing deep learning models for rectifying perspective-distorted documents suffer from poor spatial generalization, large parameter counts, and slow inference. Approaching the task as a pose estimation problem, this paper proposes DPENet, a lightweight document pose estimation network, to address these issues. The single document in a document image is treated as one pose estimation object, and its four corner points are treated as the four pose estimation points of that object. A DSNT (differentiable spatial to numerical transform) module, which combines the advantages of fully connected regression and Gaussian heatmap regression, localizes the document corners with high precision, and a perspective transformation then rectifies the perspective-distorted document image. DPENet is lightweight by design: it uses the mobile-oriented MobileNetV2 as its backbone network, and the model is only 10.6 MB. In comparative experiments against three existing mainstream networks on the SmartDoc-QA dataset (using only 148 document images), DPENet outperformed the other three networks in both rectification success rate (96.6%) and mean displacement error (MDE, 1.28 pixels), and its average rectification speed was also good. DPENet thus achieves a higher rectification success rate and higher rectification accuracy on distorted documents while remaining lightweight and fast.
Authors HAN Jing (韩晶); LYU Xueqiang (吕学强); ZHANG Xiangxiang (张祥祥); HAO Wei (郝伟); ZHANG Kai (张凯) (Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China; Research Center for Language Intelligence of China, Capital Normal University, Beijing 100048, China)
Source Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list; 2022, No. 22, pp. 210-218 (9 pages)
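Funding National Key R&D Program of China (2017YFC0805006); Beijing Natural Science Foundation (4212020); Scientific Research Program of the Beijing Municipal Education Commission (KM202111232001); Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University (20220010001).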
Keywords pose estimation; deep learning; document image rectification; lightweight network; MobileNetV2
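The abstract names DSNT as the step that turns a corner heatmap into coordinates, and MDE as the accuracy metric. The following is a minimal pure-Python sketch of both ideas, not the authors' implementation. It assumes the heatmap is normalized with a softmax, that coordinates follow the usual DSNT convention of a grid in (-1, 1), and that MDE is the average Euclidean distance between predicted and ground-truth corners. The function names `softmax2d`, `dsnt`, `to_pixels`, and `mean_displacement_error` are hypothetical and chosen only for illustration.

```python
import math


def softmax2d(logits):
    """Normalize a 2D score map into a probability heatmap that sums to 1."""
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def dsnt(heatmap):
    """Return the expected (x, y) of a normalized heatmap in (-1, 1) coordinates.

    Each pixel centre gets a fixed coordinate; the output is the
    probability-weighted mean of those coordinates, which is differentiable.
    """
    h, w = len(heatmap), len(heatmap[0])
    xs = [(2 * j + 1 - w) / w for j in range(w)]  # column centres
    ys = [(2 * i + 1 - h) / h for i in range(h)]  # row centres
    x = sum(heatmap[i][j] * xs[j] for i in range(h) for j in range(w))
    y = sum(heatmap[i][j] * ys[i] for i in range(h) for j in range(w))
    return x, y


def to_pixels(coord, size):
    """Map a normalized coordinate back to a (fractional) pixel index."""
    return ((coord + 1) * size - 1) / 2


def mean_displacement_error(pred, truth):
    """Average Euclidean distance between matching points (assumed MDE definition)."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


if __name__ == "__main__":
    # Toy 4x5 score map with one sharp peak at row 1, column 3.
    scores = [[0.0] * 5 for _ in range(4)]
    scores[1][3] = 50.0
    x, y = dsnt(softmax2d(scores))
    print(to_pixels(x, 5), to_pixels(y, 4))
```

In a full pipeline, one such heatmap would be produced per corner, and the four recovered points would feed a perspective transform (for example OpenCV's `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`) to rectify the page.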
