
Text detection of curved and dense products based on link relationship prediction
(Chinese title: 基于链接关系预测的弯曲密集型商品文本检测)
Abstract To address the false and missed detections caused by curved, dense text in text detection on commodity packaging images, a detection framework consisting of two sub-networks is proposed: a text detection network based on relational prediction (RPTNet). In the text component detection network, downsampling uses a dual-branch structure that runs a convolutional neural network and self-attention in parallel to extract local and global features, and a dilated feature enhancement module (DFM) is added to reduce the information lost from deep feature maps during dimensionality reduction. Upsampling combines a feature pyramid network with a multi-level attention fusion module (MAFM) to fuse features across levels and strengthen the latent connections between text features; a text detector then detects text components from the upsampled feature maps. In the link relationship prediction network, a relational reasoning framework based on a graph convolutional network predicts the deep similarity between each text component and its neighbors, and a bi-directional long short-term memory network aggregates the text components into text instances. To verify the detection performance of RPTNet, a text detection dataset composed of commodity packaging images (CPTD1500) is constructed. Experimental results show that RPTNet achieves excellent performance on the public text datasets CTW-1500 and Total-Text, and that its recall and F value on CPTD1500 reach 85.4% and 87.5%, respectively, both better than current mainstream algorithms.
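The last stage described in the abstract turns pairwise link predictions between text components into whole text instances. The sketch below illustrates only that general idea, and it is not the paper's method: the link scores are hypothetical numbers standing in for the graph convolutional network's output, and a simple threshold plus union-find grouping replaces the bi-directional long short-term memory aggregation. The function name `group_components` and the `threshold` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def group_components(num_components, link_scores, threshold=0.5):
    """Group component indices into instances by merging linked pairs.

    link_scores maps (i, j) index pairs to a similarity in [0, 1].
    Pairs scoring at or above `threshold` are treated as linked.
    This is a simplified stand-in, not the RPTNet aggregation step.
    """
    parent = list(range(num_components))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), score in link_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect components that share a root into one instance.
    groups = defaultdict(list)
    for i in range(num_components):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical link scores for five components along two text lines.
link_scores = {(0, 1): 0.92, (1, 2): 0.88, (2, 3): 0.10, (3, 4): 0.95}
instances = group_components(5, link_scores)
print(instances)  # [[0, 1, 2], [3, 4]]
```

In this toy input the weak link between components 2 and 3 keeps two adjacent text lines apart, which is the kind of separation that matters when text is densely packed.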
Authors GENG Lei (耿磊); LI Jiachen (李嘉琛); LIU Yanbei (刘彦北); LI Yuelong (李月龙); LI Xiaojie (李晓捷) (School of Life Sciences, Tiangong University, Tianjin 300387, China; Tianjin Key Laboratory of Optoelectronic Detection Technology, Tiangong University, Tianjin 300387, China; School of Electronics and Information Engineering, Tiangong University, Tianjin 300387, China; School of Computer Science and Technology, Tiangong University, Tianjin 300387, China)
Source Journal of Tiangong University (《天津工业大学学报》), indexed in CAS and the Peking University Core Journals list, 2024, No. 4, pp. 50-59, 74 (11 pages in total)
Funding National Natural Science Foundation of China (61771340); Tianjin Science and Technology Program (20YDTPJC00110).
Keywords text detection; convolutional neural network; self-attention; feature fusion; graph convolutional network; bi-directional long short-term memory network