Recognizing irregular text in natural images is a challenging task in computer vision. Existing approaches still have difficulty recognizing irregular text because of its diverse shapes. In this paper, we propose a simple yet powerful irregular text recognition framework based on an encoder-decoder architecture. The proposed framework is divided into four main modules. First, in the image transformation module, a Thin Plate Spline (TPS) transformation is employed to transform the irregular text image into a readable text image. Second, we propose a novel Spatial Attention Module (SAM) that compels the model to concentrate on text regions and obtain enriched feature maps. Third, a deep bi-directional long short-term memory (Bi-LSTM) network is used to build a contextual feature map from the visual feature map generated by a Convolutional Neural Network (CNN). Finally, we propose a Dual Step Attention Mechanism (DSAM), integrated with the Connectionist Temporal Classification (CTC)-Attention decoder, to re-weight visual features and focus on intra-sequence relationships in order to generate a more accurate character sequence. The effectiveness of the proposed framework is verified through extensive experiments on various benchmark datasets, such as SVT, ICDAR, CUTE80, and IIIT5k. The performance of the proposed text recognition framework is analyzed with the accuracy metric. The results demonstrate that the proposed method outperforms existing approaches on both regular and irregular text. Additionally, the robustness of our approach is evaluated on grocery datasets that contain complex text images, such as GroZi-120, WebMarket, SKU-110K, and Freiburg Groceries. On these grocery datasets, the framework also achieves superior performance.
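The abstract names a CTC-Attention decoder but does not describe its decoding step. As a rough illustration of the CTC side only, the sketch below shows the standard CTC collapse rule (merge consecutive repeats, then drop blanks) applied to per-frame labels. The function name `ctc_greedy_decode` and the blank symbol `-` are assumptions made for this example and are not taken from the paper; the attention branch and DSAM are not modeled.

```python
from itertools import groupby

# Hypothetical blank symbol for this sketch; real models usually reserve an index.
BLANK = "-"


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence with the standard CTC rule.

    Step 1: merge runs of identical consecutive labels into one label.
    Step 2: remove every blank symbol from the merged sequence.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


if __name__ == "__main__":
    # A blank between the two "l" runs keeps the double letter in the output.
    print(ctc_greedy_decode("hh-e-ll-lo--"))
```

The blank symbol is what lets CTC emit repeated characters: without a blank between two runs of the same label, the runs would merge into a single character.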
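Existing cross-domain face liveness detection algorithms generalize poorly because their feature extraction is prone to overfitting and lacks feature aggregation. To address this problem, a single-side adversarial network algorithm for cross-domain face liveness detection is proposed. Grouped convolution is fused with an improved inverted residual structure to replace ordinary convolution, which reduces the number of network parameters while strengthening the representation of fine-grained facial features. An adaptive feature normalization module is introduced to emphasize the image regions that carry face liveness information and to de-emphasize irrelevant background regions, which effectively avoids overfitting to liveness information and strengthens liveness detection on faces from different source domains. A channel attention module is introduced on top of NetVLAD. Acting as a branch of the feature aggregation network, it learns the semantic information of local facial features in different source domains and effectively improves the generalization of face liveness classification across source domains. The two modules are fused into a single network to improve cross-domain face liveness detection accuracy in unseen scenarios. Experimental results on the OULU-NPU, CASIA-FASD, MSU-MFSD, and Idiap Replay-Attack datasets show that the algorithm performs well on the cross-dataset tests O&C&M to I, O&C&I to M, I&C&M to O, and O&M&I to C. In particular, the performance metrics on O&C&I to M and O&M&I to C improve by 0.99 and 0.5 percentage points in accuracy, respectively.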
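The claim that grouped convolution lowers the parameter count can be checked with simple arithmetic. The sketch below counts the weights of a 2D convolution layer with and without groups, ignoring bias terms. The function name `conv_weight_count` and the channel sizes in the usage example are assumptions chosen for illustration; the abstract does not give the network's actual layer sizes.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2D convolution layer, bias excluded.

    With `groups` groups, each output channel only sees
    in_channels // groups input channels, so the weight count
    shrinks by a factor of `groups` relative to an ordinary convolution.
    """
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return kernel_size * kernel_size * (in_channels // groups) * out_channels


if __name__ == "__main__":
    # Illustrative sizes only (not from the paper): 64 -> 128 channels, 3x3 kernel.
    ordinary = conv_weight_count(64, 128, 3)
    grouped = conv_weight_count(64, 128, 3, groups=4)
    print(ordinary, grouped, ordinary // grouped)
```

For these illustrative sizes, four groups cut the weight count to a quarter of the ordinary convolution's.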