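Scene text recognition (STR) enables computers to extract text from natural scene images. Recognition accuracy has always been the main focus of STR research, but for edge devices with limited computing resources, a model's parameter count and computational efficiency matter just as much. To address this, a scene text recognition algorithm based on multi-head attention fusion (Scene Text Recognition Based on Multi-Head Attention Fusion, MAF) is proposed. A visual encoder built on the multi-head attention (MHA) mechanism extracts deep visual features from both regular and irregular scene text images. To make the model more sensitive to variations in inter-character spacing and to semantic similarity, an enhanced positional encoding is proposed together with a semantic encoder that combines input context with a permutation model. MHA then fuses the visual and semantic features, improving character recognition accuracy against complex backgrounds. Experimental results show that MAF has only 7.6×10^6 parameters and 1.0×10^9 FLOPS, and reaches an average recognition accuracy of 95.6% on real STR datasets, effectively balancing recognition accuracy and computational efficiency and showing some potential for practical application.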
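The fusion step described in this abstract can be illustrated with a minimal sketch of multi-head cross-attention. It rests on several assumptions that the abstract does not confirm: semantic features act as queries, visual features act as keys and values, each head works on one slice of the feature dimension, and the learned projection matrices are omitted. The function names (`softmax`, `attention_weights`, `multi_head_fuse`) are hypothetical and are not taken from the MAF paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def multi_head_fuse(semantic, visual, num_heads):
    """Fuse semantic queries with visual keys/values, one head per feature slice.

    semantic: list of vectors (queries), each of dimension d
    visual:   list of vectors (keys and values), each of dimension d
    Returns one fused vector of dimension d per semantic vector.
    Assumption: learned projections (W_q, W_k, W_v, W_o) are left out.
    """
    dim = len(semantic[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    fused = []
    for query in semantic:
        out = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            keys = [v[lo:hi] for v in visual]
            weights = attention_weights(query[lo:hi], keys)
            # Weighted sum of the visual values for this head.
            for j in range(head_dim):
                out.append(sum(w * k[j] for w, k in zip(weights, keys)))
        fused.append(out)
    return fused
```

Each fused vector is a convex combination of the visual vectors, computed separately per head, so a character-level semantic query can draw on different image regions in different feature subspaces. A trained model would add the projection matrices and typically a residual connection and normalization around this step.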
Scholarship on the Dian Shi Zhai Pictorial, founded in 1884, is well developed, but research remains at an early stage on the Flying Shadow Pavilion Pictorial, which derives from the Dian Shi Zhai Pictorial in both content and form and was founded in 1890 by the Haipai (Shanghai School) painter Wu Youru to reward his growing number of admirers. The Flying Shadow Pavilion Pictorial consisted of four parts: pictures of ladies in Shanghai costumes, news about current affairs, pictures of animals, and compilations of women. Each part was accompanied by the notebook-style writings popular at the time, and the pictorial was well received by contemporary readers for its pairing of pictures with text and its mix of narrative and commentary. This paper takes the Flying Shadow Pavilion Pictorial as its main subject, summarizes the existing literature on it, identifies the deficiencies of current research, and outlines directions for future research on the pictorial.