
Objective Visual Attention Estimation Method via Progressive Learning and Multi-scale Enhancement
Abstract: Visual attention mechanisms have drawn wide interest from academia and industry, but existing work detects attention mainly from the viewpoint of the scene observer. Emerging intelligent applications, however, require visual attention detection from the perspective of objects in the scene: estimating a surveillance target's visual attention helps predict its subsequent behavior, and an intelligent robot must understand the intention of its interaction partner to interact effectively. Drawing on the cognitive mechanism of objective visual attention, this paper proposes an objective visual attention estimation method based on progressive learning and multi-scale enhancement. The method treats the object's field of view as a combination of geometric structure and geometric detail. A Hierarchical Self-Attention Module (HSAM) captures long-range dependencies among deep features and adapts to the diversity of geometric features. A direction vector and a field-of-view generator yield the probability distribution of gaze points, and a feature fusion module performs structure sharing, fusion, and enhancement of multi-resolution features to better capture spatial context. Finally, a composite loss function models the correlation among gaze direction, field-of-view, and focus prediction. Experimental results show that the proposed method outperforms current mainstream methods on multiple accuracy metrics for objective visual attention estimation, on both public and self-built datasets.
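The abstract does not specify the internals of the HSAM. As a rough illustrative sketch only (not the authors' implementation): the idea of applying self-attention to feature tokens at more than one spatial scale and fusing the results can be written in plain numpy. Here the learned projection weights are replaced by random matrices, and the two-scale pooling/fusion scheme is a toy stand-in invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d):
    # random projections stand in for learned Q/K/V weights (illustrative only)
    wq, wk, wv = (rng.standard_normal((x.shape[-1], d)) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    # scaled dot-product attention: pairwise (long-range) token dependencies
    attn = softmax(q @ k.T / np.sqrt(d))
    return attn @ v

def hierarchical_self_attention(feat, d=16):
    # toy two-level hierarchy: attention over full-resolution tokens and
    # over a 2x average-pooled copy, fused by addition after upsampling
    coarse = feat.reshape(feat.shape[0] // 2, 2, -1).mean(axis=1)
    fine_out = self_attention(feat, d)
    coarse_out = self_attention(coarse, d)
    return fine_out + np.repeat(coarse_out, 2, axis=0)

tokens = rng.standard_normal((8, 32))   # 8 spatial tokens, 32-dim features
out = hierarchical_self_attention(tokens)
print(out.shape)                        # (8, 16)
```

The coarse branch lets each output token aggregate context from pooled regions as well as from individual tokens, which is the general motivation for hierarchical attention over multi-scale features; the paper's actual module, weight sharing, and fusion operator may differ.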
Authors: FENG Jiangfan, HE Zhongyu (School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Published in: Journal of Electronics & Information Technology (EI, CSCD, PKU Core Journal), 2023, No. 4, pp. 1475-1484 (10 pages)
Funding: National Natural Science Foundation of China (41971365); Natural Science Foundation of Chongqing (cstc2020jcyj-msxmX0635)
Keywords: Objective visual attention; Progressive learning; Hierarchical self-attention; Feature fusion