Abstract
In recent years, eye gaze estimation has attracted widespread attention. RGB appearance-based gaze estimation methods use ordinary cameras and deep learning, avoiding the expensive infrared hardware required by commercial eye trackers and opening the way to more accurate, lower-cost eye gaze estimation. However, RGB appearance images contain many features unrelated to gaze, such as illumination intensity and skin color; these irrelevant features interfere with the deep-learning regression process and thus degrade the accuracy of gaze estimation. To address this problem, this paper proposes a new architecture named the class attention network (CA-Net), which contains three kinds of class attention modules: channel, scale, and eye. These modules extract and fuse different types of attention encodings, thereby reducing the weight assigned to gaze-irrelevant features. Extensive experiments on the GazeCapture dataset show that, among RGB appearance-based gaze estimation methods, CA-Net improves gaze estimation accuracy over existing state-of-the-art methods by approximately 0.6% on mobile phones and 7.4% on tablets.
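To make the channel-reweighting idea in the abstract concrete, the sketch below shows a generic squeeze-and-excitation style channel attention block in PyTorch, in the spirit of the "light squeeze-and-excitation" keyword: a globally pooled channel descriptor passes through a bottleneck MLP to produce per-channel weights in [0, 1] that can suppress channels dominated by gaze-irrelevant cues such as illumination. This is a minimal illustrative sketch; the module name `ChannelAttention`, the reduction ratio of 16, and the layer layout are assumptions, not CA-Net's actual implementation.

```python
# Minimal sketch of a squeeze-and-excitation style channel attention block
# (illustrative only; CA-Net's "light" variant may differ in detail).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):  # hypothetical name
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average -> (B, C, 1, 1)
        self.fc = nn.Sequential(             # excitation: bottleneck MLP over channels
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),                    # per-channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)          # channel descriptor (B, C)
        w = self.fc(w).view(b, c, 1, 1)      # reshape weights for broadcasting
        return x * w                         # reweight: down-weight gaze-irrelevant channels
```

Per the abstract, the scale and eye class-attention modules presumably play analogous reweighting roles across feature resolutions and between eye regions before their attention encodings are fused.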
Authors
XU Jinlong; DONG Mingrui; LI Yingying; LIU Yanqing; HAN Lin (National Supercomputing Center in Zhengzhou, Zhengzhou 450000, China; School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China; Information Engineering University, Zhengzhou 450000, China)
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2024, No. 10, pp. 295-301 (7 pages)
Computer Science
Funding
Henan Province Major Science and Technology Project of 2022 (221100210600)
Qiushi Scientific Research Start-up Fund of 2022 (Natural Science) (32213247)
Henan Province Key Science and Technology Research Project of 2023 (232102210185)
Keywords
Class attention
Light squeeze-and-excitation
Self-attention
Multiscale
Eye gaze estimation