Funding: the National Natural Science Foundation of China, No. 61932003, and the Fundamental Research Funds for the Central Universities.
Abstract: In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer interaction. Previous studies have made significant achievements in predicting 2D or 3D gaze from monocular face images. This study presents a deep neural network for 2D gaze estimation on mobile devices. It achieves state-of-the-art 2D gaze point regression error while significantly improving gaze classification error on quadrant divisions of the display. To this end, an efficient attention-based module that correlates and fuses the left- and right-eye contextual features is first proposed to improve gaze point regression performance. Subsequently, through a unified perspective on gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional supervision. Consequently, both gaze point regression and quadrant classification performance are improved. Experiments demonstrate that the proposed method outperforms existing gaze-estimation methods on the GazeCapture and MPIIFaceGaze datasets.
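The abstract's core idea of correlating and fusing left- and right-eye features can be sketched with a generic cross-attention step. This is a minimal illustrative sketch only, not the paper's module; the feature dimensions, the softmax scaling, and the concatenation-based fusion are all assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(left, right):
    """Correlate left-eye tokens (N x D) with right-eye tokens and fuse.

    Each left-eye feature attends over the right-eye features; the
    attended right-eye context is concatenated with the original
    left-eye features, giving an (N x 2D) fused representation.
    """
    scores = left @ right.T / np.sqrt(left.shape[1])  # (N, N) correlation map
    attn = softmax(scores, axis=-1)                    # rows sum to 1
    context = attn @ right                             # right-eye context per token
    return np.concatenate([left, context], axis=1)

rng = np.random.default_rng(0)
left = rng.standard_normal((4, 8))    # toy left-eye features
right = rng.standard_normal((4, 8))   # toy right-eye features
fused = cross_attention_fuse(left, right)
print(fused.shape)  # (4, 16)
```

In the actual network the fused features would feed a regression head for the 2D gaze point, with the quadrant-classification metric-learning loss applied as auxiliary supervision.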
Funding: supported by the Science and Technology Support Project of the Sichuan Science and Technology Department (2018SZ0357) and China Scholarship.
Abstract: A person's eye gaze can effectively express that person's intentions. Thus, gaze estimation is an important approach in intelligent manufacturing for analyzing a person's intentions. Many gaze estimation methods regress the direction of the gaze by analyzing images of the eyes, also known as eye patches. However, it is very difficult to construct a person-independent model that can estimate an accurate gaze direction for every person because of individual differences. In this paper, we hypothesize that the difference in the appearance of each of a person's eyes is related to the difference in the corresponding gaze directions. Based on this hypothesis, a differential eyes' appearances network (DEANet) is trained on public datasets to predict the gaze differences of pairwise eye patches belonging to the same individual. Our proposed DEANet is based on a Siamese neural network (SNNet) framework with two identical branches, each fed by a multi-stream architecture. Both branches share the same weights and extract the features of the patches; the features are then concatenated to obtain the difference of the gaze directions. Once the differential gaze model is trained, a new person's gaze direction can be estimated when a few calibrated eye patches for that person are provided. Because person-specific calibrated eye patches are involved in the testing stage, the estimation accuracy is improved. Furthermore, the problem of requiring a large amount of data when training a person-specific model is effectively avoided. A reference grid strategy is also proposed to select a few references as some of the DEANet's inputs directly based on the estimation values, thereby further improving the estimation accuracy. Experiments on public datasets show that our proposed approach outperforms the state-of-the-art methods.
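The differential inference step described above can be sketched as follows: given calibrated reference patches with known gaze, a trained difference model predicts the gaze offset between the test patch and each reference, and the test gaze is the average of reference gaze plus predicted offset. The `diff_model` below is a toy stand-in for the trained DEANet, and the 3-D "features" are synthetic assumptions.

```python
import numpy as np

def estimate_gaze(diff_model, test_feat, ref_feats, ref_gazes):
    """Average differential predictions over calibrated references.

    diff_model(test, ref) returns the predicted gaze difference
    (test gaze minus reference gaze) for a pair of eye-patch features.
    """
    preds = [g + diff_model(test_feat, r) for r, g in zip(ref_feats, ref_gazes)]
    return np.mean(preds, axis=0)

# Toy stand-in: pretend the true 2D gaze is the first two feature
# dimensions, so the "trained" difference model is exact here.
toy_diff = lambda a, b: a[:2] - b[:2]

refs = [np.array([0.1, 0.2, 0.0]), np.array([0.3, -0.1, 0.0])]
gazes = [r[:2] for r in refs]          # calibrated reference gazes
test = np.array([0.25, 0.05, 0.0])     # new person's eye-patch feature
print(estimate_gaze(toy_diff, test, refs, gazes))  # [0.25 0.05]
```

The reference grid strategy in the abstract would choose which `ref_feats` to use based on an initial estimate, rather than averaging over all references.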
Funding: Supported by the National Natural Science Foundation of China (61772328).
Abstract: Gaze information is important for finding the region of interest (ROI), which indicates where the next action will happen. Supervised gaze estimation does not work on EPIC-Kitchens because of the lack of ground truth. In this paper, we develop an unsupervised gaze estimation method that helps with egocentric action anticipation. We adopt the gaze map as a feature representation and input it into a multi-modality network jointly with red-green-blue (RGB), optical flow, and object features. We explore the method on the EGTEA dataset. The estimated gaze map is further optimized with dilation and a Gaussian filter, masked onto the original RGB frame, and encoded as the important gaze modality. Our results outperform the strong baseline Rolling-Unrolling LSTMs (RULSTM), with top-5 accuracy reaching 34.31% on the seen test set (S1) and 22.07% on the unseen test set (S2), improvements of 0.58% and 0.87%, respectively.
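The gaze-map post-processing described above (dilation, Gaussian smoothing, masking onto the RGB frame) can be sketched in plain numpy. This is a hedged illustration: the structuring element, iteration count, sigma, and normalization are illustrative choices, not the paper's settings.

```python
import numpy as np

def dilate(mask, iters=2):
    """Binary dilation with a 3x3 structuring element via shifted maxima."""
    out = mask.astype(bool)
    for _ in range(iters):
        acc = out.copy()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                acc |= np.roll(np.roll(out, dy, axis=0), dx, axis=1)
        out = acc
    return out

def gaussian_blur(img, sigma=2.0):
    """Separable Gaussian smoothing using 1-D convolutions."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return img

def refine_gaze_map(gaze_map, frame):
    """Dilate, smooth, and use the gaze map to weight the RGB frame."""
    soft = gaussian_blur(dilate(gaze_map > 0.5).astype(float))
    soft /= soft.max() + 1e-8               # normalize weights to [0, 1]
    return frame * soft[..., None]          # weight each RGB channel

gaze_map = np.zeros((32, 32))
gaze_map[15:17, 15:17] = 1.0                # toy fixation blob
frame = np.ones((32, 32, 3))                # toy RGB frame
masked = refine_gaze_map(gaze_map, frame)
print(masked.shape)  # (32, 32, 3)
```

The resulting masked frame emphasizes the gazed region and suppresses the rest, which is what makes it usable as an extra modality alongside RGB, flow, and object features.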
Funding: Supported by the National Natural Science Foundation of China (61772468, 62172368) and the Fundamental Research Funds for the Provincial Universities of Zhejiang (RF-B2019001).
Abstract: Background: Eye-tracking technology for mobile devices has made significant progress. However, owing to limited computing capacity and the complexity of context, conventional image feature-based technology cannot extract features accurately, which degrades performance. Methods: This study proposes a novel approach that combines appearance- and feature-based eye-tracking methods. Face and eye region detection was conducted to obtain features that were used as inputs to the appearance model to detect the feature points. The feature points were used to generate feature vectors, such as corner center to pupil center, from which the gaze fixation coordinates were calculated. Results: To obtain the feature vectors with the best performance, we compared different vectors under different image resolution and illumination conditions. The results indicated that the best average gaze fixation accuracy, a visual angle of 1.93°, was achieved when the image resolution was 96×48 pixels, with light sources illuminating from the front of the eye. Conclusions: Compared with current methods, our method improved the accuracy of gaze fixation and was more usable.
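The step from a corner-center-to-pupil-center feature vector to gaze fixation coordinates can be illustrated with a simple calibrated mapping. This sketch assumes an affine mapping fitted by least squares on synthetic calibration points; the paper's actual mapping and calibration data may differ.

```python
import numpy as np

def fit_mapping(vectors, screen_points):
    """Fit an affine map from 2D eye-feature vectors to screen coordinates."""
    X = np.hstack([vectors, np.ones((len(vectors), 1))])  # append bias term
    W, *_ = np.linalg.lstsq(X, screen_points, rcond=None)
    return W                                              # (3, 2) weights

def predict_gaze(W, vector):
    """Map one corner-center-to-pupil-center vector to a screen point."""
    return np.append(vector, 1.0) @ W

# Synthetic calibration: assume gaze = 100 * vector + (50, 30).
vecs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pts = vecs * 100 + np.array([50.0, 30.0])
W = fit_mapping(vecs, pts)
print(predict_gaze(W, np.array([0.5, 0.5])))  # [100.  80.]
```

In practice the calibration points would come from the user fixating known on-screen targets, and a higher-order polynomial mapping could replace the affine one if the residual error warrants it.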