Funding: This work was supported by the Basic Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) under Grant 2019R1F1A1045329 and Grant 2020R1A4A1017775.
Abstract: Gaze estimation is one of the most promising technologies for supporting indoor monitoring and interaction systems. However, previous gaze estimation techniques generally work only in a controlled laboratory environment because they require a number of high-resolution eye images. This makes them unsuitable for welfare and healthcare facilities, which have the following challenging characteristics: 1) users’ continuous movements, 2) various lighting conditions, and 3) a limited amount of available data. To address these issues, we introduce a multi-view multi-modal head-gaze estimation system that translates the user’s head orientation into the gaze direction. The proposed system captures the user with multiple cameras that provide depth and infrared modalities, in order to train gaze estimators that are more robust under the aforementioned conditions. To this end, we implemented a deep learning pipeline that can handle different types and combinations of data. The proposed system was evaluated using data collected from 10 volunteer participants to analyze how the use of single or multiple cameras and modalities affects the performance of head-gaze estimators. Through various experiments, we found that 1) the infrared modality provides more useful features than the depth modality, 2) multi-view multi-modal approaches provide better accuracy than single-view single-modal approaches, and 3) the proposed estimators achieve an inference efficiency high enough for use in real-time applications.
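To illustrate what translating a head orientation into a gaze direction can look like, the following minimal sketch converts head yaw and pitch angles into a unit direction vector and measures the angular error between two such directions. It is not the authors' pipeline: the function names, the coordinate convention (camera looking along +z, x to the right, y up), and the choice of angular error as a metric are all assumptions made for illustration.

```python
import math


def head_angles_to_direction(yaw_deg, pitch_deg):
    """Convert head yaw/pitch (degrees) into a unit direction vector (x, y, z).

    Assumed convention (not taken from the paper): z points forward,
    x points right, y points up; yaw rotates about y, pitch about x.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def angular_error_deg(a, b):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# Hypothetical example: a predicted head-gaze versus a reference direction.
predicted = head_angles_to_direction(yaw_deg=10.0, pitch_deg=-5.0)
reference = head_angles_to_direction(yaw_deg=0.0, pitch_deg=0.0)
error = angular_error_deg(predicted, reference)
```

Under this assumed convention, a head facing straight ahead maps to the forward axis, and the angular error gives a single number for comparing estimators trained on different camera and modality combinations.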