Background: Difficulty hearing can occur for numerous reasons in humans of all ages. To compensate, people can employ a number of techniques that improve their understanding of sound by other means. One is to use vision and attempt to lip-read in order to understand the other person in a face-to-face conversation. Audio-visual integration has a long history in perception research (e.g., the McGurk effect), and researchers have shown that older adults look at the mouth region for additional information in noisy situations. However, this concept has not been explored in the context of social media. A common way to communicate virtually that simulates a live conversation is video chatting or video conferencing. It is used for a variety of purposes, including work and maintaining social interactions, and has started to be used in clinical settings. However, video chat session quality is often sub-optimal, and sessions may contain degraded audio and/or audio that is decoupled from the video. The goal of this study is to determine whether humans use the same visual compensation mechanism, lip reading, in a digital setting as they would in a face-to-face conversation.

Methods: The participants (n = 116, aged 18 to 41) answered a demographics questionnaire that included questions about their use of video chatting software. The participants then viewed two videos of a video call: one with synchronized audio and video, and the other asynchronous (1-second delay). The order of the videos was randomized across participants. Binocular eye movements were monitored at 60 Hz using a Mirametrix S2 eye tracker connected to Ogama 5.0 (http://www.ogama.net/). After each video, the participants answered questions about the call quality and the content of the video.

Results: There was no significant difference between the total dwell time on the speaker's eyes and on the speaker's mouth, t(116) = −1.574, P = 0.059, d = −0.147, BF10 = 0.643. However, in the heat maps generated by Ogama, we observed that when viewing the poor-quality video, the participants looked more towards the speaker's mouth than towards the eyes. As call quality decreased, the average number of fixations increased from 79.87 in the synchronous condition to 113.4 in the asynchronous condition, and the median fixation duration decreased from 218.3 ms in the synchronous condition to 205 ms in the asynchronous condition.

Conclusions: These results may indicate that humans employ similar compensation mechanisms in response to a decrease in auditory comprehension, given the participants' tendency to look more towards the speaker's mouth. However, more study is needed because the results are inconsistent.
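The study computed its eye-tracking measures in Ogama; the following is only a minimal, hypothetical Python sketch of how measures like these can be derived from a list of fixations. It assumes rectangular areas of interest (AOIs) around the speaker's eyes and mouth, fixation coordinates in screen pixels, and a paired comparison of per-participant dwell times (the abstract does not state which test variant was used). The names `Fixation`, `AOI`, `dwell_time_ms`, `fixation_summary`, `paired_t_and_dz` and `percent_change`, and all toy coordinates, are illustrative assumptions, not part of the study or of Ogama.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, median, stdev
from typing import Sequence


@dataclass
class Fixation:
    # Hypothetical fixation record: gaze position in screen pixels plus duration.
    x: float
    y: float
    duration_ms: float


@dataclass
class AOI:
    # Hypothetical rectangular area of interest (e.g. the speaker's eyes or mouth).
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f: Fixation) -> bool:
        return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom


def dwell_time_ms(fixations: Sequence[Fixation], aoi: AOI) -> float:
    """Total duration of all fixations that land inside the AOI."""
    return sum(f.duration_ms for f in fixations if aoi.contains(f))


def fixation_summary(fixations: Sequence[Fixation]) -> tuple:
    """Fixation count and median fixation duration for one viewing (non-empty input)."""
    durations = [f.duration_ms for f in fixations]
    return len(durations), median(durations)


def paired_t_and_dz(a: Sequence[float], b: Sequence[float]) -> tuple:
    """Paired t statistic, its degrees of freedom (n - 1) and Cohen's d_z.

    Assumes a[i] and b[i] come from the same participant (e.g. eye vs. mouth
    dwell time) and that there are at least two pairs.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)
    t = mean(diffs) / (sd / sqrt(n))
    dz = mean(diffs) / sd
    return t, n - 1, dz


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Toy data (hypothetical), not taken from the study.
    eyes = AOI("eyes", 100, 50, 300, 120)
    mouth = AOI("mouth", 150, 200, 250, 260)
    fixations = [Fixation(200, 80, 250), Fixation(210, 230, 180), Fixation(160, 210, 220)]
    print(dwell_time_ms(fixations, eyes), dwell_time_ms(fixations, mouth))  # 250 400
    print(fixation_summary(fixations))  # (3, 220)
    print(round(percent_change(79.87, 113.4), 1))  # 42.0
```

Applied to the group values reported in the Results, `percent_change(79.87, 113.4)` corresponds to roughly a 42% increase in fixation count, and `percent_change(218.3, 205)` to roughly a 6.1% decrease in median fixation duration. Turning the t statistic into a P value would additionally require the t distribution's CDF (for example from SciPy), which this standard-library-only sketch omits.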