Abstract
To address the limited camera viewpoint of traditional detection methods, this paper proposes a multi-view facial fatigue detection method that combines facial pose correction with an improved ViViT. First, Mediapipe Face Mesh is used to locate 3D facial landmarks and correct the face to a frontal view; the proposed FGR-ViViT model then captures the changes in the corrected eye, eyebrow, and mouth line-image frame sequences. FGR-ViViT adds a part selection module to the Temporal Transformer Encoder of ViViT to capture subtle feature differences along the temporal dimension, and it fuses two dropout passes with an improved contrastive loss function to adjust sample similarity, reducing the risk of overfitting and improving generalization. Experimental results show that the proposed method achieves F1-scores of 94.5% and 97.6% on the corrected line-image test sets of YawDD and DROZY, 3.2% and 10.4% higher than with the original face image frames, while FGR-ViViT outperforms the original ViViT by 6.1% and 0.7%, respectively. The proposed method is applicable to scenarios with flexible camera placement and contributes to solving multi-view facial drowsiness detection.
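The pipeline described above begins with 3D facial landmarks from Mediapipe Face Mesh followed by a frontal-pose correction before the line-image frames are formed. The following is a minimal sketch of that first step, assuming the standard MediaPipe Face Mesh Python API; the PCA-based frontalization and the helper names (extract_landmarks, frontalize) are illustrative assumptions, not the paper's exact procedure.

    # Sketch only: landmark extraction with MediaPipe Face Mesh and a rough
    # rotation-based frontalization; the paper's actual correction may differ.
    import cv2
    import numpy as np
    import mediapipe as mp

    mp_face_mesh = mp.solutions.face_mesh

    def extract_landmarks(image_bgr):
        """Return an (N, 3) array of normalized 3D face landmarks, or None if no face."""
        with mp_face_mesh.FaceMesh(static_image_mode=True, refine_landmarks=True) as fm:
            result = fm.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if not result.multi_face_landmarks:
            return None
        lm = result.multi_face_landmarks[0].landmark
        return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32)

    def frontalize(points):
        """Rotate the 3D landmark cloud toward a frontal view (PCA-based approximation)."""
        centered = points - points.mean(axis=0)
        # Principal axes of the landmark cloud approximate the head orientation;
        # projecting onto them roughly aligns the face plane with the image plane.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return centered @ vt.T

The frontalized landmarks of the eyes, eyebrows, and mouth would then be rendered as line images and stacked into the frame sequences consumed by FGR-ViViT.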
Authors
FU Youjia; MENG Xueying (College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)
Source
Journal of Chongqing University of Technology (Natural Science)
CAS
Peking University Core Journal (北大核心)
2024, No. 6, pp. 172-179 (8 pages)
Funding
Chongqing Basic Research and Frontier Exploration Special Project (Chongqing Natural Science Foundation) (CSTB2022NSCQ-MSX0786)
Humanities and Social Sciences Research Project of the Chongqing Municipal Education Commission (23SKGH252).