Progress and challenges in facial action unit detection
Abstract: The anatomically based facial action coding system defines a unique set of atomic, nonoverlapping facial muscle actions called action units (AUs), which can accurately characterize facial expressions. Each AU corresponds to muscular activity that produces momentary changes in facial appearance, and combinations of AUs can represent any facial expression. As a multi-label classification problem, AU detection suffers from insufficient AU annotations, varied head poses, individual differences, and imbalance among different AUs. To trace the development of AU detection techniques, this article systematically surveys representative methods proposed since 2016. According to the modality of the input data, the methods are categorized into image-based, video-based, and other-modality approaches, and we discuss how weakly supervised AU detection methods reduce the dependence on labeled data given the large scale of unlabeled data.

Image-based methods include approaches that learn local facial representations, exploit AU relations, and adopt multitask or weakly supervised learning. Handcrafted or automatically learned local facial representations capture the local deformations caused by active AUs; however, handcrafted features cannot adapt their local regions to different AUs, while learned representations suffer from insufficient training data. Approaches that exploit AU relations draw on the prior knowledge that some AUs tend to appear together while others are mutually exclusive. Such methods adopt Bayesian networks or graph neural networks to model AU relations inferred manually from the annotations of specific datasets; these fixed priors make cross-dataset evaluation unreliable. Multitask AU detection methods are motivated by the observation that facial shape, as represented by facial landmarks, aids AU detection, while the deformations caused by active AUs in turn affect the distribution of landmark locations. In addition to detecting AUs, such methods typically estimate facial landmarks or recognize facial expressions in a multitask manner, and other facial emotion analysis tasks, such as emotional dimension estimation, can also be incorporated into the multitask setting.

Video-based methods are categorized into temporal representation learning and self-supervised learning. Temporal representation learning methods commonly adopt long short-term memory (LSTM) networks or 3D convolutional neural networks (3D-CNNs) to model temporal information; other approaches use the optical flow between frames to detect AUs. Several recent self-supervised approaches exploit the prior knowledge that facial actions, i.e., the movements of facial muscles between frames, can serve as a self-supervisory signal. Such video-based weakly supervised AU detection methods are reasonable and explainable and can effectively alleviate the shortage of AU annotations; however, they require massive amounts of unlabeled video data during training and cannot perform AU detection in an end-to-end manner.

We also review methods that exploit point clouds or thermal images for AU detection, which can alleviate the influence of head pose or illumination. Finally, we compare the representative methods, analyze their advantages and drawbacks, and on that basis summarize and discuss the challenges and potential directions of AU detection. We conclude that methods capable of utilizing weakly annotated or unlabeled data are an important direction for future research; such methods should be carefully designed according to prior knowledge of AUs to reduce the demand for large amounts of labeled data.
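The abstract frames AU detection as a multi-label classification problem that suffers from imbalance among different AUs. The sketch below is a minimal, purely illustrative PyTorch example of that framing, not the implementation of any surveyed method; the class name MultiLabelAUDetector, the layer sizes, and the positive weights are placeholders. A shared backbone produces one logit per AU, and per-AU positive weights in the binary cross-entropy loss are one common way to counter class imbalance.

```python
# Illustrative only: AU detection as multi-label classification with a
# per-AU weighted BCE loss to counter class imbalance. All sizes/weights
# are placeholders, not values from the surveyed papers.
import torch
import torch.nn as nn

class MultiLabelAUDetector(nn.Module):
    def __init__(self, num_aus: int = 12):
        super().__init__()
        # Tiny CNN backbone standing in for a real feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_aus)  # one logit per AU

    def forward(self, x):
        return self.head(self.backbone(x))  # (batch, num_aus) logits

# Hypothetical per-AU positive weights, e.g. #negatives / #positives
# estimated from the training set, so rare AUs are up-weighted.
pos_weight = torch.tensor([1.0, 3.5, 0.8, 5.2, 2.0, 1.3,
                           4.1, 0.9, 2.7, 6.0, 1.1, 3.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

model = MultiLabelAUDetector(num_aus=12)
images = torch.randn(4, 3, 112, 112)           # dummy batch of face crops
labels = torch.randint(0, 2, (4, 12)).float()  # 0/1 activation per AU
loss = criterion(model(images), labels)
loss.backward()
```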
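Several of the image-based methods summarized above model AU relations with graph neural networks. As a hedged illustration of that idea, and not a reproduction of any specific paper, the sketch below treats each AU as a graph node and uses a fixed, row-normalized adjacency matrix derived from co-occurrence statistics to propagate information between related AUs before classification; AURelationGCN and the toy adjacency are hypothetical.

```python
# Illustrative graph-based AU relation modeling: a single message-passing
# layer over per-AU features with a fixed co-occurrence prior as adjacency.
import torch
import torch.nn as nn

class AURelationGCN(nn.Module):
    def __init__(self, num_aus: int, feat_dim: int, adjacency: torch.Tensor):
        super().__init__()
        # Row-normalized adjacency with self-loops; a fixed prior, not learned.
        a = adjacency + torch.eye(num_aus)
        self.register_buffer("a_norm", a / a.sum(dim=1, keepdim=True))
        self.proj = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, au_feats):
        # au_feats: (batch, num_aus, feat_dim) per-AU features from a backbone.
        h = torch.relu(self.a_norm @ self.proj(au_feats))  # message passing
        return self.classifier(h).squeeze(-1)              # (batch, num_aus) logits

# Toy prior: suppose the first two AUs often co-occur.
adj = torch.zeros(4, 4)
adj[0, 1] = adj[1, 0] = 1.0
gcn = AURelationGCN(num_aus=4, feat_dim=64, adjacency=adj)
logits = gcn(torch.randn(2, 4, 64))
```

This also illustrates why the abstract calls such priors inflexible: the adjacency is tied to the annotation statistics of one dataset, so it does not transfer to datasets with different AU distributions.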
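The multitask strategy described in the abstract can be sketched as a shared backbone with two heads, one for AU logits and one for landmark coordinates, trained with a weighted sum of the two losses. This is a hypothetical example, not the architecture of any particular surveyed method; MultiTaskAUNet, the 0.5 loss weight, and all layer sizes are assumptions.

```python
# Illustrative multitask AU detection: shared features drive both an AU
# head and a facial-landmark regression head.
import torch
import torch.nn as nn

class MultiTaskAUNet(nn.Module):
    def __init__(self, num_aus: int = 12, num_landmarks: int = 68):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.au_head = nn.Linear(64, num_aus)             # AU logits
        self.lmk_head = nn.Linear(64, num_landmarks * 2)  # (x, y) per landmark

    def forward(self, x):
        feat = self.backbone(x)
        return self.au_head(feat), self.lmk_head(feat)

model = MultiTaskAUNet()
images = torch.randn(4, 3, 112, 112)
au_logits, landmarks = model(images)
au_loss = nn.BCEWithLogitsLoss()(au_logits, torch.randint(0, 2, (4, 12)).float())
lmk_loss = nn.MSELoss()(landmarks, torch.randn(4, 136))  # dummy landmark targets
total = au_loss + 0.5 * lmk_loss                         # arbitrary task weighting
total.backward()
```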
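For the video-based branch, temporal representation learning with an LSTM can be pictured as follows. This is again only a sketch under assumed shapes (TemporalAUDetector is a hypothetical name), showing how per-frame CNN features are aggregated over time before per-frame AU prediction.

```python
# Illustrative video-based AU detection: per-frame CNN features are fed to
# an LSTM so that temporal context (e.g. onset/offset of a muscle motion)
# informs the per-frame predictions.
import torch
import torch.nn as nn

class TemporalAUDetector(nn.Module):
    def __init__(self, num_aus: int = 12, feat_dim: int = 64):
        super().__init__()
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_aus)

    def forward(self, clips):
        # clips: (batch, time, 3, H, W) short face-video clips.
        b, t = clips.shape[:2]
        feats = self.frame_encoder(clips.flatten(0, 1)).view(b, t, -1)
        hidden, _ = self.lstm(feats)  # temporal context per frame
        return self.head(hidden)      # (batch, time, num_aus) logits

model = TemporalAUDetector()
logits = model(torch.randn(2, 8, 3, 112, 112))  # 2 clips of 8 frames each
```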
Authors: Li Yong; Zeng Jiabei; Liu Xin; Shan Shiguang (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Beijing Seetatech Technology Co., Ltd., Beijing 100080, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), 2020, Issue 11, pp. 2293-2305 (13 pages); indexed in CSCD and the Peking University core journal list.
Funding: National Key Research and Development Program of China (2017YFA0700804); National Natural Science Foundation of China (61702481).
Keywords: facial action unit (AU); image-based AU detection; video-based AU detection; weakly-supervised learning; insufficient annotations