Funding: supported by the National Natural Science Foundation of China (Nos. 61272211 and 61672267), the Open Project Program of the National Laboratory of Pattern Recognition (No. 201700022), the China Postdoctoral Science Foundation (No. 2015M570413), and the Innovation Project of Undergraduate Students in Jiangsu University (No. 16A235).
Abstract: In dimensional affect recognition, the machine learning methods used to model and predict affect are mostly classification and regression. However, annotation in the dimensional affect space usually takes the form of a continuous real value that has an ordinal property, and these methods do not take advantage of this important information. We therefore propose an affective rating ranking framework for affect recognition from face images in the valence and arousal dimensional space. Our approach exploits the ordinal information among affective ratings, which are generated by discretizing the continuous annotations. Specifically, we first train a series of basic cost-sensitive binary classifiers, each of which uses all samples relabeled according to the comparison between their ratings and the rank assigned to that classifier. The final affective rating is obtained by aggregating the outputs of these binary classifiers. Comparing our results with the baseline and with deep learning based classification and regression methods on the benchmark database of the AVEC 2015 Challenge and a selected subset of the SEMAINE database shows that the proposed ordinal ranking method is effective in both the arousal and valence dimensions.
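The scheme described in the abstract follows the familiar reduction of ordinal prediction to a series of binary problems. The sketch below is a minimal illustration under that reading, not the authors' implementation: classifier k answers "is the affective rating greater than rank k?", samples are weighted by their distance to the threshold as one common cost-sensitive choice, and the final rating is recovered by counting positive votes. The base learner (scikit-learn's LogisticRegression) and all names are placeholders.

```python
# Minimal sketch of rating ranking via cost-sensitive binary decomposition
# (assumed reading of the abstract, not the authors' code).
import numpy as np
from sklearn.linear_model import LogisticRegression  # stand-in base learner

def fit_rating_rankers(X, ratings, n_ranks):
    """Train n_ranks-1 binary classifiers; classifier k asks 'rating > k?'."""
    rankers = []
    for k in range(1, n_ranks):
        y_k = (ratings > k).astype(int)   # relabel every sample against rank k
        w_k = np.abs(ratings - k)         # assumed cost: distance to the threshold
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y_k, sample_weight=w_k)
        rankers.append(clf)
    return rankers

def predict_rating(rankers, X):
    """Aggregate outputs: predicted rating = 1 + number of thresholds exceeded."""
    votes = np.stack([clf.predict(X) for clf in rankers], axis=1)
    return 1 + votes.sum(axis=1)
```

Under perfect binary decisions, a sample whose discretized rating is r exceeds exactly the thresholds 1 through r-1, so the vote count reproduces r; imperfect classifiers are simply averaged out by the aggregation.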
Abstract: Objective: To measure the repeatability of facial expression movements in a normal population and to provide reference data for evaluating the effect of interventions such as surgery. Methods: Fifteen volunteers (7 men, 8 women; median age 25 years) with roughly symmetrical facial structure and no history of facial motor or sensory nerve disorders were recruited. A three-dimensional (3D) dynamic camera recorded each subject's facial expression movements (closed-lip smile, toothy smile, lip pucker, and cheek puff) at an acquisition rate of 60 frames/s. For each expression, the six most characteristic frames were selected: the rest-state image (T0), the intermediate image between rest and maximal movement (T1), the image at the onset of maximal movement (T2), the image at the end of maximal movement (T3), the intermediate image between maximal movement and rest (T4), and the rest image at the end of the movement (T5). 3D facial expression data were acquired twice, at least one week apart. With the rest image (T0) as the reference, the movement images (T1-T5) were registered and fused with it, and regional analysis was used to quantify the 3D morphological difference between the same key-frame image of the same expression in the two sessions and the corresponding rest-state 3D image, expressed as the root mean square (RMS). Results: For the closed-lip smile, toothy smile, and cheek puff, the RMS values obtained by registering the corresponding images (T1-T5) of the two sessions against the respective T0 rest images showed no statistically significant differences. For the lip pucker, the RMS values obtained at T2 differed significantly between the two sessions (P < 0.05), while the differences at the other time points were not statistically significant. Conclusion: Facial expressions of normal subjects are reasonably repeatable, although the lip pucker is less repeatable; a 3D dynamic camera can quantitatively record and analyze the 3D characteristics of facial expression movements.
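As a rough illustration of the RMS measure reported above, the sketch below summarizes the deviation of a registered key-frame surface from the rest-state surface as the RMS of closest-point distances. It is a sketch under that assumption, not the study's analysis software; registration and regional cropping are assumed to have been done beforehand, and all names are placeholders.

```python
# Minimal sketch: RMS of closest-point distances between a registered
# key-frame face surface (e.g., T2) and the rest-state surface (T0).
import numpy as np
from scipy.spatial import cKDTree

def rms_surface_deviation(moving_pts, reference_pts):
    """moving_pts (N,3) and reference_pts (M,3): registered 3D point sets."""
    tree = cKDTree(reference_pts)
    dists, _ = tree.query(moving_pts)   # nearest-neighbour distance per point
    return float(np.sqrt(np.mean(dists ** 2)))
```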