Robot Facial Expression Control Based on a Multi-Level Speech Emotion Recognition Network

Abstract: Facial expressions and head posture are important channels through which humanoid robots express emotion, and accurate emotion recognition together with smooth expression movements is key to a better human-robot interaction experience. To meet these requirements, this paper first proposes a speech emotion recognition algorithm based on cross-attention and multi-level acoustic ensemble learning, then deploys it on a self-developed humanoid robot platform to achieve highly realistic human-robot interaction. Specifically, the study builds a humanoid robot with 16 position-controlled servos, highly realistic facial expressions, and a multi-degree-of-freedom head. Using joint-angle interpolation and trajectory planning, it achieves smooth, compliant control of the robot's facial expressions during interaction. In addition, the study constructs a speech emotion model based on cross-attention and multi-level acoustic ensemble learning. The model first uses deep convolutional networks to extract features from multi-source audio signals, then fuses these features with a cross-attention mechanism, which addresses the problem of frequency-domain information and the unclear meaning of its dimensions caused by its high dimensionality. Experimental results show that the proposed method outperforms existing methods and that, combined with the humanoid robot platform, it enables highly realistic human-robot emotional interaction.
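
The abstract mentions joint-angle interpolation for smooth expression control but does not say which interpolation scheme is used. As an illustration only, the sketch below assumes a cubic ease-in/ease-out blend between two servo poses. The function names, the 90-degree neutral pose, and the target angles are hypothetical and are not taken from the paper.

```python
from typing import List


def smoothstep(s: float) -> float:
    """Cubic ease 3s^2 - 2s^3, with zero slope at s = 0 and s = 1."""
    s = min(max(s, 0.0), 1.0)  # clamp to [0, 1]
    return s * s * (3.0 - 2.0 * s)


def interpolate_pose(start: List[float], target: List[float],
                     steps: int) -> List[List[float]]:
    """Return steps + 1 frames of joint angles moving from start to target.

    Every joint uses the same blend weight, so all servos begin and end
    their motion together and accelerate gently instead of jumping.
    """
    if len(start) != len(target):
        raise ValueError("start and target must have the same number of joints")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    trajectory = []
    for k in range(steps + 1):
        w = smoothstep(k / steps)
        trajectory.append([a + (b - a) * w for a, b in zip(start, target)])
    return trajectory


# Assumed values for illustration: 16 servos (the count given in the
# abstract), a 90-degree neutral pose, and an arbitrary target pose.
NUM_SERVOS = 16
neutral = [90.0] * NUM_SERVOS
smile = [90.0 + 2.0 * i for i in range(NUM_SERVOS)]

frames = interpolate_pose(neutral, smile, steps=20)
```

Each entry of `frames` is one set of 16 angle commands. Sending the frames to the servos at a fixed rate would give a motion that starts and stops with zero velocity, which is one simple way to avoid the abrupt jumps that make robot expressions look mechanical.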

Authors: YANG Qi (杨琦); YANG Fangyan (杨芳艳); YUAN Ye (袁野); WANG Jiaqi (王佳琦) (Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai 200093, China; School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

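Affiliations: Institute of Machine Intelligence, University of Shanghai for Science and Technology; School of Mechanical Engineering, University of Shanghai for Science and Technology; School of Health Science and Engineering, University of Shanghai for Science and Technology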

Source: Intelligent Computer and Applications (《智能计算机与应用》), 2024, No. 10, pp. 41-49 (9 pages)

Keywords: cross-attention; multi-level acoustics; speech emotion recognition; deep convolutional network; interpolation algorithm

Classification number: TP241 [Automation and Computer Technology: Detection Technology and Automation Devices]