Multi-agent ad-hoc speech recognition
Abstract Speech perception is an important part of unmanned systems. Most existing work focuses on speech perception by a single agent, whose performance is capped by noise, reverberation, and similar factors. It is therefore necessary to study multi-agent speech perception, in which agents self-organize and cooperate to improve perception performance. Under the assumption that each agent outputs one channel of speech stream, this paper proposes a multi-agent ad-hoc speech system that aims to use all channels jointly to improve perception performance. Taking speech recognition as an example, it further proposes a channel selection method that can handle large-scale multi-agent speech recognition. Specifically, a stream attention mechanism for end-to-end speech recognition based on the Sparsemax operator sets the weights of noisy channels to zero, which gives stream attention the ability to select channels. Sparsemax, however, sets the weights of too many channels to zero. Scaling Sparsemax is therefore proposed, which zeroes only the weights of strongly noisy channels. A multilayer stream attention structure is also proposed, which effectively reduces computational complexity. Experiments with a conformer-based recognition system in an unmanned system environment with up to 30 agents show that, in test scenarios with mismatched channel numbers, the proposed Scaling Sparsemax lowers the word error rate (WER) relative to Softmax by over 30% on simulated data sets and by over 20% on semi-real data sets.
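The channel-selection behavior described in the abstract can be illustrated with the standard Sparsemax operator, which projects a score vector onto the probability simplex and can return exact zeros, whereas Softmax always assigns every channel a positive weight. The following is a minimal sketch only: the per-channel scores are hypothetical, the function names are illustrative, and Scaling Sparsemax is not reproduced because the abstract does not give its definition.

```python
import math

def softmax(scores):
    """Dense attention weights: every channel receives a positive weight."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(scores):
    """Standard Sparsemax: Euclidean projection onto the probability simplex.

    Channels whose score falls below the threshold tau get exactly zero weight.
    """
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(sorted(scores, reverse=True), start=1):
        cumsum += z
        # The support condition holds for a prefix of the sorted scores,
        # so the last k that satisfies it determines the threshold.
        if 1 + k * z > cumsum:
            tau = (cumsum - 1) / k
    return [max(s - tau, 0.0) for s in scores]

# Hypothetical attention scores for four channels (higher = cleaner channel).
channel_scores = [2.0, 1.5, 0.1, -1.0]
dense_weights = softmax(channel_scores)
sparse_weights = sparsemax(channel_scores)
print(dense_weights)   # all four weights are positive
print(sparse_weights)  # the two lowest-scoring channels are zeroed out
```

With these scores, Sparsemax keeps only the two highest-scoring channels. On inputs with many channels it can discard most of them, which is the over-pruning that the abstract says Scaling Sparsemax is meant to soften.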
Authors: CHEN Junqi (陈俊淇); ZHANG Xiaolei (张晓雷) (School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China)
Source: Journal of Terahertz Science and Electronic Information Technology (《太赫兹科学与电子信息学报》), 2023, No. 9, pp. 1163-1170, 1187 (9 pages)
Keywords: multi-agent speech recognition; channel selection; attention; Scaling Sparsemax