Abstract
This paper proposes a scene-adaptive acoustic encoder (SAE) design method for diverse acoustic scenes. By learning how the acoustic features of speech differ across acoustic scenes, the method adaptively designs a suitable acoustic encoder for end-to-end speech recognition. Neural architecture search (NAS) makes the encoder design more effective, which in turn improves performance on the downstream recognition task. Experiments on three widely used Chinese and English datasets, Aishell-1, HKUST and SWBD, show that the encoders produced by the proposed scene-adaptive design method achieve an average relative error rate reduction of more than 5% over the best existing human-designed encoders. The proposed method is therefore an effective way to analyze the acoustic features of speech in a specific scene and to design high-performance acoustic encoders targeted at that scene.
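The abstract does not say which neural architecture search variant is used. As a rough illustration of how NAS can select encoder components from data, the sketch below assumes a DARTS-style continuous relaxation. Each layer outputs a softmax-weighted mixture of candidate operations. After the search, only the candidate with the largest architecture weight is kept. The candidate operations (`identity`, `moving_average3`, `zero`), the architecture weights `alpha`, and all function names are hypothetical. The bi-level optimization that would actually learn `alpha` from one scene's data is omitted.

```python
import math
from typing import Callable, Dict, List

Frames = List[float]  # toy 1-D feature sequence (hypothetical stand-in for acoustic frames)

# Hypothetical search space: real acoustic-encoder candidates would be
# convolutions, self-attention blocks, etc.; these toy ops only show the mechanics.
def identity(x: Frames) -> Frames:
    return list(x)

def moving_average3(x: Frames) -> Frames:
    # 3-tap average with edge replication, standing in for a small convolution.
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]

def zero(x: Frames) -> Frames:
    # The "no connection" candidate commonly included in DARTS search spaces.
    return [0.0] * len(x)

CANDIDATES: Dict[str, Callable[[Frames], Frames]] = {
    "identity": identity,
    "avg3": moving_average3,
    "zero": zero,
}

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x: Frames, alpha: Dict[str, float]) -> Frames:
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate outputs."""
    names = list(CANDIDATES)
    weights = softmax([alpha[name] for name in names])
    out = [0.0] * len(x)
    for w, name in zip(weights, names):
        y = CANDIDATES[name](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out

def derive(alpha: Dict[str, float]) -> str:
    """Discretization after search: keep the candidate with the largest weight."""
    return max(alpha, key=alpha.get)

if __name__ == "__main__":
    # Hypothetical architecture weights, as if learned on one scene's data.
    alpha = {"identity": 0.1, "avg3": 2.0, "zero": -1.0}
    print(mixed_op([1.0, 2.0, 3.0, 4.0], alpha))
    print(derive(alpha))  # avg3
```

Under this assumption, running the search separately on data from each acoustic scene would produce scene-specific `alpha` values and therefore different discrete encoders. This is one possible reading of "scene-adaptive" design, not necessarily the paper's procedure.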
Authors
LIU Yukun
ZHENG Lin
LI Ta
ZHANG Pengyuan
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049
Source
Acta Acustica
EI
CAS
CSCD
Peking University Core Journal
2023, No. 6, pp. 1260-1268 (9 pages)
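Funding
National Key Research and Development Program of China (2020AAA0108002)
"Goal-Oriented" project independently deployed by the Institute of Acoustics, Chinese Academy of Sciences (MBDX202106)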
Keywords
Automatic speech recognition
Acoustic encoder
Self-adaptation
Neural architecture search