Abstract
Automatic speech recognition (ASR) technology has become relatively mature, and general-purpose ASR engines are widely used in industries such as transportation, healthcare, and telecommunications. However, industry-specific vocabulary is not independently and identically distributed in large-scale training corpora. As a result, general-purpose ASR engines recognize industry-specific terms with low accuracy when they transcribe speech from specialized domains. Internet audio is typically sampled at 16 kHz, whereas call-center telephone speech is narrowband and sampled at only 8 kHz, so the loss of transcription accuracy there is especially pronounced. To improve the transcription accuracy of industry vocabulary, this paper proposes a post-transcription optimization technique for ASR based on an industry-specific vocabulary. First, a convolutional neural network model and the deep neural network model BERT are each applied to the corpus text to predict word segmentation, and an industry-specific error-correction vocabulary is generated from the results. Next, in the production environment, a general-purpose ASR engine produces an initial transcript of the telephone call speech. Then, this first-pass transcript is corrected by a Soft-Masked BERT model combined with the error-correction vocabulary, which improves speech recognition accuracy. Training and testing on customer-service call recordings from the Guangzhou 12345 hotline show that the proposed technique raises the transcription accuracy of a general-purpose ASR engine on industry terms by about 10 percentage points, with fast error correction and good applicability.
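The correction stage relies on Soft-Masked BERT. As a minimal sketch (not the authors' code), the core soft-masking step of the original Soft-Masked BERT formulation mixes each token embedding e_i with the [MASK] embedding in proportion to the error probability p_i produced by the detection network: e'_i = p_i · e_mask + (1 − p_i) · e_i. Embeddings are plain Python lists here, and the function name `soft_mask` is hypothetical; how the industry-specific vocabulary constrains the corrector is not described in this abstract and is not modeled.

```python
from typing import List, Sequence

Vector = List[float]


def soft_mask(
    embeddings: Sequence[Vector],
    error_probs: Sequence[float],
    mask_embedding: Vector,
) -> List[Vector]:
    """Blend each token embedding with the [MASK] embedding.

    Implements e'_i = p_i * e_mask + (1 - p_i) * e_i, where p_i is the
    detection network's probability that token i is erroneous.
    (Hypothetical helper for illustration only.)
    """
    if len(embeddings) != len(error_probs):
        raise ValueError("one error probability per token is required")

    masked: List[Vector] = []
    for e_i, p_i in zip(embeddings, error_probs):
        if not 0.0 <= p_i <= 1.0:
            raise ValueError("error probabilities must lie in [0, 1]")
        if len(e_i) != len(mask_embedding):
            raise ValueError("token and mask embeddings must have equal dimension")
        # A likely-wrong token (p_i near 1) is pushed toward [MASK];
        # a likely-correct token (p_i near 0) keeps its own embedding.
        masked.append([p_i * m + (1.0 - p_i) * x for m, x in zip(mask_embedding, e_i)])
    return masked


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    probs = [0.0, 0.75]  # second token is probably a recognition error
    print(soft_mask(tokens, probs, mask_embedding=[0.5, 0.5]))
```

Because the mask is soft rather than a hard replacement, a detector that is only partly confident still leaves the corrector some of the original token's information.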
Authors
马晓亮
安玲玲
邓从健
杜德泉
张国新
MA Xiaoliang; AN Lingling; DENG Congjian; DU Dequan; ZHANG Guoxin (Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, Guangdong, China; Guangzhou Branch of China Telecom Co., Ltd., Guangzhou 510620, Guangdong, China; Ma Xiaoliang’s Model Worker and Innovative Craftsman Workshop, Guangzhou 510620, Guangdong, China; Guangzhou Yunqu Information Technology Co., Ltd., Guangzhou 510665, Guangdong, China; Guangdong Branch of China Telecom Co., Ltd., Guangzhou 510080, Guangdong, China)
Source
《华南理工大学学报(自然科学版)》
Journal of South China University of Technology (Natural Science Edition), 2023, No. 8, pp. 118-125 (8 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
National Key Research and Development Program of China (2022YFB3102700)
Key Program of the National Natural Science Foundation of China (62132013)
Keywords
text error correction
speech recognition
customer service calls
industry-specific vocabulary
convolutional neural network