A Phonetic-Semantic Pre-Training Model for Robust Speech Recognition


Abstract: Robustness is a long-standing challenge for automatic speech recognition (ASR), because any deployed ASR system encounters speech that is much noisier than its clean training corpora. However, it is impractical to annotate every type of noisy environment. In this work, we propose a novel phonetic-semantic pre-training (PSP) framework that improves ASR performance in practical noisy environments by seamlessly integrating pre-training, self-supervised learning, and fine-tuning. PSP has three fundamental stages. First, pre-train the phone-to-word transducer (PWT) to map a generated phone sequence to the target text using only unpaired text data. Second, continue training the PWT on more complex data generated by an empirical phone-perturbation heuristic, in addition to self-supervised signals obtained by recovering the tainted phones. Third, fine-tune the resulting PWT with real-world speech data. Experiments on two real-life datasets collected from industrial scenarios and on synthetic noisy datasets show that PSP improves the traditional ASR pipeline, with relative character error rate (CER) reductions of 28.63% and 26.38% on the two real-life datasets, respectively. PSP is also robust on highly noisy synthetic speech datasets.
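The abstract reports its results as relative CER reductions. As a minimal sketch of how such a figure is usually computed (not the authors' evaluation code), the Python below defines CER as character-level edit distance divided by reference length, and relative reduction as the drop in CER divided by the baseline CER. The function names are hypothetical and chosen only for illustration.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Fraction by which new_cer improves on baseline_cer."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - new_cer) / baseline_cer
```

Under this definition, a relative reduction of 28.63% means the new CER is about 71.37% of the baseline CER; the abstract does not state the absolute CER values.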

Authors: Xueyang Wu, Rongzhong Lian, Di Jiang, Yuanfeng Song, Weiwei Zhao, Qian Xu, Qiang Yang

Affiliations: Department of Computer Science and Engineering; WeBank Co., Ltd.

Source: CAAI Artificial Intelligence Research (CAAI人工智能研究, English-language journal), 2022, Issue 1, pp. 1-7 (7 pages)

Keywords: pre-training; automatic speech recognition; self-supervised learning

Classification code: TN9 [Electronics and Telecommunications: Information and Communication Engineering]
