Discriminative Criterion Based Bottleneck Feature and Its Application in LVCSR (cited by: 2)

Abstract: Bottleneck (BN) features taken from a middle layer of a deep neural network have been widely applied to large vocabulary continuous speech recognition (LVCSR), because they can be modeled with the conventional Gaussian mixture model-hidden Markov model (GMM-HMM) framework. To extract discriminative BN features, this paper proposes first training a GMM-HMM on conventional BN features and then jointly optimizing the parameters of the BN network and of the GMM-HMM under the minimum phone error (MPE) criterion. Unlike other discriminative training algorithms, the method treats the entire training set as one large batch, rather than using mini-batches, when accumulating the statistics used to train the deep neural network, which greatly speeds up training. Experiments show that the optimized BN feature extractor achieves a 9% relative reduction in word error rate compared with the conventional method.
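As a rough illustration of what "features from a middle layer" means, the sketch below runs frames through a small feed-forward network and returns the activations of a narrow bottleneck layer. It also shows how a relative word error rate reduction is computed. This is not the authors' implementation: the layer sizes, the random weights, the linear bottleneck output, the function names, and the WER values in the usage note are all assumptions made for illustration, and the MPE optimization itself is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_layers(sizes, seed=0):
    """Create random (weight, bias) pairs; a stand-in for a trained network."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def bottleneck_features(frames, layers, bn_index):
    """Forward frames up to layer bn_index and return its linear outputs.

    The layers after the bottleneck are only needed while training the
    network; they are skipped when extracting features.
    """
    h = frames
    for i, (weights, bias) in enumerate(layers):
        z = h @ weights + bias
        if i == bn_index:
            return z  # assumption: the bottleneck output is taken before the nonlinearity
        h = sigmoid(z)
    raise ValueError("bn_index is out of range")


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction, e.g. 0.09 means a 9% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical topology: 39-dim input, 40-dim bottleneck as the third layer.
sizes = [39, 256, 256, 40, 256, 500]
layers = init_layers(sizes)
frames = np.random.default_rng(1).standard_normal((5, 39))
feats = bottleneck_features(frames, layers, bn_index=2)
print(feats.shape)
```

With these assumed sizes, each of the 5 input frames yields a 40-dimensional BN feature vector, which would then be fed to the GMM-HMM. As a purely hypothetical reading of the reported figure, `relative_wer_reduction(20.0, 18.2)` is approximately 0.09, i.e. a 9% relative reduction.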

Authors: Liu Diyuan (刘迪源), Guo Wu (郭武)

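Affiliation: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China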

Source: Journal of Data Acquisition and Processing (《数据采集与处理》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 2, pp. 331-337 (7 pages)

Keywords: speech recognition; neural networks; discriminative training; bottleneck feature

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
