Construction of a Phoneme Modeling Unit Set for Chinese Continuous Speech Recognition Systems (cited by: 2)

Phoneme modeling units design for Mandarin LVCSR systems


Abstract: In a speech recognition system, the modeling units capture the acoustic and phonetic characteristics of a language, so they play a crucial role in system performance. Drawing on several modeling unit sets that have performed well in large vocabulary continuous speech recognition (LVCSR) systems, this paper constructs a new phoneme modeling unit set (NewPS). In addition, based on how the vowels in NewPS and their variants affect coarticulation with the preceding and following phonemes, it proposes a method for designing a question set (NewQS) from an extended vowel triangle. Experiments show that NewPS combined with NewQS outperforms the conventional initial/final modeling unit set, and that the large reduction in the number of modeling units simplifies processing in the subsequent system modules.
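The abstract does not give the extended vowel triangle itself, so the following is only a minimal sketch of the general idea of deriving a question set from vowel-triangle positions. It assumes a hypothetical seven-vowel inventory and HTK-style `QS` questions of the kind used for decision-tree state tying with HHEd. Each vowel is placed on the triangle by height, backness and rounding; vowels that share a coordinate, or a height-backness region, form one phone class, and each class yields one left-context and one right-context question. The vowel labels, feature values and class names are illustrative assumptions, not the paper's NewPS or NewQS.

```python
# HYPOTHETICAL vowel inventory: labels and (height, backness, rounded) values
# are illustrative assumptions, not the NewPS phoneme set from the paper.
VOWELS = {
    "i": ("high", "front", False),
    "v": ("high", "front", True),   # u-umlaut
    "u": ("high", "back", True),
    "E": ("mid", "front", False),   # front variant of e
    "e": ("mid", "back", False),
    "o": ("mid", "back", True),
    "a": ("low", "central", False),
}


def feature_classes(vowels):
    """Group vowels into classes by their position on the vowel triangle.

    Each vowel joins one class per height, backness and rounding value,
    plus one combined height-backness region class (the assumed
    "extension" of the basic triangle).
    """
    classes = {}
    for name, (height, backness, rounded) in vowels.items():
        keys = (
            f"Vowel_{height.capitalize()}",
            f"Vowel_{backness.capitalize()}",
            "Vowel_Rounded" if rounded else "Vowel_Unrounded",
            f"Vowel_{height.capitalize()}{backness.capitalize()}",
        )
        for key in keys:
            classes.setdefault(key, []).append(name)
    return classes


def htk_questions(classes):
    """Emit one left-context and one right-context HTK-style QS line per class.

    In triphone names of the form "l-x+r", the pattern "p-*" matches left
    context p and "*+p" matches right context p.
    """
    lines = []
    for cls in sorted(classes):
        members = sorted(classes[cls])
        left = ",".join(f"{p}-*" for p in members)
        right = ",".join(f"*+{p}" for p in members)
        lines.append(f'QS "L_{cls}" {{ {left} }}')
        lines.append(f'QS "R_{cls}" {{ {right} }}')
    return lines


if __name__ == "__main__":
    for line in htk_questions(feature_classes(VOWELS)):
        print(line)
```

Because the questions are generated from triangle coordinates rather than listed by hand, adding a vowel variant in this sketch only requires adding one row to the table.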

Authors: BAO Yebo, HU Yu, LIU Cong, JIANG Hui, DAI Lirong, LIU Qingfeng

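Affiliations: Department of Electronic Engineering and Information Science, University of Science and Technology of China; Anhui USTC iFLYTEK Co., Ltd.; Department of Computer Science and Engineering, York University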

Source: Journal of Tsinghua University (Science and Technology), 2011, No. 9, pp. 1288-1292, 1297 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

Keywords: large vocabulary continuous speech recognition (LVCSR); modeling units; vowel triangle; question set; main vowel principle

CLC number: TN912.34 [Electronics and Telecommunications / Communication and Information Systems]
