Multilingual Acoustic Modeling Method Based on Phoneme Clustering

Cited by: 1


Abstract: This paper proposes an acoustic modeling method that generates a multilingual global phoneme set by clustering, using the decrease in model self-likelihood caused by merging two phonemes as the distance measure. On this basis, it compares the effect of two linguistic constraints applied during clustering: phonemes within the same language are not merged, and phonemes belonging to different International Phonetic Alphabet (IPA) classes are not merged. It also explores how the size of the global phoneme set affects recognition performance. Chinese-English bilingual models are evaluated in a telephone speech keyword spotting system, comparing four clustering methods at different global phoneme set sizes. The results show that merging phonemes to an appropriate degree with the proposed method clearly outperforms direct mixed modeling without clustering, and that suitable constraints on the clustering procedure improve performance further.
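The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the model structure, so each phoneme is assumed here to be a single diagonal Gaussian summarized by its frame count, mean, and variance. The names `PhoneModel`, `likelihood_drop`, `allowed`, and `cluster`, as well as the toy data, are hypothetical. The distance between two units is the loss in self-likelihood when they are replaced by one merged Gaussian, and the two constraints (no merging within a language, no merging across IPA classes) are optional flags.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PhoneModel:
    """Hypothetical stand-in for a phoneme model: one diagonal Gaussian."""
    names: tuple       # phoneme labels merged into this unit
    langs: frozenset   # languages that contribute to this unit
    ipa_class: str     # coarse IPA class (assumed label, e.g. "vowel")
    count: float       # number of training frames
    mean: tuple
    var: tuple


def self_loglik(m):
    # Log-likelihood of an ML-fitted diagonal Gaussian on its own frames.
    return -0.5 * m.count * sum(math.log(2 * math.pi * v) + 1.0 for v in m.var)


def merge(a, b):
    # Pool the sufficient statistics of both units into one Gaussian.
    n = a.count + b.count
    mean = tuple((a.count * ma + b.count * mb) / n
                 for ma, mb in zip(a.mean, b.mean))
    var = tuple(
        (a.count * (va + ma * ma) + b.count * (vb + mb * mb)) / n - mu * mu
        for ma, mb, va, vb, mu in zip(a.mean, b.mean, a.var, b.var, mean)
    )
    # The merged unit keeps a's IPA class; this only matters when the
    # cross-class constraint is switched off.
    return PhoneModel(a.names + b.names, a.langs | b.langs,
                      a.ipa_class, n, mean, var)


def likelihood_drop(a, b):
    # Distance: self-likelihood lost by modeling a and b with one Gaussian.
    return self_loglik(a) + self_loglik(b) - self_loglik(merge(a, b))


def allowed(a, b, no_same_language, no_cross_ipa):
    if no_same_language and a.langs & b.langs:
        return False
    if no_cross_ipa and a.ipa_class != b.ipa_class:
        return False
    return True


def cluster(models, target_size, no_same_language=True, no_cross_ipa=True):
    """Greedy bottom-up merging until target_size units remain."""
    units = list(models)
    while len(units) > target_size:
        candidates = [
            (likelihood_drop(a, b), i, j)
            for (i, a), (j, b) in combinations(enumerate(units), 2)
            if allowed(a, b, no_same_language, no_cross_ipa)
        ]
        if not candidates:  # constraints forbid every remaining merge
            break
        _, i, j = min(candidates)
        merged = merge(units[i], units[j])
        units = [u for k, u in enumerate(units) if k not in (i, j)] + [merged]
    return units


# Toy one-dimensional data, invented purely for illustration.
demo = [
    PhoneModel(("zh_a",), frozenset({"zh"}), "vowel", 100, (1.0,), (1.0,)),
    PhoneModel(("en_a",), frozenset({"en"}), "vowel", 100, (1.1,), (1.0,)),
    PhoneModel(("zh_s",), frozenset({"zh"}), "fricative", 100, (5.0,), (1.0,)),
    PhoneModel(("en_s",), frozenset({"en"}), "fricative", 100, (5.2,), (1.0,)),
]
global_set = cluster(demo, target_size=2)
```

In this sketch `target_size` plays the role of the global phoneme set size studied in the paper. With both constraints enabled, merging can stop before the target is reached, because every remaining pair may share a language or differ in IPA class.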

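Authors: Meng Meng (孟猛), Liang Jia'en (梁家恩), Xu Bo (徐波)

Affiliations: Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list; 2009, No. 1, pp. 86-90 (5 pages)

Funding: National 863 Program of China (No. 2006AA010103)

Keywords: Multilingual Acoustic Modeling, Phoneme Clustering, Keyword Spotting

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]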


References (7)

[1] Liu Chen, Melnar L. An Automated Linguistic Knowledge-Based Cross-Language Transfer Method for Building Acoustic Models for a Language without Native Training Data // Proc of the 9th European Conference on Speech Communication and Technology. Lisbon, Portugal, 2005: 1365-1368
[2] Yu Shengmin, Zhang Shuwu, Xu Bo. Research on Chinese-English Bilingual Mixed Acoustic Modeling Methods [J]. Journal of Chinese Information Processing, 2004, 18(5): 78-84 (in Chinese)
[3] Byrne W, Beyerlein P, Huerta J M, et al. Towards Language Independent Acoustic Modeling // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Istanbul, Turkey, 2000: 1029-1032
[4] Zgank A, Imperl B, Johansen F T, et al. Crosslingual Speech Recognition with Multilingual Acoustic Models Based on Agglomerative and Tree-Based Triphone Clustering // Proc of the 7th European Conference on Speech Communication and Technology. Aalborg, Denmark, 2001: 2725-2729
[5] Zgank A, Kacic Z, Vicsi K, et al. Crosslingual Transfer of Source Acoustic Models to Two Different Target Languages // Proc of the COST278 and ISCA Tutorial and Research Workshop on Robustness Issues in Conversational Interaction. Norwich, UK, 2004: 19
[6] Sooful J J, Botha E C. An Acoustic Distance Measure for Automatic Cross-Language Phoneme Mapping // Proc of the Pattern Recognition Association of South Africa. Franschhoek, South Africa, 2001: 99-102
[7] Tsai M Y, Lee L S. Pronunciation Variation Analysis Based on Acoustic and Phonemic Distance Measures with Application Examples on Mandarin Chinese // Proc of the Workshop on Automatic Speech Recognition and Understanding. Virgin Islands, USA, 2003: 117-122
