Abstract
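First, an acoustic modeling method is proposed that generates a multilingual universal phoneme set by clustering, using the decrease in the models' own likelihood after two phonemes are merged as the distance measure. On this basis, the effect on performance of adding two constraints during clustering is compared: phonemes within the same language are not clustered together, and phonemes from different IPA classes are not clustered together. The influence of the size of the universal phoneme set on recognition performance is also explored. Finally, experiments report results for a Chinese-English bilingual mixed model in a keyword spotting system, comparing four clustering methods at different universal phoneme set sizes. The results show that a moderate degree of phoneme merging with the proposed method gives a clear improvement over direct mixed modeling without clustering, and that suitably adding constraints to the phoneme clustering helps improve performance further.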
A clustering method is proposed to generate multilingual global phonemes based on the decrease in model self-likelihood. Two linguistic constraints are applied in the clustering procedure: phonemes in the same language are not merged, and phonemes belonging to different International Phonetic Alphabet (IPA) classes are not merged. In a telephone speech keyword spotting system, the performance of several Chinese-English bilingual models, generated by different phoneme clustering methods, is compared. The experimental results show that a merged phoneme set of appropriate size yields good-quality acoustic models that clearly outperform models built without merging. Moreover, the linguistic constraints added to the clustering procedure further improve performance.
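The clustering idea in the abstract can be sketched as greedy agglomerative merging, where the distance between two phoneme models is the log-likelihood lost when they are replaced by one merged model. The sketch below is illustrative only and is not the paper's implementation: it assumes each phoneme is summarized by a single diagonal Gaussian (the abstract does not state the model form), and the names `Unit`, `likelihood_loss`, `cluster`, and the toy statistics are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Unit:
    """A phoneme cluster summarized by one diagonal Gaussian (an assumption)."""
    phones: frozenset   # set of (language, symbol) pairs in this cluster
    ipa_class: str      # coarse IPA class label, e.g. "vowel"
    n: float            # number of training frames
    mean: tuple         # per-dimension mean
    var: tuple          # per-dimension variance


def self_loglik(u):
    # Log-likelihood of a unit's own data under its maximum-likelihood Gaussian.
    return -0.5 * u.n * sum(math.log(2 * math.pi * v) + 1.0 for v in u.var)


def merge(a, b):
    # Pool the sufficient statistics of two units into one Gaussian.
    n = a.n + b.n
    mean = tuple((a.n * ma + b.n * mb) / n for ma, mb in zip(a.mean, b.mean))
    var = tuple(
        (a.n * (va + ma * ma) + b.n * (vb + mb * mb)) / n - m * m
        for ma, va, mb, vb, m in zip(a.mean, a.var, b.mean, b.var, mean)
    )
    # If cross-class merging is allowed, the label of `a` is kept arbitrarily.
    return Unit(a.phones | b.phones, a.ipa_class, n, mean, var)


def likelihood_loss(a, b):
    # Distance: decrease in self-likelihood caused by merging a and b.
    return self_loglik(a) + self_loglik(b) - self_loglik(merge(a, b))


def allowed(a, b, no_same_language, no_cross_ipa):
    langs_a = {lang for lang, _ in a.phones}
    langs_b = {lang for lang, _ in b.phones}
    if no_same_language and langs_a & langs_b:
        return False    # constraint 1: do not merge phonemes of one language
    if no_cross_ipa and a.ipa_class != b.ipa_class:
        return False    # constraint 2: do not merge across IPA classes
    return True


def cluster(units, target_size, no_same_language=True, no_cross_ipa=True):
    # Repeatedly merge the allowed pair with the smallest likelihood loss.
    units = list(units)
    while len(units) > target_size:
        pairs = [
            (likelihood_loss(a, b), i, j)
            for (i, a), (j, b) in combinations(enumerate(units), 2)
            if allowed(a, b, no_same_language, no_cross_ipa)
        ]
        if not pairs:
            break       # the constraints forbid any further merge
        _, i, j = min(pairs)
        merged = merge(units[i], units[j])
        units = [u for k, u in enumerate(units) if k not in (i, j)] + [merged]
    return units


# Toy one-dimensional statistics, invented purely for illustration.
toy_units = [
    Unit(frozenset({("zh", "a")}), "vowel", 100.0, (0.0,), (1.0,)),
    Unit(frozenset({("en", "a")}), "vowel", 100.0, (0.1,), (1.0,)),
    Unit(frozenset({("zh", "s")}), "fricative", 100.0, (5.0,), (1.0,)),
    Unit(frozenset({("en", "s")}), "fricative", 100.0, (5.2,), (1.0,)),
]

for unit in cluster(toy_units, target_size=2):
    print(sorted(unit.phones), unit.ipa_class)
```

Turning the two keyword arguments on and off gives four clustering variants, which is one plausible reading of the four methods compared in the abstract; `target_size` plays the role of the universal phoneme set size.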
Source
《模式识别与人工智能》
EI
CSCD
PKU Core (Peking University Core Journals)
2009, No. 1, pp. 86-90 (5 pages)
Pattern Recognition and Artificial Intelligence
Funding
Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2006AA010103)
Keywords
Multilingual Acoustic Modeling, Phoneme Clustering, Keyword Spotting