Abstract
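This paper studies Mandarin phoneme mispronunciation detection that combines Gaussian mixture models (GMMs) with voice onset time (VOT) features, and proposes a detection method that uses both vocal-tract feature information and voice-source feature information. The GMMs model and evaluate the MFCC parameters, which reflect vocal-tract characteristics, and directly detect pronunciation errors for most phonemes. For the few consonant phonemes that are hard to distinguish using MFCC parameters and GMMs, voice-source feature parameters that reflect VOT information are used to separate them. Experiments show that the method performs well when training data are limited, which makes it well suited to computer-aided training for speech rehabilitation of deaf people.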
A Mandarin phonetic mispronunciation detection approach is proposed that combines Gaussian mixture models (GMMs) with voice onset time (VOT) features, drawing on both spectral features and prosodic features. GMMs are used to model the Mel-frequency cepstral coefficients (MFCCs) of all phonemes and to evaluate the pronunciation of most phonemes directly. For the consonants that are difficult to distinguish in this way, prosodic features reflecting the VOT are extracted and used for classification. Mandarin phonetic mispronunciation detection experiments on limited data indicate a significant improvement. The framework is therefore well suited to independent Putonghua training for hearing-impaired adults.
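The two-stage decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the diagonal-covariance GMM form, the score margin, and the VOT acceptance ranges are all assumptions made for illustration, and the abstract does not say which consonants are handled by VOT or what thresholds are used.

```python
import numpy as np


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of MFCC frames under a
    diagonal-covariance GMM (assumed model form).

    frames: (T, D), weights: (K,), means: (K, D), variances: (K, D)
    """
    frames = np.atleast_2d(frames)
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                           # (T, K)
    log_weighted = log_comp + np.log(weights)
    # Log-sum-exp over mixture components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(log_frame.mean())


def detect_mispronunciation(frames, target, models,
                            vot_ms=None, vot_ranges=None, margin=0.0):
    """Return True if the segment is judged a mispronunciation of `target`.

    models: dict phoneme -> (weights, means, variances)
    vot_ranges: hypothetical dict phoneme -> (low_ms, high_ms) for the
        consonants that the GMM stage cannot separate reliably.
    """
    # Stage 2 (assumed rule): for a hard-to-distinguish consonant,
    # accept only if the measured VOT falls inside its range.
    if vot_ranges is not None and vot_ms is not None and target in vot_ranges:
        low, high = vot_ranges[target]
        return not (low <= vot_ms <= high)

    # Stage 1: compare the target phoneme's GMM score with the best score.
    scores = {p: gmm_avg_loglik(frames, *params) for p, params in models.items()}
    best = max(scores, key=scores.get)
    return scores[target] + margin < scores[best]
```

In this sketch, a phoneme listed in `vot_ranges` bypasses the GMM comparison entirely; a real system might instead combine both scores, which the abstract does not specify.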
Source
《科学技术与工程》
Peking University Core Journal (北大核心)
2013, Issue 7, pp. 1789-1793 (5 pages)
Science Technology and Engineering
Funding
Supported by the National Natural Science Foundation of China (61005020, 60901061)
Keywords
speech recognition
mispronunciation detection
Gaussian mixture model
voice onset time