Abstract
Although the hidden Markov model (HMM) is currently the most popular model for speech recognition, it has an intrinsic defect: because it commonly assumes that the output observations of a state are independent and identically distributed (IID), it cannot describe the temporal correlation present in speech. The new model proposed in this paper introduces inter-frame correlation information into the Duration-Distribution-Based HMM (DDBHMM). It builds separate parametric models of the static and dynamic characteristics of the output observation vector sequences of speech states, then combines them into an integrated model. This HMM with inter-frame correlation information can describe real speech phenomena more precisely. After presenting the framework of the new model, we derive the parameter estimation formulas and give the training and recognition algorithms. Experiments on speaker-independent recognition of isolated Chinese syllables (covering all Chinese syllables) show that including inter-frame correlation information noticeably improves the recognition performance of the HMM.
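The abstract does not give the paper's parameterization or estimation formulas, so the following is only a minimal sketch, under stated assumptions, of why a dynamic term lets a segment score reflect inter-frame correlation while an IID static term cannot. The names `segment_score`, `delta_mean`, `delta_var`, `duration_logprob` and `dyn_weight` are hypothetical. Modeling the dynamics with a diagonal Gaussian on first-order frame differences, and combining the duration, static and dynamic log-scores by a weighted sum, are assumptions made for illustration, not necessarily the authors' method.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def diag_gauss_logpdf(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def segment_score(
    frames: Sequence[Vector],
    static_mean: Vector,
    static_var: Vector,
    delta_mean: Vector,
    delta_var: Vector,
    duration_logprob: Dict[int, float],
    dyn_weight: float = 1.0,
) -> float:
    """Hypothetical log-score of one state occupying `frames` as a segment."""
    # Explicit segment-length term, in the spirit of a duration-distribution-based HMM.
    if len(frames) not in duration_logprob:
        return -math.inf
    score = duration_logprob[len(frames)]
    # Static part: each frame scored on its own, i.e. the usual IID assumption.
    score += sum(diag_gauss_logpdf(f, static_mean, static_var) for f in frames)
    # Dynamic part (assumption): a Gaussian on first-order frame differences,
    # which makes the score depend on the order of the frames.
    for prev, cur in zip(frames, frames[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        score += dyn_weight * diag_gauss_logpdf(delta, delta_mean, delta_var)
    return score


# Same frames, different order: only the dynamic term can tell them apart.
smooth = [[0.0], [1.0], [2.0]]
shuffled = [[2.0], [0.0], [1.0]]
params = dict(static_mean=[1.0], static_var=[1.0],
              delta_mean=[1.0], delta_var=[0.25],
              duration_logprob={3: math.log(0.5)})

iid_only = [segment_score(s, dyn_weight=0.0, **params) for s in (smooth, shuffled)]
with_dyn = [segment_score(s, dyn_weight=1.0, **params) for s in (smooth, shuffled)]
print(math.isclose(iid_only[0], iid_only[1]))  # True: IID score ignores order
print(with_dyn[0] > with_dyn[1])               # True: smooth trajectory preferred
```

Setting `dyn_weight` to 0 reduces the sketch to a duration-based score under the IID assumption, which gives the same value for any reordering of the frames. How the paper actually parameterizes and couples the static and dynamic parts is not stated in this abstract.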
Source
Acta Electronica Sinica (《电子学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
1998, No. 10, pp. 50-54, 8 (6 pages in total)
Funding
National 863 High-Tech Program
Project 211 support program
Keywords
Speech recognition; hidden Markov model (HMM); inter-frame correlation