针对目前连续语音识别中广泛使用的齐次HMM(hidden Markov model)模型识别精度低的现状,该文提出了三音子DDBHMM(duration distribution based HMM)识别方法。根据汉语的特点,设计了适用于连续语音识别的三音子。描述了识别中使用的MLSS...针对目前连续语音识别中广泛使用的齐次HMM(hidden Markov model)模型识别精度低的现状,该文提出了三音子DDBHMM(duration distribution based HMM)识别方法。根据汉语的特点,设计了适用于连续语音识别的三音子。描述了识别中使用的MLSS(most likely statesequence)准则。设计了识别网络并阐明了用于三音子识别的帧同步识别算法。将三音子DDBHMM识别方法与三音子齐次HMM识别方法和双音子DDBHMM识别方法进行了实验对比,结果表明:采用三音子DDBHMM可以使得识别错误率分别下降0.95%和2.29%。说明该方法能够显著地改进连续语音识别性能。展开更多
This work describes an improved feature extractor algorithm to extract the peripheral features of point x(ti,fj) using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algo- rithm ob...This work describes an improved feature extractor algorithm to extract the peripheral features of point x(ti,fj) using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algo- rithm observes n×n neighborhoods of the point in all directions, and then incorporates the peripheral fea- tures using the Mel frequency cepstrum components (MFCCs)-based feature extractor of the Tsinghua elec- tronic engineering speech processing (THEESP) for Mandarin automatic speech recognition (MASR) sys- tem as replacements of the dynamic features with different feature combinations. In this algorithm, the or- thogonal bases are extracted directly from the speech data using discrite cosime transformation (DCT) with 3×3 blocks on an NL-TS pattern as the peripheral features. The new primal bases are then selected and simplified in the form of the ?dp- operator in the time direction and the ?dp- operator in the frequency di- t f rection. The algorithm has 23.29% improvements of the relative error rate in comparison with the standard MFCC feature-set and the dynamic features in tests using THEESP with the duration distribution-based hid- den Markov model (DDBHMM) based on MASR system.展开更多
To overcome the defects of the duration modeling in the homogeneous Hidden Markov Model(HMM)for speech recognition,a duration-distribution-based HMM(DDBHMM)is proposed in this paper based on a formalized definition of...To overcome the defects of the duration modeling in the homogeneous Hidden Markov Model(HMM)for speech recognition,a duration-distribution-based HMM(DDBHMM)is proposed in this paper based on a formalized definition of a left-to-right inhomogeneous Markov model.It has been demonstrated that it can be identically defined by either the state duration or the state transition probability.The speaker-independent continuous speech recognition experiments show that by only modeling the state duration in DDBHMM,a significant improvement(17.8%error rate reduction)can be achieved compared with the classical HMM.The ideal properties of DDBHMM give promise to many aspects of speech modeling,such as the modeling of the state duration,speed variation,speech discontinuity,and interframe correlation.展开更多
文摘针对目前连续语音识别中广泛使用的齐次HMM(hidden Markov model)模型识别精度低的现状,该文提出了三音子DDBHMM(duration distribution based HMM)识别方法。根据汉语的特点,设计了适用于连续语音识别的三音子。描述了识别中使用的MLSS(most likely statesequence)准则。设计了识别网络并阐明了用于三音子识别的帧同步识别算法。将三音子DDBHMM识别方法与三音子齐次HMM识别方法和双音子DDBHMM识别方法进行了实验对比,结果表明:采用三音子DDBHMM可以使得识别错误率分别下降0.95%和2.29%。说明该方法能够显著地改进连续语音识别性能。
基金Supported by the National High-Tech Research and Development (863) Program of China (No. 200/AA/14)
文摘This work describes an improved feature extractor algorithm to extract the peripheral features of point x(ti,fj) using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algo- rithm observes n×n neighborhoods of the point in all directions, and then incorporates the peripheral fea- tures using the Mel frequency cepstrum components (MFCCs)-based feature extractor of the Tsinghua elec- tronic engineering speech processing (THEESP) for Mandarin automatic speech recognition (MASR) sys- tem as replacements of the dynamic features with different feature combinations. In this algorithm, the or- thogonal bases are extracted directly from the speech data using discrite cosime transformation (DCT) with 3×3 blocks on an NL-TS pattern as the peripheral features. The new primal bases are then selected and simplified in the form of the ?dp- operator in the time direction and the ?dp- operator in the frequency di- t f rection. The algorithm has 23.29% improvements of the relative error rate in comparison with the standard MFCC feature-set and the dynamic features in tests using THEESP with the duration distribution-based hid- den Markov model (DDBHMM) based on MASR system.
文摘To overcome the defects of the duration modeling in the homogeneous Hidden Markov Model(HMM)for speech recognition,a duration-distribution-based HMM(DDBHMM)is proposed in this paper based on a formalized definition of a left-to-right inhomogeneous Markov model.It has been demonstrated that it can be identically defined by either the state duration or the state transition probability.The speaker-independent continuous speech recognition experiments show that by only modeling the state duration in DDBHMM,a significant improvement(17.8%error rate reduction)can be achieved compared with the classical HMM.The ideal properties of DDBHMM give promise to many aspects of speech modeling,such as the modeling of the state duration,speed variation,speech discontinuity,and interframe correlation.