Abstract
Based on the characteristics of Chinese speech, this paper proposes a speech recognition algorithm that requires no endpoint detection. During recognition, the algorithm does not need to locate the start and end points of the speech signal. Instead, starting from the silence segment, it extracts features directly frame by frame (frame length 20 ms, 50% overlap between adjacent frames). Each feature vector consists of 15th-order cepstral coefficients and the average frame energy. A silence-segment self-loop is introduced into the unified model of dynamic time warping (DTW) and hidden Markov models (HMM), referred to as DHUM, and the algorithm is implemented with DHUM. Recognition experiments on 99 similar Chinese monosyllabic characters show that the recognizer without endpoint detection achieves a correct (first-candidate) recognition rate of 94.95%. The recognition rate drops very little, while omitting endpoint detection reduces the complexity of the algorithm. If an auditory-model feature is used for the feature vectors, the recognizer becomes more robust and the recognition rate improves slightly.
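The front end described in the abstract (20 ms frames, 50% overlap, 15 cepstral coefficients plus average frame energy) can be illustrated with a minimal sketch. Several details here are assumptions, not taken from the paper: the 8 kHz sampling rate in the usage note, the use of an FFT-style real cepstrum (the paper does not say how its cepstral coefficients are computed, and an LPC-derived cepstrum is equally plausible), the absence of windowing and pre-emphasis, and all function names. A naive DFT is used only to keep the sketch free of external libraries.

```python
import cmath
import math


def split_frames(samples, sample_rate, frame_ms=20.0, overlap=0.5):
    """Split a signal into fixed-length frames with fractional overlap."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def average_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def real_cepstrum(frame, order=15):
    """Real cepstrum coefficients 1..order (assumed cepstrum type)."""
    n = len(frame)
    # Naive O(n^2) DFT; adequate for a 20 ms frame in an illustration.
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Small floor avoids log(0) on empty spectral bins.
    log_mag = [math.log(abs(s) + 1e-10) for s in spectrum]
    # The log-magnitude spectrum of a real signal is even, so its inverse
    # DFT reduces to a cosine sum and is purely real.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(1, order + 1)
    ]


def extract_features(samples, sample_rate, order=15):
    """One vector per frame: `order` cepstral coefficients + average energy."""
    return [
        real_cepstrum(frame, order) + [average_energy(frame)]
        for frame in split_frames(samples, sample_rate)
    ]
```

Under the assumed 8 kHz rate, a frame holds 160 samples and consecutive frames start 80 samples apart. Because extraction starts at the beginning of the recording, leading and trailing silence frames simply appear as low-energy vectors for the recognizer to absorb, rather than being trimmed in advance.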
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
1998, No. 3, pp. 220-223 (4 pages)
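Funding
Natural Science Foundation of Jiangsu Province
Pre-research Fund of the Commission of Science, Technology and Industry for National Defense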
Keywords
speech recognition
detection
endpoint detection
Chinese speech
hidden Markov model
dynamic time warping