Abstract
This paper studies a speech endpoint detection method based on the short-time Energy-Frequency-Value (EFV). The conventional approach compares the short-time energy and the short-time average zero-crossing rate against separate thresholds and then combines the two decisions with 'and' and 'or' operations; the EFV instead merges the two quantities into a single parameter. To improve the method's adaptability to noise, the concept of a relative threshold is introduced: it is the ratio between speech samples at two different moments, so its meaning is relative rather than absolute. To evaluate the method, a speech recognition system based on the discrete hidden Markov model (DHMM) was built, with LPC-derived cepstrum (LPC-CEP) coefficients computed by the Burg method as the main feature parameters and the EFV used for endpoint detection. In experiments the average recognition rate reached 91.4%, which demonstrates the good performance of this time-domain parameter.
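The abstract does not give the EFV formula or the exact form of the relative threshold, so the following is only a minimal sketch under stated assumptions: the EFV is taken to be the product of short-time energy and zero-crossing rate, and the relative threshold is taken to be the ratio of a frame's EFV to a baseline averaged over the first few frames, which are presumed to contain only background noise. All function names, the frame length, the hop size, and the ratio value are hypothetical and are not taken from the paper.

```python
import math


def split_frames(samples, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_frequency_value(frame):
    # ASSUMPTION: the abstract only says energy and zero-crossing rate are
    # combined into one parameter; a plain product is used here for illustration.
    return short_time_energy(frame) * zero_crossing_rate(frame)


def detect_endpoints(samples, frame_len=80, hop=40, noise_frames=3, ratio=10.0):
    """Return (first, last) active frame indices, or None if no frame is active.

    ASSUMPTION: the relative threshold is modelled as EFV(frame) / EFV(baseline),
    where the baseline is the mean EFV of the first noise_frames frames.
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    frames = split_frames(samples, frame_len, hop)
    if len(frames) < noise_frames:
        return None
    efv = [energy_frequency_value(f) for f in frames]
    # Guard against a zero baseline when the leading frames are pure silence.
    baseline = max(sum(efv[:noise_frames]) / noise_frames, 1e-12)
    active = [i for i, value in enumerate(efv) if value / baseline > ratio]
    return (active[0], active[-1]) if active else None


def synthetic_utterance(lead=400, tone=400, tail=400, period=16):
    """Toy signal: faint alternating noise, a unit-amplitude tone, noise again."""
    signal = [0.001 * (-1) ** n for n in range(lead)]
    signal += [math.sin(2 * math.pi * n / period) for n in range(tone)]
    signal += [0.001 * (-1) ** n for n in range(tail)]
    return signal


if __name__ == "__main__":
    print(detect_endpoints(synthetic_utterance()))
```

Because the decision depends on a ratio rather than on an absolute level, scaling the whole recording by a constant gain leaves the detected frames unchanged, which is one plausible reading of why a relative threshold adapts better to different noise levels. Frame indices can be converted to sample positions by multiplying by the hop size.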
Source
Journal of Test and Measurement Technology (《测试技术学报》)
1999, No. 1, pp. 21-27 (7 pages)
Keywords
endpoint
speech fundamental frequency recognition
short-time energy-frequency value
detection
linear