Abstract
To handle the non-stationarity of speech signals, the smoothed pseudo Wigner-Ville distribution (SPWD) is used to represent the speech signal of syllable finals (the vowel part) clearly on the time-frequency plane. The time-frequency ridges of different tones vary in different ways. Thresholding and thinning convert the SPWD time-frequency matrix into a binary image, from which ridge lines are extracted with the Hough transform. Because the time-frequency ridge of the third tone is a curve, the line segments obtained by the Hough transform are fitted with a least-squares polynomial. Several points are then selected at equal intervals along the ridge; the point set and its first-order difference serve as the time-frequency ridge features, which are classified with a Gaussian mixture model (GMM). Simulation results show that the method recognizes tones well, with an average recognition rate of 86.48%. The second tone shows the largest improvement in recognition rate, 5.18%, and across different signal-to-noise ratios (SNRs) the recognition rate improves by up to 5.62%.
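The feature-extraction step described in the abstract (least-squares polynomial fit of the ridge, equally spaced sample points, first-order difference) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `polyfit_least_squares` and `ridge_features`, the polynomial degree of 2, the choice of 8 sample points, and the interpretation of "equal intervals" as equal spacing along the time axis are all assumptions made for illustration. The ridge points are assumed to have already been extracted (for example, from the Hough-transform line segments).

```python
def polyfit_least_squares(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def ridge_features(xs, ys, degree=2, num_points=8):
    """Build a feature vector from ridge points (time xs, frequency ys).

    Assumption: points are sampled at equal spacing along the time axis.
    Output: num_points fitted frequencies followed by their
    num_points - 1 first-order differences.
    """
    coeffs = polyfit_least_squares(xs, ys, degree)
    t0, t1 = min(xs), max(xs)
    step = (t1 - t0) / (num_points - 1)
    samples = []
    for k in range(num_points):
        t = t0 + k * step
        samples.append(sum(c * t ** i for i, c in enumerate(coeffs)))
    deltas = [samples[k + 1] - samples[k] for k in range(num_points - 1)]
    return samples + deltas


# Hypothetical falling-rising ridge (time in seconds, frequency in Hz),
# loosely resembling the curved contour of the third tone.
times = [i / 10 for i in range(11)]
freqs = [200 - 100 * t + 150 * t * t for t in times]
features = ridge_features(times, freqs)
```

A vector like `features` would be computed per syllable and passed to one GMM per tone, with the tone whose model gives the highest likelihood taken as the recognition result. Times are kept in seconds here so that the normal equations stay well conditioned.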
Source
《计算机应用与软件》
CSCD
PKU Core Journal (北大核心)
2014, No. 3, pp. 142-145 (4 pages)
Computer Applications and Software
Funding
National Natural Science Foundation of China (61075008)