Abstract
Pitch estimation and voicing classification help locate target speech quickly. They are an important and challenging research direction in speech retrieval and matter for speech recognition. This paper proposes a new method for pitch estimation and voicing classification. The spectrum is reconstructed from Mel-frequency cepstral coefficients (MFCCs), and the reconstructed spectrum is then compressed and filtered in the logarithmic domain. Pitch is estimated by modeling the joint density of the pitch frequency and the filtered spectral features with a Gaussian mixture model (GMM). On the TIMIT database the relative error is 6.62%. The same GMM-based model also performs voicing classification, with an accuracy above 99% in experiments. The method therefore provides a new model for pitch estimation and voicing classification.
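The first step in the abstract, reconstructing a spectrum from MFCCs, relies on the fact that MFCCs are a (truncated) discrete cosine transform of log mel filterbank energies. A minimal sketch, assuming an orthonormal DCT-II, zero-padding of the truncated cepstrum, and HTK-style mel filter centres, is shown below. The filter count (23), number of coefficients (13) and FFT size (512) are hypothetical; 16 kHz matches TIMIT's sampling rate. The log-domain compression and filtering steps are not specified in the abstract and are not implemented here. Real MFCC front ends may use a different DCT scaling or liftering, which an inversion has to match.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: C @ v is the DCT-II of v, and C.T inverts it."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mfcc_to_log_mel(mfcc: np.ndarray, n_filters: int) -> np.ndarray:
    """Invert the DCT: zero-pad the truncated cepstrum, then apply C.T.

    Dropping high-order coefficients means the result is a smoothed
    log mel envelope rather than the exact original energies.
    """
    padded = np.zeros(n_filters)
    padded[: len(mfcc)] = mfcc
    return dct_matrix(n_filters).T @ padded

def log_mel_to_linear(log_mel: np.ndarray, sample_rate: float, n_fft: int) -> np.ndarray:
    """Linearly interpolate log mel energies (at filter centres) onto FFT bins.

    Bins below the first or above the last centre take the edge value.
    """
    n_filters = len(log_mel)
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    centres_hz = mel_to_hz(mel_points[1:-1])  # centres of the n_filters triangles
    bins_hz = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    return np.interp(bins_hz, centres_hz, log_mel)

# Usage with stand-in data (hypothetical sizes).
rng = np.random.default_rng(0)
log_mel = rng.normal(size=23)                     # stand-in log filterbank energies
mfcc = (dct_matrix(23) @ log_mel)[:13]            # keep 13 coefficients
smooth = mfcc_to_log_mel(mfcc, 23)                # smoothed log mel envelope
spectrum = log_mel_to_linear(smooth, 16000, 512)  # 257 linear-frequency bins
```

Keeping every coefficient makes the round trip exact. Truncating to 13 discards the fine harmonic structure, which is why pitch has to be inferred statistically rather than read directly off the reconstructed spectrum.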
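The abstract says pitch is obtained from a GMM of the joint density of pitch and spectral features, but does not say how the estimate is read off the model. A common choice for joint-density GMMs is the minimum mean-square-error (MMSE) estimate, the conditional mean $\hat f_0 = \sum_k P(k \mid x)\,(\mu_{f,k} + \Sigma_{fx,k}\Sigma_{xx,k}^{-1}(x - \mu_{x,k}))$. The sketch below assumes that choice and assumes the GMM parameters have already been trained (e.g. by EM). The helper names (`estimate_pitch`, `is_voiced`, `relative_error`) are hypothetical. It also assumes two details the abstract does not state: voicing is decided by comparing likelihoods under separate voiced and unvoiced GMMs, and the relative error is the mean of $|\hat f_0 - f_0| / f_0$ over voiced frames.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of the multivariate normal N(x; mean, cov)."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def gmm_log_likelihood(x, weights, means, covs):
    """log p(x) under a GMM, using log-sum-exp over components."""
    terms = np.array([np.log(w) + log_gauss(x, m, c)
                      for w, m, c in zip(weights, means, covs)])
    top = terms.max()
    return top + np.log(np.exp(terms - top).sum())

def estimate_pitch(x, weights, means, covs):
    """MMSE estimate E[f0 | x] from a joint GMM over z = [x, f0] (assumed setup).

    Each mean has length D+1 and each covariance is (D+1)x(D+1); the first
    D dimensions are spectral features x and the last one is f0.
    """
    D = x.shape[0]
    log_post, cond_means = [], []
    for w, m, c in zip(weights, means, covs):
        mx, mf = m[:D], m[D]
        cxx, cfx = c[:D, :D], c[D, :D]
        log_post.append(np.log(w) + log_gauss(x, mx, cxx))  # unnormalised log P(k|x)
        # Conditional mean of f0 given x within component k.
        cond_means.append(mf + cfx @ np.linalg.solve(cxx, x - mx))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(post @ np.array(cond_means))

def is_voiced(x, voiced_gmm, unvoiced_gmm):
    """Hypothetical voicing rule: pick the class GMM with the higher likelihood."""
    return bool(gmm_log_likelihood(x, *voiced_gmm) > gmm_log_likelihood(x, *unvoiced_gmm))

def relative_error(f0_est, f0_ref):
    """Assumed metric: mean of |est - ref| / ref over voiced frames."""
    f0_est, f0_ref = np.asarray(f0_est, float), np.asarray(f0_ref, float)
    return float(np.mean(np.abs(f0_est - f0_ref) / f0_ref))

# Toy one-component joint GMM over [x1, x2, f0] (hypothetical numbers).
weights = [1.0]
means = [np.array([0.0, 0.0, 100.0])]
covs = [np.array([[1.0, 0.0, 10.0],
                  [0.0, 1.0, 0.0],
                  [10.0, 0.0, 400.0]])]
print(estimate_pitch(np.array([1.0, 0.0]), weights, means, covs))  # 110.0

# Toy voiced/unvoiced class models, each a (weights, means, covs) tuple.
voiced_gmm = ([1.0], [np.array([1.0, 1.0])], [np.eye(2)])
unvoiced_gmm = ([1.0], [np.array([-1.0, -1.0])], [np.eye(2)])
print(is_voiced(np.array([0.8, 1.2]), voiced_gmm, unvoiced_gmm))  # True
```

With a single component the estimate reduces to linear regression of f0 on x. With several components, the posterior weights P(k|x) let different regions of feature space use different regressions, which is what makes the joint-density GMM more flexible than one global linear map.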
Authors
ZHANG Shao-hua (张少华)
QIN Hui-bin (秦会斌)
Institute of New Electron Device & Application, Hangzhou Dianzi University, Hangzhou 310018, China
Source
Measurement & Control Technology (《测控技术》)
2019, No. 11, pp. 86-89, 131 (5 pages)
Keywords
speech recognition
pitch estimation
Mel-frequency cepstral coefficient
Gaussian mixture model