Abstract
A single feature type, Mel-frequency cepstral coefficients (MFCCs) together with their dynamic (delta) parameters, is used to distinguish five audio classes: pure speech, speech with background sound (impure speech), instrument sounds, songs, and environmental sounds. Based on the characteristics of these features and the differences between the audio classes, a hierarchical audio classification method built from multiple levels of binary classifiers is proposed. It combines discriminative model training with feature selection: at each level, GMMs (Gaussian mixture models) are trained while the feature subset that maximizes the separability of the current models is selected. On an audio dataset of about 2 hours, the method reduces the average error rate by about 23.5% compared with the original 90-dimensional system without feature selection, and it substantially reduces the feature dimensionality of the binary classifier at every level.
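The abstract does not specify the separability criterion, the number of mixture components, or the search strategy used for feature selection. The following is therefore only a minimal NumPy sketch of one binary level under stated assumptions. It uses diagonal-covariance GMMs trained with plain EM, takes the validation error of the two-model likelihood comparison as the separability measure, and runs greedy forward selection over feature indices. All function names (`fit_diag_gmm`, `pair_error`, `select_features`) are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def _logsumexp_rows(a):
    # Numerically stable log(sum(exp(a))) along axis 1.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def _component_log_probs(X, weights, means, variances):
    # (n, k) matrix of log w_j + log N(x | mu_j, diag(var_j)).
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.log(2 * np.pi * variances)[None, :, :]
    ll = -0.5 * (log_norm + diff**2 / variances[None, :, :]).sum(axis=2)
    return np.log(weights)[None, :] + ll


def fit_diag_gmm(X, k=2, n_iter=50, seed=0, eps=1e-6):
    """Fit a diagonal-covariance GMM to the rows of X with plain EM (assumed setup)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = _component_log_probs(X, weights, means, variances)
        resp = np.exp(log_p - _logsumexp_rows(log_p)[:, None])
        # M-step: re-estimate weights, means and diagonal variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, eps)
    return weights, means, variances


def gmm_loglik(X, gmm):
    # Per-frame log-likelihood under a fitted GMM.
    return _logsumexp_rows(_component_log_probs(X, *gmm))


def pair_error(train_a, train_b, val_a, val_b, feats, k=2):
    """Validation error of a two-class GMM discriminator restricted to feats.

    Assumption: separability is measured as this error rate; the paper's
    actual criterion is not given in the abstract.
    """
    gmm_a = fit_diag_gmm(train_a[:, feats], k)
    gmm_b = fit_diag_gmm(train_b[:, feats], k)
    # A frame is labelled class A when model A gives it the higher likelihood.
    wrong_a = gmm_loglik(val_a[:, feats], gmm_a) <= gmm_loglik(val_a[:, feats], gmm_b)
    wrong_b = gmm_loglik(val_b[:, feats], gmm_b) < gmm_loglik(val_b[:, feats], gmm_a)
    return (wrong_a.sum() + wrong_b.sum()) / (len(val_a) + len(val_b))


def select_features(train_a, train_b, val_a, val_b, max_features, k=2):
    """Greedy forward selection: repeatedly add the feature that lowers error most."""
    selected, best_err = [], 1.0
    remaining = list(range(train_a.shape[1]))
    while remaining and len(selected) < max_features:
        errs = {f: pair_error(train_a, train_b, val_a, val_b, selected + [f], k)
                for f in remaining}
        f_best = min(errs, key=errs.get)
        if errs[f_best] >= best_err:
            break  # stop once no remaining feature improves separability
        selected.append(f_best)
        remaining.remove(f_best)
        best_err = errs[f_best]
    return selected, best_err


if __name__ == "__main__":
    # Synthetic stand-in for feature frames: only column 2 separates the classes.
    rng = np.random.default_rng(1)

    def make(n, shift):
        X = rng.normal(size=(n, 6))
        X[:, 2] += shift
        return X

    tr_a, tr_b, va_a, va_b = make(300, 0.0), make(300, 3.0), make(300, 0.0), make(300, 3.0)
    print(select_features(tr_a, tr_b, va_a, va_b, max_features=3))
```

In a full hierarchy, each level would call `select_features` on its own pair of class groups, so different levels can end up with different and smaller feature subsets, which mirrors the per-level dimensionality reduction the abstract reports. Greedy forward search is chosen here only because it is simple; it is not guaranteed to find the subset with maximal separability.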