Abstract
To better characterize non-stationary audio signals, a joint time-frequency audio feature extraction method based on a Gabor dictionary and the weight tensor of a sparse representation is proposed. Conventional sparse representation uses a predefined dictionary to encode an audio signal as a sparse weight vector; here the signal is encoded with a Gabor dictionary, and the elements of the resulting weight vector are rearranged into a tensor. The modes of this tensor characterize the time, frequency, and duration properties of the signal, respectively, which makes it a joint time-frequency-duration representation. The tensor is then factorized, and the resulting frequency factors and duration factors are concatenated to form the audio feature. Because sparse tensor factorization is prone to over-fitting, a factorization algorithm with a self-adjusting penalty parameter is proposed and further refined. Experimental results on 15-category audio effect classification show that the proposed feature outperforms the conventional Mel-Frequency Cepstral Coefficient (MFCC) feature, the MFCC+MP feature (MFCC concatenated with features obtained by the Matching Pursuit (MP) algorithm), and the nonuniform scale-frequency map feature by 28.0%, 19.8% and 6.7%, respectively.
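The pipeline outlined in the abstract (rearrange the sparse weights into a time-frequency-duration tensor, factorize it, concatenate the frequency and duration factors) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes the Gabor atoms are ordered with time varying slowest and duration fastest, it substitutes a plain CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares, and it uses a fixed ridge penalty in place of the self-adjusting penalty parameter, whose update rule the abstract does not give. All function names and parameters are hypothetical.

```python
import numpy as np


def weights_to_tensor(w, n_time, n_freq, n_dur):
    """Rearrange a flat weight vector into a time x frequency x duration tensor.

    Assumption: atoms are ordered with time slowest and duration fastest.
    """
    return np.asarray(w, dtype=float).reshape(n_time, n_freq, n_dur)


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    return (a[:, None, :] * b[None, :, :]).reshape(-1, a.shape[1])


def unfold(x, mode):
    """Mode-n unfolding: rows index `mode`, columns the other modes in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def cp_als(x, rank, penalty=0.1, n_iter=50, seed=0):
    """Ridge-regularized CP decomposition of a 3-way tensor by alternating least squares.

    Assumption: `penalty` is a fixed constant; the paper adjusts it automatically.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            # Gram matrix of the Khatri-Rao product, computed as a Hadamard product.
            gram = (first.T @ first) * (second.T @ second)
            rhs = unfold(x, mode) @ kr
            # Solve factor @ (gram + penalty * I) = rhs for the current factor.
            factors[mode] = np.linalg.solve(gram + penalty * np.eye(rank), rhs.T).T
    return factors


def extract_feature(w, shape, rank, penalty=0.1):
    """Concatenate the flattened frequency and duration factors into one feature vector."""
    tensor = weights_to_tensor(w, *shape)
    _, freq_factor, dur_factor = cp_als(tensor, rank, penalty=penalty)
    return np.concatenate([freq_factor.ravel(), dur_factor.ravel()])
```

The resulting feature has length `(n_freq + n_dur) * rank`. CP factors are unique only up to scaling and permutation of components, so a practical system would likely normalize them before classification; that step is left out of this sketch.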
Source
《计算机应用》
CSCD
Peking University Core Journal (PKU Core)
2016, No. 5, pp. 1426-1429, 1438 (5 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (61301300)
Keywords
sparse representation
tensor factorization
audio effect classification
time-frequency feature