Audio feature extraction algorithm based on weight tensor of sparse representation

Cited by: 5

Abstract: To better describe the characteristics of non-stationary audio signals, a joint time-frequency audio feature extraction method based on a Gabor dictionary and the weight tensor of a sparse representation is proposed. Conventional sparse representation uses a predefined dictionary to encode an audio signal as a sparse weight vector. Here the signal is encoded over a Gabor dictionary, and the elements of the resulting weight vector are rearranged into a tensor. The modes of this tensor characterize the time, frequency, and duration properties of the signal, respectively, so the tensor is a joint time-frequency-duration representation of the signal. The tensor is then factorized, and the resulting frequency factors and duration factors are concatenated to form the audio feature. Because sparse tensor factorization is prone to over-fitting, a factorization algorithm with a self-adjusting penalty parameter is proposed and further refined. Experimental results show that, in 15-category audio effect classification, the proposed feature outperforms the conventional Mel-Frequency Cepstral Coefficient (MFCC) feature, the MFCC+MP feature obtained by concatenating MFCC with features computed by Matching Pursuit (MP), and the nonuniform scale-frequency map feature by 28.0%, 19.8%, and 6.7%, respectively.
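
The pipeline described in the abstract (sparse coding over a Gabor dictionary, rearranging the weights into a time-frequency-duration tensor, factorizing, and concatenating the frequency and duration factors) can be illustrated with a minimal NumPy sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the atom grid (`centers`, `freqs`, `scales`), the synthetic test signal, the use of absolute weight values, and the helper names. The factorization step uses plain CP decomposition by alternating least squares as a stand-in; it is not the self-adjusting penalty-parameter algorithm the authors propose.

```python
import numpy as np

def gabor_atom(n, center, freq, scale):
    """Unit-norm real Gabor atom: Gaussian window times a cosine."""
    t = np.arange(n)
    g = np.exp(-np.pi * ((t - center) / scale) ** 2)
    g = g * np.cos(2 * np.pi * freq * (t - center))
    return g / np.linalg.norm(g)

def build_dictionary(n, centers, freqs, scales):
    """Columns are ordered (time, frequency, duration) in C order."""
    atoms = [gabor_atom(n, c, f, s)
             for c in centers for f in freqs for s in scales]
    return np.stack(atoms, axis=1)

def matching_pursuit(x, D, n_iter):
    """Plain MP: repeatedly pick the atom most correlated with the residual."""
    w = np.zeros(D.shape[1])
    r = x.astype(float).copy()
    for _ in range(n_iter):
        corr = D.T @ r
        k = np.argmax(np.abs(corr))
        w[k] += corr[k]
        r -= corr[k] * D[:, k]
    return w

def cp_als(X, rank, n_iter=50, seed=0):
    """CP decomposition of a 3-way tensor by alternating least squares.
    Stand-in only: no sparsity penalty, no non-negativity constraint."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            a, b = [factors[m] for m in range(3) if m != mode]
            # Khatri-Rao product of the two remaining factors
            kr = np.einsum('ir,jr->ijr', a, b).reshape(-1, rank)
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            gram = (a.T @ a) * (b.T @ b)
            factors[mode] = unfolded @ kr @ np.linalg.pinv(gram)
    return factors

# Hypothetical atom grid: 8 time positions, 4 frequencies, 3 durations.
n = 256
centers = np.arange(16, n, 32)
freqs = np.array([0.05, 0.1, 0.2, 0.4])   # cycles per sample
scales = np.array([8.0, 16.0, 32.0])      # window widths in samples

D = build_dictionary(n, centers, freqs, scales)

# Synthetic signal: two atoms plus a little noise (illustration only).
rng = np.random.default_rng(0)
x = 2.0 * D[:, 5] + 1.0 * D[:, 40] + 0.01 * rng.standard_normal(n)

# 1) Sparse weight vector, 2) rearranged into a time x frequency x duration tensor.
w = matching_pursuit(x, D, n_iter=10)
W = np.abs(w).reshape(len(centers), len(freqs), len(scales))

# 3) Factorize and 4) concatenate the frequency and duration factors.
time_f, freq_f, dur_f = cp_als(W, rank=2)
feature = np.concatenate([freq_f.ravel(), dur_f.ravel()])
print(W.shape, feature.shape)
```

With this grid and rank 2, the feature vector has (4 + 3) × 2 = 14 entries. The time factor is left out of the feature, matching the abstract's statement that only the frequency and duration factors are concatenated.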

Authors: LIN Jing (林静), YANG Jichen (杨继臣), ZHANG Xueyuan (张雪源), LI Xinchao (李新超)

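Affiliations: Department of Mechanical and Electrical Information, Maoming Vocational and Technical College (茂名职业技术学院机电信息系); School of Electronic and Information Engineering, South China University of Technology (华南理工大学电子与信息学院)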

Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2016, No. 5, pp. 1426-1429, 1438 (5 pages)

Funding: National Natural Science Foundation of China (61301300)

Keywords: sparse representation; tensor factorization; audio effect classification; time-frequency feature

Classification number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
