Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61762005, 61231015, 61671335, 61702472, 61701194, 61761044, and 61471271; the National High Technology Research and Development Program of China (863 Program) under Grant No. 2015AA016306; the Hubei Province Technological Innovation Major Project under Grant No. 2016AAA015; the Science Project of Education Department of Jiangxi Province under No. GJJ150585; and the Opening Project of Collaborative Innovation Center for Economics Crime Investigation and Prevention Technology, Jiangxi Province, under Grant No. JXJZXTCX-025.
Abstract: Non-blind audio bandwidth extension is a standard technique in contemporary audio codecs for coding audio signals efficiently at low bitrates. In most existing methods, the high-frequency signal is generated by duplicating the corresponding low frequencies and applying a few high-frequency parameters. However, the perceptual quality of the coded signal degrades significantly when the correlation between high and low frequencies is weak. In this paper, we quantitatively analyse this correlation by computing mutual information. The analysis shows that the high-frequency signal is correlated not only with the low-frequency signal of the current frame but also with that of the context-dependent frames. To improve perceptual quality, we propose a novel method of coarse high-frequency spectrum generation that improves on the conventional replication method. In the proposed method, the coarse high-frequency spectra are generated by a nonlinear mapping model based on a deep recurrent neural network. Experiments confirm that the proposed method performs better than the reference methods.
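The mutual-information analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which estimator or features the authors used, so the histogram estimator, the function name `mutual_information`, the bin count, and the synthetic "band energy" signals below are all assumptions made for illustration, not the paper's procedure or data.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of I(X;Y) in bits for two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                 # joint probability table
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nz = p_xy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

# Synthetic stand-ins (hypothetical): one scalar feature per frame.
rng = np.random.default_rng(0)
n = 20000
lf_cur = rng.normal(size=n)      # low-band feature, current frame
lf_ctx = rng.normal(size=n)      # low-band feature, a context frame
unrelated = rng.normal(size=n)   # control signal with no link to the high band
# High-band feature built to depend on both the current and the context frame.
hf = 0.6 * lf_cur + 0.6 * lf_ctx + 0.5 * rng.normal(size=n)

mi_cur = mutual_information(lf_cur, hf)
mi_ctx = mutual_information(lf_ctx, hf)
mi_ctrl = mutual_information(unrelated, hf)
print(f"current frame: {mi_cur:.3f} bits")
print(f"context frame: {mi_ctx:.3f} bits")
print(f"control:       {mi_ctrl:.3f} bits")
```

In this toy setup both the current-frame and the context-frame features carry information about the high band, while the control stays near zero. The plug-in estimator has a small positive bias that grows with the number of bins, so values should be compared against such a control rather than read in absolute terms.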
Funding: The National Key R&D Program of China (No. 2017YFB1002803); the National Natural Science Foundation of China (Nos. 61801334 and 61761044); and the Basic Research Project of Science and Technology Plan of Shenzhen (JCYJ20170818143246278).
Abstract: The head-related transfer function (HRTF) carries the cues for human auditory localization, which makes it an essential component of virtual auditory display technology. In practice, HRTF interpolation is necessary for virtual auditory display systems to achieve high spatial resolution. Traditional geometry-based interpolation methods are generally constrained by the spatial distribution of the reference HRTFs. When that distribution is sparse, interpolation accuracy decreases significantly. Therefore, an interpolation method using the common-pole/zero model and a fitting neural network is proposed. First, we propose a common-pole/zero model to represent HRTFs across multiple subjects, from which low-dimensional features of the measured HRTFs are extracted. Then, for a new spatial direction, we predict the corresponding low-dimensional HRTF with a fitting neural network. Finally, we reconstruct the high-dimensional HRTF from the predicted low-dimensional HRTF. The simulation results suggest that the proposed method outperforms other interpolation methods such as Linear_AMBC, Bilinear_AMBC, and the Combination method.
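For orientation, one pole/zero formulation found in the HRTF literature is the common-acoustical-pole and zero model, in which all directions share one denominator while the numerator varies per direction. The abstract does not give the paper's equations, so the form below, the orders P and Q, and the symbols a_p and b_{q,d} are assumptions for illustration only and may differ from the authors' multi-subject model.

```latex
H_d(z) = \frac{\sum_{q=0}^{Q} b_{q,d}\, z^{-q}}{1 + \sum_{p=1}^{P} a_p\, z^{-p}}
```

Under this assumed form, the direction-dependent coefficients b_{q,d} would serve as the low-dimensional features: they are predicted for a new direction d, and H_d(z) is then evaluated with the shared a_p to reconstruct the full HRTF.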