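Abstract: 0 Introduction. The Higher Order Ambisonics (HOA) sound reproduction system uses a spatial spherical harmonic expansion to approximate the ideal sound field order by order, recording the spatial information of the original sound field and reproducing a three-dimensional sound field that is identical, or as similar as possible, to the original [1]. The accuracy of the reproduced sound field can be adjusted by choosing the order of the spherical harmonic expansion, and a variety of loudspeaker layouts can be used for reproduction. The performance of an HOA reproduction system therefore depends not only on the order of the spherical harmonics.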
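Abstract: Among loudspeaker-array sound reproduction techniques, Ambisonics, which is grounded in spherical harmonic decomposition, has advantages over wave field synthesis (WFS) and vector base amplitude panning: encoding and decoding are independent of each other, and the system is scalable. A basic Ambisonics system reconstructs the sound field by capturing and reproducing first-order directional information, namely the omnidirectional (W) and bidirectional (X, Y, Z) components, the so-called B-format. Higher order Ambisonics (HOA) extends B-format to higher spatial resolution on the basis of the spherical harmonic decomposition of the sound field. Many researchers have studied the reproduction accuracy and practical limits of HOA, but studies that jointly consider head scattering and the characteristics of human hearing are still scarce. This work evaluates the optimal listening area of Ambisonics systems of different orders in terms of the interaural time difference (ITD), an important cue for low-frequency localization. Plane waves incident from all directions in the horizontal plane are encoded into HOA components; on this basis, the ITD fluctuation in the two-dimensional horizontal plane is computed as a simulated head (a rigid sphere) moves inside a circular array, and the boundary of the optimal listening area is determined with an ITD threshold. Simulation results show that the ITD-based objective metric reflects the reproduction performance of Ambisonics systems of different orders well: a 4th-order system achieves an optimal listening area of 20 cm × 14 cm, whereas 1st- and 2nd-order systems cannot achieve accurate reproduction even in the central region. Higher-order Ambisonics systems therefore offer better source localization performance.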
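The encoding step mentioned in the abstract above (horizontal plane waves encoded into HOA components and reproduced over a circular array) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes unnormalized circular harmonics, a uniformly spaced circular loudspeaker array, and the basic (mode-matching) decoder; the function names `encode_2d` and `decode_gains_2d` are hypothetical.

```python
import numpy as np

def encode_2d(theta, order):
    """Horizontal HOA components of a unit plane wave from azimuth theta (radians).

    Assumed layout: [1, cos(1*theta)..cos(N*theta), sin(1*theta)..sin(N*theta)].
    For order 1 this is W, X, Y up to a normalization constant.
    """
    m = np.arange(1, order + 1)
    return np.concatenate(([1.0], np.cos(m * theta), np.sin(m * theta)))

def decode_gains_2d(theta, order, n_speakers):
    """Loudspeaker gains of the basic decoder for a uniform circular array.

    Assumes n_speakers >= 2 * order + 1 so that all encoded harmonics
    are reproduced without spatial aliasing.
    """
    phi = 2.0 * np.pi * np.arange(n_speakers) / n_speakers  # speaker azimuths
    m = np.arange(1, order + 1)[:, None]                    # harmonic orders as a column
    return (1.0 + 2.0 * np.cos(m * (theta - phi)).sum(axis=0)) / n_speakers

if __name__ == "__main__":
    # 4th-order reproduction over 9 loudspeakers, source straight ahead.
    print(np.round(decode_gains_2d(0.0, 4, 9), 3))
```

With at least 2N + 1 loudspeakers for order N, re-encoding the loudspeaker directions weighted by these gains returns the components of the original plane wave, which is the sense in which the decoder matches the encoded harmonics.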
Funding: This work was supported by the National Natural Science Foundation of China (11674105) and the State Key Lab of Subtropical Building Science, South China University of Technology.
Abstract: Ambisonics is a series of spatial sound reproduction systems based on spatial harmonic decomposition and successive-order approximation of the sound field. Ambisonics signals were originally intended for loudspeaker reproduction. By using head-related transfer function (HRTF) filters, binaural Ambisonics converts the Ambisonics signals for static or dynamic headphone reproduction. In the present work, the performance of static and dynamic binaural Ambisonics reproduction is evaluated and compared. The mean binaural pressure errors across target source directions are analyzed first. A virtual source localization experiment is then conducted, and localization performance is evaluated in terms of the percentages of front-back and up-down confusion, the mean angle error, and the dispersion of the localization results. The results indicate that binaural Ambisonics reproduction of insufficiently high order (for example, 5th to 10th order) is unable to recreate the correct high-frequency magnitude spectra in the binaural pressures, which degrades localization in static reproduction. Because a dynamic localization cue is included, dynamic binaural Ambisonics reproduction yields clearly better localization performance than static reproduction of the same order. Even third-order dynamic binaural Ambisonics reproduction exhibits appropriate localization performance.
Funding: Supported by the National Natural Science Foundation of China (11174087).
Abstract: A scheme for analyzing timbre in spatial sound with a binaural auditory model is proposed, and Ambisonics is taken as an example for analysis. Ambisonics is a spatial sound system based on physical sound field reconstruction. The errors and timbre coloration in the final reconstructed sound field depend on the spatial aliasing errors in both the recording and the reproduction stages of Ambisonics. The binaural loudness level spectra in Ambisonics reconstruction are calculated using Moore's revised loudness model and then compared with those of a real sound source, so as to evaluate the timbre coloration in Ambisonics quantitatively. The results indicate that, in the case of ideal independent signals, the high-frequency limit and the radius of the region without perceived timbre coloration increase with the order of Ambisonics. On the other hand, in the case of recording with a microphone array, once the high-frequency limit of the microphone array exceeds that of the sound field reconstruction, array recording has little influence on the binaural loudness level spectra, and thus on the timbre of the final reconstruction, up to the high-frequency limit of reproduction. Based on the binaural auditory model analysis, a scheme for optimizing the design of Ambisonics recording and reproduction is also suggested. A subjective experiment yields results consistent with those of the binaural model, which verifies the effectiveness of the model analysis.
Funding: Supported in part by the National Natural Science Foundation of China (62176059, 62101136).
Abstract: Binaural rendering is of great interest to virtual reality and immersive media. Although humans can naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is a challenging task for machines, since the description of a sound field often requires multiple channels and even metadata about the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transfer function (HRTF) datasets and on optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is a high personalization cost, as traditional methods achieve personalization by measuring HRTFs. The second is insufficient accuracy, because traditional methods optimize for retaining the perceptually more important part of the information at the cost of discarding another part. It is therefore desirable to develop novel techniques that achieve personalization and accuracy at low cost. To this end, we focus on the binaural rendering of Ambisonics and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function quantifying interaural level differences to deal with spatial information. To verify the proposed method, we collect and release the first paired Ambisonics-binaural dataset and introduce three metrics to evaluate the content and spatial information accuracy of end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.
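The abstract above does not define its interaural level difference (ILD) loss, so the following is only a sketch of one plausible form. It assumes a broadband ILD taken as the left-to-right energy ratio in decibels and an absolute-error penalty; the names `ild_db` and `ild_loss` are hypothetical, and the paper's actual loss may differ (for example, it may be computed per frequency band or per frame).

```python
import numpy as np

def ild_db(left, right, eps=1e-12):
    """Broadband ILD in dB: energy of the left channel relative to the right.

    eps (an assumed small constant) avoids division by zero on silent input.
    """
    e_left = np.mean(np.square(left))
    e_right = np.mean(np.square(right))
    return 10.0 * np.log10((e_left + eps) / (e_right + eps))

def ild_loss(pred, target):
    """Absolute ILD error in dB between two binaural signals of shape (2, n_samples)."""
    return abs(ild_db(pred[0], pred[1]) - ild_db(target[0], target[1]))

if __name__ == "__main__":
    x = np.sin(np.linspace(0.0, 20.0, 1000))
    target = np.stack([x, 0.5 * x])   # right ear at half the amplitude of the left
    pred = np.stack([x, x])           # prediction with no level difference
    print(ild_loss(pred, target))
```

A term of this kind penalizes errors in the level balance between the ears independently of the waveform error, which is why it is usually combined with a content loss rather than used alone.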
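Abstract: The rise of virtual reality (VR) has brought three-dimensional audio technology into wider use. 3D audio in VR is generally played back binaurally. The 3D audio techniques currently most used in VR are Ambisonics, based on physical sound field reconstruction and spherical harmonic decomposition; techniques based on natural binaural recording; and techniques based on reconstruction with head-related transfer functions (HRTFs). In addition, scenes that account for environmental reverberation require binaural room impulse response (BRIR) techniques. This paper introduces the existing 3D audio technologies in VR and their main applications on the market, reviews the mainstream technologies across the whole VR audio chain, from capture through coding and transmission to rendering and playback, and finally gives an outlook on the development of VR 3D audio.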