Funding: supported in part by the National Natural Science Foundation of China (62176059, 62101136).
Abstract: Binaural rendering is of great interest to virtual reality and immersive media. Although humans naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is a challenging task for machines, since describing a sound field often requires multiple channels and even metadata about the sound sources. In addition, the perceived sound varies from person to person, even in the same sound field. Previous methods generally rely on individual head-related transfer function (HRTF) datasets and optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is the high cost of personalization, as traditional methods meet personalized needs by measuring HRTFs. The second is insufficient accuracy, because traditional methods optimize to retain the part of the information judged perceptually more important at the cost of discarding the rest. It is therefore desirable to develop novel techniques that achieve both personalization and accuracy at low cost. To this end, we focus on the binaural rendering of ambisonics and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks, and 2) a loss function quantifying interaural level differences to handle spatial information. To verify the proposed method, we collect and release the first paired ambisonic-binaural dataset and introduce three metrics to evaluate the content and spatial accuracy of end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and expose the shortcomings of previous methods.
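The interaural level difference (ILD) mentioned above is the log-ratio of left- and right-channel energies; a loss built on it penalizes a renderer whose output places the source at the wrong lateral position. The sketch below is a minimal illustration under our own assumptions (time-domain two-channel signals, dB scaling, the name `ild_loss`), not the paper's exact formulation:

```python
import numpy as np

def ild_loss(pred, target, eps=1e-8):
    """Toy interaural-level-difference loss.

    pred, target: arrays of shape (2, n_samples), rows are the
    left and right channels. The ILD is the log energy ratio of
    the two channels; the loss is the absolute ILD mismatch in dB.
    """
    def ild(x):
        left_energy = np.sum(x[0] ** 2) + eps
        right_energy = np.sum(x[1] ** 2) + eps
        return 10.0 * np.log10(left_energy / right_energy)

    return abs(ild(pred) - ild(target))
```

Note that such a loss is invariant to overall gain, which is why it would be combined with a content (e.g. waveform or spectral) loss rather than used alone.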
Abstract: With the development of virtual reality (VR) technology, more and more industries are integrating VR into their workflows. To address the problem that the lighting effects of Caideng (traditional Chinese festive lanterns) cannot be rendered directly in digital Caideng scenes, this article analyzes the lighting model and combines it with the lighting characteristics of Caideng scenes to design an optimized lighting algorithm that incorporates the bidirectional transmittance distribution function (BTDF) model. The algorithm efficiently renders the lighting effects of Caideng models in a virtual environment, and image optimization methods are used to enhance the immersive experience in VR. Finally, a Caideng roaming interactive system was designed based on this method. The results show that the system's frame rate remains stable above 60 fps during operation and delivers a good immersive experience.
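A BTDF describes how light is transmitted through a surface, which is the dominant effect for a lantern lit from inside. The toy term below is a minimal sketch of the kind of quantity such a model evaluates per shading point; the function name, parameters, and the simple cosine weighting are illustrative assumptions, not the article's actual algorithm:

```python
def transmitted_radiance(light_intensity, normal, light_dir, transmittance):
    """Toy BTDF-style transmission term.

    normal:     unit surface normal at the shading point (3-tuple).
    light_dir:  unit vector from the shading point toward the light.
    Transmission occurs only when the light is on the far side of
    the surface (dot(normal, light_dir) < 0); the transmitted
    radiance is scaled by the material transmittance and the cosine
    of the incidence angle against the flipped normal.
    """
    dot = sum(n * l for n, l in zip(normal, light_dir))
    cos_theta = max(0.0, -dot)  # zero when the light is on the front side
    return light_intensity * transmittance * cos_theta
```

A full BTDF would additionally depend on the outgoing direction and the material's scattering lobe; this scalar term only captures the back-lit attenuation that makes a lantern shell glow.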
Abstract: To enrich students' vocal music learning resources and save teachers' labor, this study combines virtual reality (VR) technology with vocal performance to build a new virtual teaching system. First, a virtual learning system is constructed. Next, a spectrogram feature algorithm based on Log-Gabor filtering and an improved local binary pattern (ILBP) is proposed, which amplifies fine spectrogram details across different scales and orientations and enhances image texture detail. Finally, a multi-level residual structure (ICNN) is introduced to compensate for lost features and improve the recognition rate of the singer's voice. The results show that on the Saarbruecken and CASIA datasets, the proposed algorithm achieves the smallest convergence metric values, indicating strong convergence; with a window length of 600 and a spectrogram window size of 16×16, the model attains its highest speech recognition efficiency. The algorithm also performs well on singer emotion recognition, with accuracy above 80% in all cases. These results indicate that the proposed algorithm offers high recognition accuracy and good stability and can be effectively applied to vocal performance teaching.
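The feature algorithm above builds on the local binary pattern, which encodes each pixel's texture by thresholding its neighbours against it. The sketch below shows the plain 3×3 LBP applied to a 2-D spectrogram; the paper's improved ILBP variant and the Log-Gabor filtering stage are not reproduced here, and the bit ordering is our own convention:

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 local binary pattern: threshold the 8 neighbours
    against the centre pixel and pack the bits clockwise from the
    top-left corner."""
    c = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for i, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << i
    return code

def lbp_image(spec):
    """Apply lbp_code over every interior pixel of a 2-D array
    (e.g. a magnitude spectrogram), yielding a texture map."""
    h, w = spec.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = lbp_code(spec[i:i + 3, j:j + 3])
    return out
```

Histograms of these codes over spectrogram windows (e.g. the 16×16 windows the abstract mentions) would then serve as the texture features fed to the classifier.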