Funding: This work was supported by the High-grade, Precision and Advanced Discipline Construction Project of Beijing Universities, the Major Projects of the National Social Science Fund of China (No. 21ZD19), and the National Culture and Tourism Technological Innovation Engineering Project of China.
Abstract: Audio mixing is a crucial part of music production. For analyzing or recreating audio mixing, it is important to study the estimation of the mixing parameters used to create mixdowns from music recordings, i.e., audio mixing inversion. However, approaches to audio mixing inversion have rarely been explored. A method for estimating mixing parameters from raw tracks and a stereo mixdown via embodied self-supervised learning is presented. In this work, several commonly used audio effects, including gain, pan, equalization, reverb, and compression, are taken into consideration. The method learns an inference neural network that takes a stereo mixdown and the raw audio sources as input and estimates the mixing parameters used to create the mixdown, by iteratively alternating between a sampling step and a training step. During the sampling step, the inference network predicts a distribution over mixing parameters; parameter sets are sampled from it and fed to an audio-processing framework to generate audio data for the training step. During the training step, the same network is optimized with the sampled data generated in the sampling step. This method explicitly models the mixing process in an interpretable way instead of using a black-box neural network model. A set of objective measures is used for evaluation. The experimental results show that this method outperforms current state-of-the-art methods.
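The sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes only two of the listed effects (gain in dB and constant-power pan), mono sources, and uniform parameter sampling in place of the inference network's predicted distribution. The function names `mix` and `sample_step` are hypothetical.

```python
import numpy as np

def mix(tracks, gains, pans):
    """Render a stereo mixdown from mono tracks (gain + constant-power pan only).

    tracks: (n_tracks, n_samples) mono sources
    gains:  (n_tracks,) gain in dB
    pans:   (n_tracks,) pan position in [-1 (hard left), +1 (hard right)]
    """
    lin = 10.0 ** (gains / 20.0)            # dB -> linear gain
    theta = (pans + 1.0) * np.pi / 4.0      # map [-1, 1] -> [0, pi/2]
    left = (lin * np.cos(theta))[:, None] * tracks
    right = (lin * np.sin(theta))[:, None] * tracks
    return np.stack([left.sum(axis=0), right.sum(axis=0)])

def sample_step(tracks, n_sets, rng):
    """Sampling step: draw parameter sets, render each into a mixdown,
    and return (mixdown, parameters) pairs as training data.

    In the paper's method the parameters would come from the inference
    network's predicted distribution; uniform sampling stands in for that here.
    """
    data = []
    for _ in range(n_sets):
        gains = rng.uniform(-12.0, 6.0, size=tracks.shape[0])
        pans = rng.uniform(-1.0, 1.0, size=tracks.shape[0])
        data.append((mix(tracks, gains, pans), gains, pans))
    return data

rng = np.random.default_rng(0)
tracks = rng.standard_normal((4, 1000))     # 4 synthetic mono sources
dataset = sample_step(tracks, n_sets=8, rng=rng)
# Each item pairs a rendered mixdown with the parameters that produced it,
# ready to supervise the inference network during the training step.
```

Because every mixdown in `dataset` is generated by the explicit mixing console, the ground-truth parameters are known exactly, which is what makes the iterative sample-and-train loop self-supervised.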
Abstract: In multimedia conferencing, audio processing is a basic capability and must meet real-time criteria. In this article, we categorize and analyze existing schemes and propose several multipoint speech audio mixing schemes based on weighted algorithms that meet practical real-time requirements; among these, the ASW and AEW schemes are especially recommended. By applying adaptive algorithms, the high-performance schemes we propose avoid the saturation operation widely used in multimedia processing, so no additional noise is added to the output. These adaptive algorithms have relatively low computational complexity and good perceptual quality. The schemes are designed for parallel processing, can be easily implemented in hardware such as DSPs, and can be widely applied in multimedia conference systems.
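The abstract does not give the ASW and AEW weighting rules, so the sketch below only illustrates the general idea of adaptive weighted mixing without saturation: if the per-endpoint weights are normalized to sum to 1, the weighted sum can never exceed the amplitude range of the loudest input, so no clipping (saturation) step is needed. The energy-proportional weighting and the function name are assumptions for illustration.

```python
import numpy as np

def adaptive_weighted_mix(frames):
    """Mix one speech frame from each conference endpoint with adaptive weights.

    frames: (n_endpoints, frame_len) samples in int16 range, as floats.
    Each endpoint's weight is proportional to its frame energy (a stand-in
    for the adaptive rule; the actual ASW/AEW rules are not specified in
    the abstract). Normalizing the weights to sum to 1 bounds the output
    by the loudest input, so no saturation operation is required.
    """
    energy = np.mean(frames.astype(np.float64) ** 2, axis=1)
    total = energy.sum()
    if total == 0.0:
        return np.zeros(frames.shape[1])  # all endpoints silent
    w = energy / total                    # weights sum to 1
    return (w[:, None] * frames).sum(axis=0)

# Two endpoints sending constant-amplitude frames of 160 samples (20 ms at 8 kHz)
frames = np.array([[30000.0] * 160, [10000.0] * 160])
mixed = adaptive_weighted_mix(frames)
```

Note that each weight operation is an independent multiply-accumulate per endpoint, which is what makes the scheme amenable to the parallel DSP implementation the article mentions.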