Audio Mixing Inversion via Embodied Self-supervised Learning


Abstract: Audio mixing is a crucial part of music production. To analyze or recreate a mix, it is important to estimate the mixing parameters that were used to create a mixdown from music recordings, a task known as audio mixing inversion. However, approaches to audio mixing inversion have rarely been explored. This work presents a method for estimating mixing parameters from raw tracks and a stereo mixdown via embodied self-supervised learning. Several commonly used audio effects are considered: gain, pan, equalization, reverb, and compression. The method learns an inference neural network that takes a stereo mixdown and the raw audio sources as input and estimates the mixing parameters used to create the mixdown, by iterating between a sampling step and a training step. During the sampling step, the inference network predicts a set of mixing parameters, which are sampled and fed to an audio-processing framework to generate audio data for the training step. During the training step, the same network is optimized on the data generated in the sampling step. The method explicitly models the mixing process in an interpretable way instead of relying on a black-box neural network model. A set of objective measures is used for evaluation. The experimental results show that this method outperforms current state-of-the-art methods.
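The alternation between the sampling step and the training step can be illustrated with a small toy. The sketch below is not the authors' implementation. It assumes a linear least-squares model in place of the inference neural network, a mixer that applies only per-track left/right gains (a combined stand-in for gain and pan) in place of the full audio-processing framework, and random noise in place of real audio tracks. All function names (`render`, `features`, `predict`, `invert_mix`) are hypothetical.

```python
import numpy as np

def render(tracks, params):
    """Stand-in audio-processing framework (assumption): left/right gains only."""
    return params.reshape(-1, 2).T @ tracks            # shape (2, n_samples)

def features(mix, tracks):
    """Correlate each mixdown channel with each raw track."""
    return (mix @ tracks.T / tracks.shape[1]).ravel()  # shape (2 * n_tracks,)

def predict(weights, feats):
    """Linear stand-in for the inference network (assumption), with a bias term."""
    return np.append(feats, 1.0) @ weights

def invert_mix(target_mix, tracks, n_rounds=3, batch=64, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_params = 2 * tracks.shape[0]
    weights = np.zeros((n_params + 1, n_params))
    target_feats = features(target_mix, tracks)
    for _ in range(n_rounds):
        # Sampling step: predict parameters for the target mixdown,
        # perturb them, and render a mixdown for every sampled set.
        center = predict(weights, target_feats)
        sampled = center + sigma * rng.standard_normal((batch, n_params))
        feats = np.stack([features(render(tracks, p), tracks) for p in sampled])
        # Training step: refit the same model on (rendered mix, parameters) pairs.
        inputs = np.hstack([feats, np.ones((batch, 1))])
        weights, *_ = np.linalg.lstsq(inputs, sampled, rcond=None)
    return predict(weights, target_feats).reshape(-1, 2)

# Toy data: three noise "tracks" and known left/right gains per track.
rng = np.random.default_rng(1)
tracks = rng.standard_normal((3, 2048))
true_params = np.array([[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])
target_mix = render(tracks, true_params.ravel())
estimate = invert_mix(target_mix, tracks)
```

Because this toy mixer is linear, a single round is already enough to recover the gains. The loop is kept only to mirror the iterate-sample-train structure described in the abstract, where nonlinear effects such as compression and reverb would make repeated rounds matter.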

Authors: Haotian Zhou, Feng Yu, Xihong Wu

Affiliations: Department of AI Music and Music Information Technology; Laboratory of Music Artificial Intelligence; School of Intelligence Science and Technology

Source: Machine Intelligence Research (机器智能研究（英文版）), indexed in EI and CSCD, 2024, No. 1, pp. 55-62 (8 pages)

Funding: This work was supported by the High-grade, Precision and Advanced Discipline Construction Project of Beijing Universities, the Major Projects of National Social Science Fund of China (No. 21ZD19), and the Nation Culture and Tourism Technological Innovation Engineering Project of China.

Keywords: audio mixing inversion; intelligent audio mixing; self-supervised learning; audio signal processing; deep learning

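Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]; TP39 [Automation and Computer Technology: Computer Application Technology]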

