Funding: US Department of Energy, Los Alamos National Laboratory contract 47145 and UT-Battelle LLC contract 4000159447; program manager Laura Biven.
Abstract: Generating compact and effective numerical representations of data is a fundamental step in many machine learning tasks. Traditionally, handcrafted features have been used, but as deep learning has begun to show its potential, using deep learning models to extract compact representations has become a new trend. Among these approaches, adopting vectors from the model's latent space is the most popular. Several studies have focused on visual analysis of latent space in NLP and computer vision. However, relatively little work has been done for music information retrieval (MIR), especially work incorporating visualization. To bridge this gap, we propose a visual analysis system that uses Autoencoders to facilitate the analysis and exploration of traditional Chinese music. Because suitable traditional Chinese music data is lacking, we construct a labeled dataset from a collection of pre-recorded audio and convert the recordings into spectrograms. Our system takes as input music features learned by two deep learning models: a fully-connected Autoencoder and a Long Short-Term Memory (LSTM) Autoencoder. Through interactive selection, similarity calculation, clustering and listening, we show that the latent representations of the encoded data allow our system to identify essential music elements, which lays the foundation for further analysis and retrieval of Chinese music in the future.
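The similarity calculation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and the latent vectors, the function names `cosine_similarity` and `most_similar`, and the three-dimensional latent size are all hypothetical stand-ins for whatever the Autoencoders actually produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors, chosen for this sketch
    return dot / (norm_a * norm_b)

def most_similar(latents, query_index, k=2):
    """Indices of the k latent vectors closest to the query, excluding itself."""
    query = latents[query_index]
    scored = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(latents)
        if i != query_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical latent vectors, one per music clip (not data from the paper).
latents = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.1],
]
neighbors = most_similar(latents, query_index=0, k=2)
print(neighbors)
```

In a setup like the one described, the rows of `latents` would come from the encoder half of an Autoencoder applied to spectrograms, and the returned indices would point to the clips a user could then listen to and compare.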
Funding: Research Fund for Sichuan Science and Technology Program (Grant No. 2019YFG0190); Research on Sino-Tibetan multisource information acquisition, fusion, data mining and its application (Grant No. H04W170186).
Abstract: In recent years, convolutional neural networks (CNNs) have become a popular approach in music information retrieval (MIR) tasks such as mood recognition and music auto-tagging. Because CNNs extract local features effectively, previous attempts have shown strong performance on music auto-tagging. However, CNNs cannot capture spatial features, and the relationships between low-level features are neglected. Motivated by this problem, a hybrid architecture based on the Capsule Network is proposed, which is capable of extracting spatial features through the routing-by-agreement mechanism. The proposed model was applied to music auto-tagging. The results show that it achieves a promising ROC-AUC score of 90.67%.
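For readers unfamiliar with the reported metric, the following is a minimal sketch of how ROC-AUC can be computed for a single tag. It uses the pairwise-ranking definition: the fraction of (positive, negative) clip pairs in which the positive clip receives the higher score, with ties counted as half. The labels and scores are made-up illustrative values, and the abstract does not say how scores were averaged across tags, so per-tag evaluation is an assumption.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one tag: probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical ground truth and model scores for one tag across five clips.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

In this toy case five of the six positive-negative pairs are ranked correctly, so the value is 5/6. A score of 90.67% would correspond to roughly nine in ten such pairs being ordered correctly.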