Funding: US Department of Energy, Los Alamos National Laboratory contract 47145, and UT-Battelle LLC contract 4000159447; program manager: Laura Biven.
Abstract: Generating compact and effective numerical representations of data is a fundamental step for many machine learning tasks. Traditionally, handcrafted features were used, but as deep learning began to show its potential, using deep learning models to extract compact representations has become a new trend. Among these approaches, adopting vectors from a model's latent space is the most popular. Several studies have focused on visual analysis of latent spaces in NLP and computer vision. However, relatively little work has been done for music information retrieval (MIR), especially work that incorporates visualization. To bridge this gap, we propose a visual analysis system utilizing Autoencoders to facilitate the analysis and exploration of traditional Chinese music. Due to the lack of suitable traditional Chinese music data, we construct a labeled dataset from a collection of pre-recorded audio clips and then convert them into spectrograms. Our system takes as input music features learned by two deep learning models: a fully-connected Autoencoder and a Long Short-Term Memory (LSTM) Autoencoder. Through interactive selection, similarity calculation, clustering, and listening, we show that the latent representations of the encoded data allow our system to identify essential music elements, laying the foundation for further analysis and retrieval of Chinese music in the future.
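To illustrate the pipeline the abstract describes (audio clips converted to spectrograms, a fully-connected Autoencoder learning latent features, and similarity calculation plus clustering over those features), the following is a minimal sketch assuming librosa, PyTorch, and scikit-learn. The file names, layer sizes, and hyperparameters are hypothetical placeholders rather than the authors' implementation, and the paper's LSTM Autoencoder and interactive visualization components are omitted for brevity.

```python
# Minimal, illustrative sketch (not the authors' code): assumes librosa, PyTorch,
# and scikit-learn; file names, layer sizes, and hyperparameters are placeholders.
import numpy as np
import librosa
import torch
from torch import nn
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity


def audio_to_spectrogram(path, sr=22050, n_mels=128, frames=256):
    """Load a clip and convert it to a fixed-size log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Pad or truncate along time so every spectrogram has the same shape.
    if log_mel.shape[1] < frames:
        log_mel = np.pad(log_mel, ((0, 0), (0, frames - log_mel.shape[1])))
    return log_mel[:, :frames]


class FCAutoencoder(nn.Module):
    """Fully-connected Autoencoder; the latent vector serves as the music feature."""

    def __init__(self, input_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


def train_and_embed(audio_paths, epochs=50, latent_dim=32):
    """Train the Autoencoder on flattened spectrograms and return latent vectors."""
    specs = np.stack([audio_to_spectrogram(p) for p in audio_paths])
    x = torch.tensor(specs.reshape(len(specs), -1), dtype=torch.float32)
    model = FCAutoencoder(input_dim=x.shape[1], latent_dim=latent_dim)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        reconstruction, _ = model(x)
        loss_fn(reconstruction, x).backward()
        optimizer.step()
    with torch.no_grad():
        _, latent = model(x)
    return latent.numpy()


if __name__ == "__main__":
    audio_paths = ["clip_001.wav", "clip_002.wav"]  # hypothetical file names
    latent = train_and_embed(audio_paths)
    # Pairwise similarity and clustering over the latent representations,
    # mirroring the system's similarity-calculation and clustering interactions.
    similarity = cosine_similarity(latent)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(latent)
    print(similarity.round(2), labels)
```

In the actual system these latent vectors would feed the interactive views (selection, clustering, and listening); the sketch only shows how such vectors could be obtained and compared offline.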
Funding: the National Hi-Tech Research and Development Program of China (No. 2008AA1Z207), the Natural Science Foundation of Hubei Province, China (No. 2010CDB01606), the Fundamental Research Funds for the Central Universities (HUST: 2016YXMS027), the Huawei Innovation Research Program (Nos. YJCB2010032NW, YB2012120133, YB2014010026, and YB2016040002), and the Scientific Research Foundation for the Returned Overseas Chinese Scholars.