Funding: National Natural Science Foundation of China (No. 61801106).
Abstract: With the continued development of deep learning and artificial neural networks (ANNs), algorithmic composition has become an active research field. To address the music-style problem in chord music generation, a multi-style chord music generation (MSCMG) network is proposed, building on an earlier ANN for composition. The network adds a music-style extraction module and a style extractor to the original architecture; the music-style extraction module divides the music content into two parts, the music-style information M_style and the music content information M_content. The style extractor removes the music-style information entangled in the music content information. The paper compares the similarity of music generated by different models and evaluates whether the model can learn composition rules from the database. Experiments show that the proposed model can generate music in the expected style. Compared with the long short-term memory (LSTM) network, the MSCMG network shows some improvement in music-style performance.
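The abstract does not say how similarity between generated pieces is measured. As a rough illustration only, the sketch below compares two note sequences by the cosine similarity of their pitch-class histograms. The metric, the function names, and the toy note lists are all assumptions made for this example, not the paper's method.

```python
import math

def pitch_class_histogram(midi_pitches):
    """Normalized count of each of the 12 pitch classes (C, C#, ..., B)."""
    hist = [0.0] * 12
    for pitch in midi_pitches:
        hist[pitch % 12] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy data: MIDI note numbers for three short chord arpeggios.
c_major = [60, 64, 67, 72]             # C, E, G, C
c_major_octave_up = [72, 76, 79, 84]   # same pitch classes, one octave higher
f_sharp_major = [66, 70, 73]           # F#, A#, C#: no pitch class shared with C major

same = cosine_similarity(pitch_class_histogram(c_major),
                         pitch_class_histogram(c_major_octave_up))
different = cosine_similarity(pitch_class_histogram(c_major),
                              pitch_class_histogram(f_sharp_major))
```

A histogram like this ignores rhythm and note order, so it captures only a coarse harmonic fingerprint; a real evaluation would likely combine several such features.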
Funding: Supported by the National Social Science Foundation Art Project (No. 20BC040) and the China Scholarship Council (CSC) Grant (No. 202306320525).
Abstract: Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous works focus on monophonic or raw audio generation, while the multi-instrument scenario is under-explored. The challenges of dance-driven multi-instrument music (MIDI) generation are twofold: (i) the lack of a publicly available paired dataset of multi-instrument MIDI and video, and (ii) the weak correlation between music and video. To tackle these challenges, we have built the first paired dataset of multi-instrument MIDI and dance (D2MIDI). Based on this dataset, we introduce a multi-instrument MIDI generation framework (Dance2MIDI) conditioned on dance video. First, to capture the relationship between dance and music, we employ a graph convolutional network to encode the dance motion, which allows us to extract features related to dance movement and dance style. Second, to generate a harmonious rhythm, we use a transformer model with a cross-attention mechanism to decode the drum track sequence. Third, we model the generation of the remaining tracks from the drum track as a sequence understanding and completion task. A BERT-like model is employed to comprehend the context of the entire music piece through self-supervised learning. We evaluate the music generated by our framework trained on the D2MIDI dataset and demonstrate that our method achieves state-of-the-art performance.
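The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch of standard scaled dot-product cross-attention, in which drum-track decoder states act as queries and encoded dance-motion features act as keys and values. This is a generic textbook formulation, not the Dance2MIDI implementation: the learned projection matrices are omitted, and all names and toy vectors are assumptions for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """For each query, return a weighted average of `values`, with weights
    given by the softmax of scaled dot products between the query and `keys`."""
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended

# Hypothetical toy inputs: two drum-decoder states attend over three dance frames.
drum_queries = [[1.0, 0.0], [0.0, 1.0]]
dance_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One-hot values make each output row equal to the attention weights themselves.
dance_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attended = cross_attention(drum_queries, dance_keys, dance_values)
```

Because the toy values are one-hot, each row of `attended` shows directly how much a drum position draws on each dance frame.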
Funding: Project supported by the Natural Science Foundation of Guangdong Province in China (No. 2021A1515011888).
Abstract: Recently, various algorithms have been developed for generating appealing music. However, style control in the generation process has been somewhat overlooked. Music style refers to the representative and unique appearance presented by a musical work, and it is one of the most salient qualities of music. In this paper, we propose a music generation algorithm capable of creating a complete musical composition from scratch based on a specified target style. The model introduces a style-conditioned linear Transformer and a style-conditioned patch discriminator. The style-conditioned linear Transformer models musical instrument digital interface (MIDI) event sequences and emphasizes the role of style information. Simultaneously, the style-conditioned patch discriminator applies an adversarial learning mechanism with two new loss functions to enhance the modeling of music sequences. Moreover, we establish a discriminative metric for the first time, enabling evaluation of the generated music’s consistency with respect to music style. Both objective and subjective evaluations indicate that our method outperforms state-of-the-art methods on publicly available datasets.
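The abstract does not specify which linear-attention variant is used or how style is injected. As a sketch under stated assumptions, the code below shows causal linear attention in its common kernelized form with the feature map elu(x) + 1, plus a hypothetical conditioning step that simply adds a style embedding to every event embedding. The helper `add_style` and the toy vectors are inventions for illustration only.

```python
import math

def phi(x):
    """Positive feature map elu(x) + 1, applied elementwise."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def causal_linear_attention(queries, keys, values):
    """Causal attention in O(sequence length) using running sums:
    S accumulates phi(k) v^T and z accumulates phi(k) over past positions."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fq, fk = phi(q), phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
        denom = sum(fq[a] * z[a] for a in range(d_k))  # positive because phi > 0
        outputs.append([
            sum(fq[a] * S[a][b] for a in range(d_k)) / denom
            for b in range(d_v)
        ])
    return outputs

def add_style(event_embeddings, style_embedding):
    """Hypothetical conditioning: add one style vector to every event embedding."""
    return [[e + s for e, s in zip(event, style_embedding)]
            for event in event_embeddings]

# Hypothetical toy data: three MIDI event embeddings and one style embedding.
events = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]
style = [0.1, 0.1]

conditioned = add_style(events, style)
outputs = causal_linear_attention(conditioned, conditioned, conditioned)
```

Each output is a weighted average of the values seen so far, so the first position reproduces its own value and later positions blend in earlier events without ever forming the full attention matrix.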