ABSTRACT
With advances in the understanding of music composition and a recent increase in demand, a growing number of companies and research institutes have begun to study automatic music generation. However, previous models have limitations when applied to song generation, which requires both a melody and an arrangement. Moreover, many critical factors related to the quality of a song, such as chord progression and rhythm patterns, are not well addressed. In particular, the problem of how to ensure the harmony of multi-track music remains underexplored. To this end, we present a focused study on pop music generation, in which we take into consideration both the influence of chord and rhythm on melody generation and the harmony of the music arrangement. We propose an end-to-end melody and arrangement generation framework, called XiaoIce Band, which generates a melody track together with several accompaniment tracks played by several types of instruments. Specifically, we devise a Chord based Rhythm and Melody Cross-Generation Model (CRMCG) to generate melody with chord progressions. Then, we propose a Multi-Instrument Co-Arrangement Model (MICA) using multi-task learning for multi-track music arrangement. Finally, we conduct extensive experiments on a real-world dataset, where the results demonstrate the effectiveness of XiaoIce Band.
Index Terms
- XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music