DOI: 10.1145/3219819.3220105
research-article

XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music

Published: 19 July 2018

ABSTRACT

With growing knowledge of music composition and a recent rise in demand, an increasing number of companies and research institutes have begun to study automatic music generation. However, previous models have limitations when applied to song generation, which requires both a melody and an arrangement. Moreover, many factors critical to the quality of a song, such as chord progression and rhythm patterns, are not well addressed. In particular, the problem of how to ensure the harmony of multi-track music remains underexplored. To this end, we present a focused study on pop music generation, in which we take into consideration both the influence of chord and rhythm on melody generation and the harmony of the music arrangement. We propose an end-to-end melody and arrangement generation framework, called XiaoIce Band, which generates a melody track together with several accompanying tracks played by several types of instruments. Specifically, we devise a Chord based Rhythm and Melody Cross-Generation Model (CRMCG) to generate melody with chord progressions. Then, we propose a Multi-Instrument Co-Arrangement Model (MICA) using multi-task learning for multi-track music arrangement. Finally, we conduct extensive experiments on a real-world dataset, and the results demonstrate the effectiveness of XiaoIce Band.

Supplemental Material

zhu_pop_music.mp4 (mp4, 332.4 MB)

Published in

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
