research-article

Pop Music Generation: From Melody to Multi-style Arrangement

Published: 06 July 2020

Abstract

Music plays an important role in daily life. With the development of deep learning and modern generation techniques, researchers have done a great deal of work on automatic music generation. However, because melody and arrangement each impose special requirements, most of these methods are limited when applied to multi-track music generation. Some critical factors that affect the quality of music, such as chord progression, rhythm pattern, and musical style, are not well addressed. To tackle these problems and ensure the harmony of multi-track music, in this article we propose an end-to-end melody and arrangement generation framework that generates a melody track together with several accompaniment tracks played by different instruments. Specifically, we first develop a novel Chord-based Rhythm and Melody Cross-Generation Model to generate a melody with a chord progression. Then, we propose a Multi-Instrument Co-Arrangement Model based on multi-task learning for multi-track music arrangement. Furthermore, to control the musical style of the arrangement, we design a Multi-Style Multi-Instrument Co-Arrangement Model that learns musical style through adversarial training. As a result, we can not only maintain the harmony of the generated music but also control its musical style for better utilization. Extensive experiments on a real-world dataset demonstrate the superiority and effectiveness of our proposed models.

          • Published in

            ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 5
            Special Issue on KDD 2018, Regular Papers and Survey Paper
            October 2020
            376 pages
            ISSN:1556-4681
            EISSN:1556-472X
            DOI:10.1145/3407672
            Copyright © 2020 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 6 July 2020
            • Online AM: 7 May 2020
            • Accepted: 1 December 2019
            • Revised: 1 October 2019
            • Received: 1 April 2019

            Qualifiers

            • research-article
            • Research
            • Refereed
