research-article

Pop Music Generation: From Melody to Multi-style Arrangement

Authors:
Hongyuan Zhu, University of Science and Technology of China, Hefei, Anhui, China
Qi Liu, University of Science and Technology of China, Hefei, Anhui, China
Nicholas Jing Yuan, Huawei Cloud&AI, Hangzhou, Zhejiang, China
Kun Zhang, University of Science and Technology of China, Hefei, Anhui, China
Guang Zhou, Microsoft, Suzhou, China
Enhong Chen, University of Science and Technology of China, Hefei, Anhui, China

ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 5, Article No. 54, pp. 1–31. https://doi.org/10.1145/3374915

Published: 06 July 2020

Abstract

Music plays an important role in our daily life. With the development of deep learning and modern generation techniques, researchers have done much work on automatic music generation. However, owing to the special requirements of both melody and arrangement, most of these methods have limitations when applied to multi-track music generation. Some critical factors related to the quality of music, such as chord progression, rhythm pattern, and musical style, are not well addressed. To tackle these problems and ensure the harmony of multi-track music, in this article we propose an end-to-end melody and arrangement generation framework that generates a melody track with several accompaniment tracks played by different instruments. Specifically, we first develop a novel Chord-based Rhythm and Melody Cross-Generation Model to generate a melody with a chord progression. Then, we propose a Multi-Instrument Co-Arrangement Model based on multi-task learning for multi-track music arrangement. Furthermore, to control the musical style of the arrangement, we design a Multi-Style Multi-Instrument Co-Arrangement Model that learns the musical style with adversarial training. As a result, we can not only maintain the harmony of the generated music but also control its musical style for better utilization. Extensive experiments on a real-world dataset demonstrate the superiority and effectiveness of our proposed models.


Index Terms

1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
    1. Learning paradigms
      1. Multi-task learning
      2. Reinforcement learning
        Sequential decision making
2. Information systems
  1. Information systems applications
    1. Data mining

Published in
ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 5
Special Issue on KDD 2018, Regular Papers and Survey Paper
October 2020
376 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3407672
Editors:
Charu Aggarwal, IBM T. J. Watson Research, USA
Xindong Wu, Mininglamp Academy of Sciences, China
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 July 2020
- Online AM: 7 May 2020
- Accepted: 1 December 2019
- Revised: 1 October 2019
- Received: 1 April 2019
Author Tags
Harmony evaluation
Music generation
melody and arrangement generation
multi-task joint learning
musical style
Qualifiers
- research-article
- Research
- Refereed