

Graduate Student: 黃千豪 (Chien-Hao Huang)
Thesis Title: Combining Deep Supervised Learning and Reinforcement Learning for Music Melody Generation (結合深度監督式學習與強化式學習的音樂旋律生成)
Advisor: 施國琛 (Kuo-Chen Shih)
Committee Members:
Degree: Master
Department: College of Information and Electrical Engineering - Department of Computer Science & Information Engineering
Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar; 2022-2023)
Language: English
Pages: 50
Chinese Keywords: 人工智慧 (Artificial Intelligence), 深度學習 (Deep Learning), 音樂生成 (Music Generation)
English Keywords: Artificial Intelligence, Deep Learning, Music Generation
Views: 11; Downloads: 0
  • In this study, we combine deep supervised learning and reinforcement learning for symbolic music generation. When deep learning is used to model symbolic music, a musical excerpt can be treated as a sequence of symbols along time, so models capable of capturing temporal information are commonly used, as in text modeling, natural language processing, and other sequence-modeling tasks. In such supervised approaches, a deep neural network automatically extracts musical features from an existing dataset. However, human compositions typically contain well-defined structures and follow conventional music-theory rules that make them pleasant to listeners. These constraints, which are difficult to impose with supervised learning alone, can be enforced on the neural network with reinforcement learning. By combining these two major deep-learning training paradigms, the model can imitate the style of the existing dataset while specific behaviors of the generated melody are controlled. We also study the design of the input representation and the architecture so that the model captures structural features of music more easily. In the experiments, we focus on monophonic melody generation in the Chinese Jiangnan style, and we validate the quality and characteristics of the generated results as well as the effectiveness of the different modules in the architecture.


    In this work, we present a symbolic music melody generation method that combines supervised learning and reinforcement learning. When deep learning is applied to symbolic music modeling, music clips can be processed as sequences of symbols along time, so sequence models with the ability to capture temporal information are typically used, as in other sequential modeling tasks such as text modeling and natural language processing. In this kind of supervised approach, a deep neural network can automatically capture musical features from an existing dataset. However, music composed by humans usually follows well-defined structures and conventional rules of music theory that please the audience. These constraints can be enforced on the neural network with reinforcement learning, which is difficult to achieve using supervised learning techniques alone. By combining these two major training paradigms of deep learning, we can make the model mimic the style of the existing dataset while also controlling specific behaviors of the generated melody. We also investigate the design of the input representation and the architecture to help the model capture musical structure more easily. In the experiments, we focus on monophonic melody generation of Chinese Jiangnan-style music and validate the quality and characteristics of the generated results, as well as the effectiveness of the different modules in the architecture.
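    The combination described above can be sketched as a per-step reward that mixes the pretrained supervised model's log-likelihood with a hand-written music-theory term, in the spirit of RL Tuner [2]. The pentatonic scale, leap threshold, and weight `c` below are illustrative assumptions for this sketch, not the thesis's actual reward design (which uses separate pitch, duration, and bar-profile models):

    ```python
    # Toy in-scale pitch classes (C-major pentatonic: C D E G A), standing in
    # for style-appropriate pitch material.
    SCALE = {0, 2, 4, 7, 9}

    def theory_reward(prev_pitch, pitch):
        """Rule-based term: reward in-scale notes, penalize leaps beyond a fifth."""
        r = 1.0 if pitch % 12 in SCALE else -1.0
        leap = abs(pitch - prev_pitch)
        if leap > 7:                      # interval wider than a perfect fifth
            r -= 0.5 * (leap - 7) / 5.0   # graded penalty for large leaps
        return r

    def combined_reward(log_p, prev_pitch, pitch, c=0.5):
        """Per-step RL reward: log-likelihood of the chosen note under the
        pretrained supervised model, plus a weighted music-theory term."""
        return log_p + c * theory_reward(prev_pitch, pitch)
    ```

    During RL fine-tuning, the log-likelihood term anchors the policy to the style learned from the dataset, while the rule-based term steers specific behaviors that supervised training alone does not enforce.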

    Chinese Abstract ..... i
    English Abstract ..... ii
    Table of Contents ..... iii
    I Introduction ..... 1
    II Related Works ..... 3
    III Background ..... 6
        3-1 LSTM Sequence Generation ..... 6
        3-2 PPO (Proximal Policy Optimization) Algorithms ..... 10
    IV Method ..... 15
        4-1 Architecture Overview ..... 15
        4-2 Hierarchical Recurrent Neural Network (Bar Profile) ..... 17
        4-3 Input Representation and Additional Rhythmic Information ..... 19
        4-4 PPO with LSTM Architecture ..... 22
        4-5 Positional Encoding ..... 23
        4-6 Reward Functions Design ..... 26
            4-6-1 Pitch Model ..... 26
            4-6-2 Duration Model ..... 28
            4-6-3 Bar Profile Model ..... 29
        4-7 Implementation Details ..... 29
    V Experiments ..... 31
        5-1 Dataset ..... 31
        5-2 Training ..... 31
        5-3 Evaluation ..... 32
            5-3-1 Characteristics ..... 32
            5-3-2 Positional Encoding ..... 34
            5-3-3 Additional Rhythmic Information ..... 36
            5-3-4 RL Tuner Method ..... 38
        5-4 Result Examples ..... 39
    VI Discussions and Conclusions ..... 41
    References ..... 42

    [1] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. “Proximal Policy Optimization Algorithms”. arXiv:1707.06347, 2017.
    [2] Natasha Jaques, Shixiang Gu, Richard E. Turner, Douglas Eck. “Tuning Recurrent Neural Networks with Reinforcement Learning”. arXiv:1611.02796v2, 2016.
    [3] Hado van Hasselt, Arthur Guez, David Silver. “Deep Reinforcement Learning with Double Q-learning”. arXiv:1509.06461, 2015.
    [4] Sepp Hochreiter, Jürgen Schmidhuber. “Long Short-Term Memory”. Neural Computation, 1997.
    [5] Peter M. Todd. “A Connectionist Approach to Algorithmic Composition”. Computer Music Journal (CMJ), Vol. 13, No. 4, pp. 27-43, 1989.
    [6] Michael C. Mozer. “Neural Network Music Composition by Prediction: Exploring the Benefits of Psychophysical Constraints and Multi-scale Processing”. Connection Science, Vol. 6 (2-3):247-280, 1994.
    [7] Alex Graves. “Generating Sequences With Recurrent Neural Networks”. arXiv:1308.0850v5, 2014.
    [8] Douglas Eck, Jürgen Schmidhuber. “A First Look at Music Composition using LSTM Recurrent Neural Networks”. IDSIA/USI-SUPSI, Technical Report No. IDSIA-07-02, Switzerland, 2002.
    [9] Gaëtan Hadjeres, François Pachet, Frank Nielsen. “DeepBach: a Steerable Model for Bach Chorales Generation”. arXiv:1612.01010v2, 2016.
    [10] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. “Generative Adversarial Networks”. arXiv:1406.2661, 2014.
    [11] Olof Mogren. “C-RNN-GAN: Continuous recurrent networks with adversarial training”. arXiv:1611.09904, 2016.
    [12] Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang. “MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment”. arXiv:1709.06298v2, 2017.
    [13] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. “Attention Is All You Need”. arXiv:1706.03762v5, 2017.
    [14] Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck. “Music Transformer: Generating Music with Long-Term Structure”. arXiv:1809.04281v3, 2018.
    [15] Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang. “RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning”. arXiv:2002.03082, 2020.
    [16] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller. “Playing Atari with Deep Reinforcement Learning”. arXiv:1312.5602, 2013.
    [17] Long short-term memory, Wikipedia. https://en.wikipedia.org/wiki/Long_short-term_memory#/media/File:LSTM_Cell.svg
    [18] Xuefei Huang, Seung Ho Hong, Mengmeng Yu, Yuemin Ding, Junhui Jiang. “Demand Response Management for Industrial Facilities: A Deep Reinforcement Learning Approach”. IEEE Access, vol. 7, pp. 82194-82205, 2019.
    [19] Daphné Lafleur, Sarath Chandar, Gilles Pesant. “Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints”. 28th International Conference on Principles and Practice of Constraint Programming (CP 2022), 2022.
    [20] Harish Kumar, Balaraman Ravindran. “Polyphonic Music Composition with LSTM Neural Networks and Reinforcement Learning”. arXiv:1902.01973v2, 2019.
    [21] Training an RNN without Supervision, Machine Learning for Scientists. https://ml-lectures.org/docs/unsupervised_learning/ml_unsupervised-2.html
    [22] Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, Xiao Zhang. “Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding”. arXiv:1611.05416v2, 2016.
    [23] Pedro Borges. Deep Learning: Recurrent Neural Networks. https://medium.com/deeplearningbrasilia/deep-learning-recurrent-neural-networks-f9482a24d010
    [24] Jian Wu, Changran Hu, Yulong Wang, Xiaolin Hu, Jun Zhu. “A Hierarchical Recurrent Neural Network for Symbolic Melody Generation”. arXiv:1712.05274v2, 2017.
    [25] Recurrent PPO, Stable Baselines3 - Contrib https://sb3-contrib.readthedocs.io/en/master/modules/ppo_recurrent.html
    [26] Sho Takase, Naoaki Okazaki. “Positional Encoding to Control Output Sequence Length”. arXiv:1904.07418, 2019.
    [27] PyTorch. https://pytorch.org/
    [28] LSTM. PyTorch. https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
    [29] Shulei Ji, Xinyu Yang, Jing Luo, and Juan Li. “RL-Chord: CLSTM-Based Melody Harmonization Using Deep Reinforcement Learning”. IEEE Transactions on Neural Networks and Learning Systems (Early Access), 2023.
    [30] Diederik P. Kingma, Jimmy Lei Ba. “Adam: A Method for Stochastic Optimization”. arXiv:1412.6980v9, 2014.
    [31] Sageev Oore, Ian Simon, Sander Dieleman, Douglas Eck, Karen Simonyan. “This Time with Feeling: Learning Expressive Musical Performance”. arXiv:1808.03715, 2018.
    [32] Proximal Policy Optimization. OpenAI. https://openai.com/research/openai-baselines-ppo
    [33] John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel. “Trust Region Policy Optimization”. arXiv:1502.05477v5, 2015.
    [34] Magenta, Google. https://magenta.tensorflow.org/
    [35] Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet. “Deep Learning Techniques for Music Generation – A Survey”. arXiv:1709.01620v4, 2017.
    [36] Nonchord tone, Wikipedia. https://en.wikipedia.org/wiki/Nonchord_tone
    [37] 施國琛, 張儷瓊, 黃志方, 孫沛立. “Erhu Performance and Music Style Analysis Using Artificial Intelligence” (以人工智慧實踐二胡演奏行為暨音樂風格分析). National Science and Technology Council (國家科學及技術委員會), NSTC 112-2420-H-008-002.
    [38] “Anthology of Chinese Folk Songs” (中國民間歌曲集成). National Editorial Committee of the Anthology of Chinese Folk Songs (中國民間歌曲集成全國編輯委員會), 1988.
    [39] 蒲亨強. “Regional Music Culture of Jiangsu” (江蘇地域音樂文化). 2014.
    [40] 劉健. “A Study of the Style of Hulusi Works” (葫蘆絲作品風格研究). 2013.
    [41] 周美妤. “Analysis and Interpretation of Zhu Changyao's 揚州小調 and 江南春色” (朱昌耀《揚州小調》、《江南春色》作品分析與詮釋). 2016.
