
Graduate Student: 林妤潔
Yu-Jie Lin
Thesis Title: 小提琴演奏追蹤系統:應用音源分離結果實現即時音樂追蹤與伴奏
A Violin Performance Tracking System: Utilizing Music Source Separation Results for Real-Time Music Tracking and Accompaniment
Advisor: 蘇木春
Mu-Chun Su
Oral Examination Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2024
Academic Year of Graduation: 112
Language: Chinese
Number of Pages: 87
Keywords: Music Information Retrieval, Music Source Separation, Music Tracking, Automatic Accompaniment, Deep Learning
Usage: 14 views, 0 downloads
  • The violin has long been a popular instrument to learn and perform, with many well-known pieces and distinguished violinists. Most of this repertoire is written for the violin together with other instruments, so other players are needed to present a piece in full. However, because of time or cost constraints, finding a long-term ensemble partner (accompanist) is not easy, and publicly available online resources are mostly mixed recordings, which work poorly for ensemble playing. This research therefore targets the most common setting, violin with piano, and develops a system that separates the violin and piano sources from a mixed recording, uses the separated audio to track a live violin performance, and outputs the piano accompaniment.

    The goal of this research is to develop a real-time music tracking system that uses source separation results to track violin performances. We designed a source separation module and a music tracking module. For the source separation module, we collected and built a new open integrated dataset to train the Band-Split RNN model, and we improved the model's band-split method. We evaluated separation performance with the Signal-to-Distortion Ratio. The results show that the model outperforms existing baseline models in both data-limited and data-rich settings, demonstrating the effectiveness of the band-split method. For the music tracking module, we improved the Online Dynamic Time Warping algorithm and the Greedy Backward Alignment method, reimplemented the design of the real-time music tracking module, and improved some of its blocks. In practical tests, the real-time music tracking system showed low latency and accurate tracking, and across different features its tracking remained as stable as offline tracking.
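    The abstract names the Signal-to-Distortion Ratio as the separation metric. As a rough illustration only, the sketch below computes the simplest form of that ratio, 10·log10(‖s‖² / ‖s − ŝ‖²), for a reference source s and an estimate ŝ. The function name `sdr_db`, the `eps` guard, and the toy signals are assumptions made for this example; the thesis may use a fuller evaluation procedure, so this should not be read as its exact metric.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Simplified signal-to-distortion ratio in dB (illustrative only).

    reference: samples of the true isolated source (e.g. the violin stem)
    estimate:  samples produced by a separation model
    eps:       small constant (an assumption) to avoid division by zero
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source.
    signal_energy = sum(r * r for r in reference)
    # Energy of everything the estimate got wrong.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: the estimate is the reference scaled by 0.9.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
```

    A higher value means less residual error relative to the source energy, which is why a model is said to outperform a baseline when its SDR is larger.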

    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    1. Introduction
        1.1 Research Motivation
        1.2 Research Objectives
        1.3 Thesis Organization
    2. Background and Literature Review
        2.1 Background
            2.1.1 Performance Characteristics of the Violin and Piano
            2.1.2 Timbre Analysis of the Violin and Piano
        2.2 Literature Review
            2.2.1 Related Work on Music Source Separation
            2.2.2 Related Work on Music Tracking
    3. Methodology
        3.1 System Architecture
        3.2 Source Separation Module
            3.2.1 Band-Split RNN
            3.2.2 Band-Split Estimation Method
        3.3 Music Tracking Module
            3.3.1 Dynamic Time Warping (DTW)
            3.3.2 Online Dynamic Time Warping (ODTW)
            3.3.3 Greedy Backward Alignment (GBA)
            3.3.4 Data Manager Block
            3.3.5 Music Detector Block
            3.3.6 Rough Position Estimator Block (RPE)
            3.3.7 Decision Maker Block
    4. Experimental Design and Results
        4.1 Source Separation Evaluation
            4.1.1 Source Separation Datasets
            4.1.2 Model Training Details
            4.1.3 Source Separation Evaluation Metrics
            4.1.4 Comparison of Source Separation Results
            4.1.5 Effect of Band Splitting on Separation Results
        4.2 Music Tracking Evaluation
            4.2.1 Evaluation Method for the Music Tracking Module
            4.2.2 Evaluation Results for the Music Tracking Module
        4.3 Overall System Evaluation
            4.3.1 System Evaluation Method
            4.3.2 System Evaluation Results
    5. Conclusion
        5.1 Conclusions
        5.2 Future Work
    References
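    The outline lists Dynamic Time Warping (DTW) as the starting point of the music tracking module. The sketch below shows classic offline DTW on two short numeric sequences to illustrate how a reference is aligned with a slower performance. It is not the thesis's online variant (ODTW) or its Greedy Backward Alignment; the function name `dtw_align`, the absolute-difference distance, and the toy sequences are assumptions made for this example.

```python
def dtw_align(reference, live, dist=lambda a, b: abs(a - b)):
    """Classic offline DTW (illustrative only, not the online variant).

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning reference[i] with live[j].
    """
    n, m = len(reference), len(live)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of reference[:i] with live[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(reference[i - 1], live[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # reference advances
                                 cost[i][j - 1],      # live advances
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy example: the "live" sequence lingers on the first and third values.
reference = [1, 2, 3, 4]
live = [1, 1, 2, 3, 3, 4]
total, path = dtw_align(reference, live)
print(total, path)
```

    Offline DTW needs both sequences in full before it can backtrack. A real-time tracker only has the live audio received so far, which is the gap that online variants of the algorithm are designed to close.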

