| Graduate Student: | 李欣成 Hsin-Cheng Lee |
|---|---|
| Thesis Title: | A Query-by-Singing/Humming Retrieval System Based on Dynamic Time Warping Combined with Linear Scaling |
| Advisor: | 張寶基 Pao-Chi Chang |
| Oral Defense Committee: | |
| Degree: | Master |
| College / Department: | College of Electrical Engineering and Computer Science, Department of Communication Engineering |
| Year of Publication: | 2021 |
| Graduation Academic Year: | 109 |
| Language: | Chinese |
| Number of Pages: | 74 |
| Keywords (Chinese): | 音樂資訊檢索, 哼唱檢索, 動態時間規整, 線性伸縮 |
| Keywords (English): | Music Retrieval, Query by Singing and Humming, Dynamic Time Warping, Linear Scaling |
With the proliferation of smartphones and mobile networks, searching for and downloading music through streaming platforms and social networking sites has become part of daily life. When a user remembers a song's melody but cannot recall its title or lyrics, a Content-Based Music Retrieval (CBMR) system, such as a query-by-singing/humming system, can solve the problem by searching directly on content features of the songs, such as melody and rhythm.
For larger retrieval databases, recognition accuracy must be traded off against computation time. Previous studies first applied fast but less accurate methods such as Linear Scaling (LS) or Earth Mover's Distance (EMD) to filter out dissimilar songs, then used the computationally expensive Dynamic Time Warping (DTW) to match the remaining songs with high precision, and finally fused the similarity scores to output a list of the ten most similar songs. In this work, LS plays two roles: shortening the melody features, which reduces computation, and fine-tuning feature lengths inside a refinement module, which further raises accuracy. High-precision DTW computes the matching distance, sliding and scaling a matching window over each song to locate the corresponding melody starting point and the best matching distance; a refinement module and a pre-filter are also designed. Experimental results show that the system's MRR outperforms previous work, and linearly shortening the data together with an optimized threshold saves about 40% of the computation time.
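The linear-scaling step described above can be sketched as follows. This is a minimal, generic illustration of LS as length normalization by linear interpolation, not the thesis's exact implementation; the function name and sample values are illustrative.

```python
def linear_scale(pitch, target_len):
    """Resample a pitch sequence to target_len points by linear interpolation.

    Linear scaling (LS) in this sense stretches or shortens a melody
    contour to a fixed length, so that sequences of different lengths
    become directly comparable (or cheaper to match).
    """
    n = len(pitch)
    if n == 0 or target_len <= 0:
        return []
    if n == 1:
        return [float(pitch[0])] * target_len
    out = []
    for i in range(target_len):
        # Map output index i to a fractional position in the input contour.
        pos = i * (n - 1) / (target_len - 1) if target_len > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(pitch[lo] * (1 - frac) + pitch[hi] * frac)
    return out

# Illustrative hummed-query contour (MIDI-like pitch values), shortened to half length.
query = [60.0, 62.0, 64.0, 65.0, 67.0, 65.0, 64.0, 62.0]
shortened = linear_scale(query, 4)
```

Resampling every reference and query to a common, shorter length is what cuts the cost of the subsequent distance computation, while small length adjustments of the same kind can serve as the fine-tuning step in a refinement module.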
With the growing popularity of mobile devices and Internet services, the amount of music data distributed over the Internet increases every day. Music information retrieval systems that can search for music accurately and quickly are therefore attracting more and more attention.
Sometimes users remember only the melody but forget the lyrics; a Content-Based Music Retrieval (CBMR) system such as QBSH can solve this problem by searching with features extracted from the music itself.
To handle massive retrieval data, we need to balance accuracy against computation time. Previous research combined multiple classifiers through score-level fusion to reduce computation time, but a poor classifier degrades the overall accuracy. Instead of score-level fusion, our proposed system combines DTW with Linear Scaling (LS): LS shortens the query and the reference songs at different stages to reduce computation time, while DTW computes the similarity between the query and the reference songs. We also design a refinement module and a pre-filter to enhance accuracy.
The experimental results show that our method provides a higher MRR than previous approaches, and we reduce computation time by about 40% by scaling down the data and finding the best threshold setting.
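The DTW similarity computation mentioned above can be sketched as follows. This is the classic textbook DTW recurrence with an absolute-difference cost, shown only to make the matching step concrete; the thesis's windowed, start-point-searching variant builds on this same recurrence but is not reproduced here.

```python
import math

def dtw_distance(q, r):
    """Classic dynamic time warping distance between two pitch sequences.

    D[i][j] holds the minimal accumulated cost of aligning q[:i] with
    r[:j]; each cell extends the best of the three predecessor cells
    (insertion, deletion, match), so tempo variations in the hummed
    query are absorbed by the warping path. O(len(q) * len(r)) time.
    """
    n, m = len(q), len(r)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - r[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a query frame
                                 D[i][j - 1],      # skip a reference frame
                                 D[i - 1][j - 1])  # align the two frames
    return D[n][m]
```

Because each query frame may map to several reference frames (and vice versa), a query hummed slower or faster than the reference can still reach a small distance; this flexibility is what makes DTW more accurate, and more expensive, than plain linear scaling.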