| Graduate Student: | 李欣成 Hsin-Cheng Lee |
|---|---|
| Thesis Title: | A Query-by-Singing/Humming Retrieval System Based on Dynamic Time Warping Combined with Linear Scaling |
| Advisor: | 張寶基 Pao-Chi Chang |
| Oral Defense Committee: | |
| Degree: | Master |
| College / Department: | College of Electrical Engineering and Computer Science, Department of Communication Engineering |
| Year of Publication: | 2021 |
| Graduation Academic Year: | 109 |
| Language: | Chinese |
| Number of Pages: | 74 |
| Keywords (Chinese): | 音樂資訊檢索, 哼唱檢索, 動態時間規整, 線性伸縮 |
| Keywords (English): | Music Retrieval, Query by Singing and Humming, Dynamic Time Warping, Linear Scaling |
With the proliferation of smartphones and mobile networks, searching for and downloading music through streaming platforms and social networking sites has become part of daily life. When a user remembers a song's melody but cannot recall its title or lyrics, a Content-Based Music Retrieval (CBMR) system, such as a query-by-singing/humming system, can solve the problem by searching directly on content features of the songs, such as melody and rhythm.
For larger retrieval databases, recognition accuracy must be traded off against computation time. Previous studies first applied fast but less accurate methods such as Linear Scaling (LS) or Earth Mover's Distance (EMD) to filter out dissimilar songs, then used the computationally expensive Dynamic Time Warping (DTW) to match the remaining songs with high precision, and finally fused the similarity scores to output a list of the ten most similar songs. In this work, LS plays two roles: shortening the melody features, which reduces computation, and fine-tuning feature lengths inside a refinement module, which further raises accuracy. High-precision DTW computes the matching distance, sliding and scaling a matching window over each song to locate the corresponding melody starting point and the best matching distance; a refinement module and a pre-filter are also designed. Experimental results show that the system's MRR outperforms previous work, and linearly shortening the data together with an optimized threshold saves about 40% of the computation time.
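The linear-scaling step described above can be sketched as follows. This is a minimal, generic illustration of LS as length normalization by linear interpolation, not the thesis's exact implementation; the function name and sample values are illustrative.

```python
def linear_scale(pitch, target_len):
    """Resample a pitch sequence to target_len points by linear interpolation.

    Linear scaling (LS) in this sense stretches or shortens a melody
    contour to a fixed length, so that sequences of different lengths
    become directly comparable (or cheaper to match).
    """
    n = len(pitch)
    if n == 0 or target_len <= 0:
        return []
    if n == 1:
        return [float(pitch[0])] * target_len
    out = []
    for i in range(target_len):
        # Map output index i to a fractional position in the input contour.
        pos = i * (n - 1) / (target_len - 1) if target_len > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(pitch[lo] * (1 - frac) + pitch[hi] * frac)
    return out

# Illustrative hummed-query contour (MIDI-like pitch values), shortened to half length.
query = [60.0, 62.0, 64.0, 65.0, 67.0, 65.0, 64.0, 62.0]
shortened = linear_scale(query, 4)
```

Resampling every reference and query to a common, shorter length is what cuts the cost of the subsequent distance computation, while small length adjustments of the same kind can serve as the fine-tuning step in a refinement module.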
With the growing popularity of mobile devices and Internet services, the amount of music data distributed over the Internet increases every day. Music information retrieval systems that can search for music accurately and quickly are therefore attracting more and more attention.
Sometimes users remember only the melody but forget the lyrics; a Content-Based Music Retrieval (CBMR) system such as QBSH can solve this problem by searching with features extracted from the music itself.
To handle massive retrieval data, we need to balance accuracy against computation time. Previous research combined multiple classifiers through score-level fusion to reduce computation time, but a poor classifier degrades the overall accuracy. Instead of score-level fusion, our proposed system combines DTW with Linear Scaling (LS): LS shortens the query and the reference songs at different stages to reduce computation time, while DTW computes the similarity between the query and the reference songs. We also design a refinement module and a pre-filter to enhance accuracy.
The experimental results show that our method provides a higher MRR than previous approaches, and we reduce computation time by about 40% by scaling down the data and finding the best threshold setting.
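The DTW similarity computation mentioned above can be sketched as follows. This is the classic textbook DTW recurrence with an absolute-difference cost, shown only to make the matching step concrete; the thesis's windowed, start-point-searching variant builds on this same recurrence but is not reproduced here.

```python
import math

def dtw_distance(q, r):
    """Classic dynamic time warping distance between two pitch sequences.

    D[i][j] holds the minimal accumulated cost of aligning q[:i] with
    r[:j]; each cell extends the best of the three predecessor cells
    (insertion, deletion, match), so tempo variations in the hummed
    query are absorbed by the warping path. O(len(q) * len(r)) time.
    """
    n, m = len(q), len(r)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - r[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a query frame
                                 D[i][j - 1],      # skip a reference frame
                                 D[i - 1][j - 1])  # align the two frames
    return D[n][m]
```

Because each query frame may map to several reference frames (and vice versa), a query hummed slower or faster than the reference can still reach a small distance; this flexibility is what makes DTW more accurate, and more expensive, than plain linear scaling.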