
Author: 黃梓翔 (Tzu-Hsiang Huang)
Thesis Title: 基於機器學習方法之巨量音樂檢索系統 (Large-Scale Music Retrieval System Using Machine Learning Approaches)
Advisor: 張寶基 (Pao-Chi Chang)
Committee Members:
Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Communication Engineering
Year of Publication: 2016
Academic Year of Graduation: 104 (2015-2016)
Language: Chinese
Number of Pages: 96
Chinese Keywords: 音樂資訊檢索 (music information retrieval), 翻唱歌曲辨識 (cover song identification), 二維傅立葉轉換 (2D Fourier transform), 機器學習 (machine learning)
English Keywords: Music information retrieval, Cover song identification, 2D-Fourier transform, Machine learning
    In the era of big data, the volume of multimedia information on the Internet is growing exponentially, and accurately locating specific multimedia content has become an important research topic.
    This system builds on the theoretical framework of cover song identification. Using content-based music features, it cancels out the differences in timbre, key, and fine structure that arise when a song is performed with different instruments, in different languages, or by different singers, and finds songs in the database whose melodic features are similar to those of the query.
    In content-based music retrieval, songs differ in duration, so previous work computed the similarity between two songs by running a high-complexity alignment of the query against every song in the database and then returning a list of the most similar songs. Although this approach maximizes identification accuracy, it consumes too many computing resources and is infeasible for large-scale databases. This study proposes a system for rapidly retrieving similar songs from a large-scale database. The system extracts spectral features from the music, compresses them with the 2D Fourier transform, and merges them into fixed-length vectors; machine learning methods such as K-means, principal component analysis, and linear discriminant analysis then reinforce the patterns in these vectors. All songs in the database are thereby projected into a single vector space, and the system directly compares the vector distance between the query song and each database song, returning the most similar music as the result list. The system not only greatly improves the efficiency of content-based music retrieval but also explores the potential of combining music retrieval with machine learning.
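    The feature pipeline described above, namely 2D-FFT magnitudes of spectral patches averaged into one fixed-length vector, can be sketched as follows. This is a minimal illustration rather than the thesis code: the 12-bin chroma input and the 75-frame patch length (as used in reference [4]) are assumptions, and the thesis may aggregate patches differently.

    ```python
    import numpy as np

    def twod_fft_magnitude(patch):
        """Magnitude of the 2D Fourier transform of a chroma patch.

        Discarding the phase makes the representation invariant to
        circular shifts along the pitch axis (key transpositions) and
        along the time axis (alignment offsets), which is why the 2D-FFT
        magnitude suits cover song matching.
        """
        return np.abs(np.fft.fft2(patch))

    def song_vector(chroma, patch_len=75):
        """Slide a fixed-length window over a (12, T) chroma matrix,
        take the 2D-FFT magnitude of each patch, and average the patches
        into one fixed-length vector regardless of song duration."""
        patches = [
            twod_fft_magnitude(chroma[:, t:t + patch_len])
            for t in range(0, chroma.shape[1] - patch_len + 1, patch_len)
        ]
        return np.mean(patches, axis=0).ravel()
    ```

    Because every song maps to a vector of the same dimensionality (12 x 75 here), songs of any length become directly comparable by vector distance.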


    In this work, we propose a music retrieval system that can search for similar music in a large-scale database.
    Large-scale similar-music recognition must compute song-to-song similarity in a way that accommodates differences in timing, key, and tempo. A simple vector distance measure is not powerful enough to perform this task on raw features, while expensive solutions such as dynamic time warping do not scale to millions of instances, making similar-music recognition impractical at commercial scale. In this work, we take the content-based music features of songs as input and transform them into semantic vectors with the 2D Fourier transform. We further explore different machine learning approaches to learn and reinforce the patterns in these semantic vectors. By projecting the songs into the semantic vector space, we can use an efficient nearest-neighbor algorithm to compare the similarity of songs and retrieve the most similar songs in the large-scale database.
    The proposed system is not only efficient enough to perform scalable content-based music retrieval, but also demonstrates the potential of machine learning approaches, making similar-music recognition faster and more accurate.
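    As a sketch of the retrieval stage described above, the fragment below projects song vectors with PCA and ranks database songs by cosine similarity using a brute-force nearest-neighbor search. The dimensions and data are placeholders, and the thesis additionally applies K-means and linear discriminant analysis, which are omitted here for brevity.

    ```python
    import numpy as np

    def pca_fit(X, k):
        """Fit PCA on song vectors X of shape (n_songs, dim):
        center the data, then keep the top-k principal directions
        from the singular value decomposition."""
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, Vt[:k]

    def project(X, mean, components):
        """Project (centered) song vectors onto the k principal axes."""
        return (X - mean) @ components.T

    def retrieve(query, database, top_n=5):
        """Rank database songs by cosine similarity to the query vector
        and return the indices of the top_n most similar songs."""
        q = query / np.linalg.norm(query)
        d = database / np.linalg.norm(database, axis=1, keepdims=True)
        sims = d @ q
        return np.argsort(-sims)[:top_n]
    ```

    After the one-time projection of the whole database, each query costs only one matrix-vector product and a sort, which is what makes the approach feasible at large scale.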

    Abstract (Chinese) i
    Abstract (English) ii
    Acknowledgements iii
    Table of Contents v
    List of Figures viii
    List of Tables xi
    Chapter 1 Introduction 1
      1-1 Research Background 1
      1-2 Motivation and Objectives 2
      1-3 Thesis Organization 3
    Chapter 2 Music Information Retrieval 4
      2-1 Music Retrieval Features 4
        2-1-1 Low-Level Features 5
        2-1-2 Mid-Level Features 7
        2-1-3 2D Fourier Transform 9
      2-2 Cover Song Identification 12
        2-2-1 Types and Musical Characteristics of Cover Songs 13
        2-2-2 Cover Song Identification Methods 16
      2-3 Music Feature Matching Methods 17
        2-3-1 Euclidean Distance 19
        2-3-2 Manhattan Distance 20
        2-3-3 Cosine Distance 21
    Chapter 3 Machine Learning 22
      3-1 K-Means Algorithm 24
      3-2 Principal Component Analysis 27
      3-3 Linear Discriminant Analysis 29
      3-4 Nearest Neighbor Classification 31
        3-4-1 Search Algorithms 33
    Chapter 4 Proposed Architecture 36
      4-1 Feature Extraction 37
      4-2 Feature Preprocessing 38
      4-3 Feature Learning and Transformation 45
      4-4 Retrieval System 49
    Chapter 5 Experiments and Analysis 50
      5-1 Experimental Environment 50
        5-1-1 Experimental Database 51
        5-1-2 Performance Evaluation Methods 53
        5-1-3 Experimental Design 56
      5-2 Binary Choice Experiments 57
        5-2-1 Parameter Selection 58
      5-3 Retrieval Experiments 60
        5-3-1 Training-Set Parameter Selection 61
        5-3-2 Test-Set Parameter Selection 65
        5-3-3 Large-Scale Database Parameter Selection 70
      5-4 Comparison and Analysis of Experimental Results 73
    Chapter 6 Conclusions and Future Work 77
    References 78

    [1] Serra, Joan, Emilia Gómez, and Perfecto Herrera, "Audio cover song identification and similarity: background, approaches, evaluation, and beyond", Advances in Music Information Retrieval, pp. 307-332, Springer Berlin Heidelberg, 2010.
    [2] Tzanetakis, George, Andrey Ermolinskyi, and Perry Cook, "Pitch histograms in audio and symbolic music information retrieval", Journal of New Music Research, pp. 143-152, 2003.
    [3] T. Bertin-Mahieux and D. P. W. Ellis, "Large-scale cover song recognition using hashed chroma landmarks", 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 117-120, New Paltz, NY, 2011.
    [4] Bertin-Mahieux, Thierry, and Daniel P. W. Ellis, "Large-Scale Cover Song Recognition Using the 2D Fourier Transform Magnitude", International Society for Music Information Retrieval Conference (ISMIR), 2012.
    [5] Khadkevich, Maksim, and Maurizio Omologo, "Large-Scale Cover Song Identification Using Chord Profiles", International Society for Music Information Retrieval Conference (ISMIR), 2013.
    [6] M. Marolt, "A Mid-Level Representation for Melody-Based Retrieval in Audio Collections", IEEE Transactions on Multimedia, vol. 10, no. 8, pp. 1617-1625, Dec. 2008.
    [7] Schmidt, Erik, and Youngmoo Kim, "Learning Rhythm And Melody Features With Deep Belief Networks", International Society for Music Information Retrieval Conference (ISMIR), 2013.
    [8] O. Nieto and J. P. Bello, "Music segment similarity using 2D-Fourier Magnitude Coefficients", 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 664-668, Florence, 2014.
    [9] 林銀議, 信號與系統 (Signals and Systems), 2nd ed., Wu-Nan Book Inc., Taipei, 2009.
    [10] Slaney, Malcolm, Kilian Weinberger, and William White, "Learning a metric for music similarity", International Symposium on Music Information Retrieval (ISMIR), 2008.
    [11] J. Schluter and C. Osendorfer, "Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine", 2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), pp. 118-123, Honolulu, HI, 2011.
    [12] J. Stephen Downie, MIREX 2006: Audio Cover Song, 2006, from http://www.music-ir.org/mirex/wiki/2006:Audio_Cover_Song
    [13] Ranjani, S. Sri, et al., "Application of SHAZAM-Based Audio Fingerprinting for Multilingual Indian Song Retrieval", Advances in Communication and Computing, pp. 81-92, Springer India, 2015.
    [14] Bertin-Mahieux, Thierry, et al., "The million song dataset", International Society for Music Information Retrieval Conference (ISMIR), vol. 2, no. 9, 2011.
    [15] Pedregosa, Fabian, et al., "Scikit-learn: Machine learning in Python", Journal of Machine Learning Research, vol. 12, pp. 2825-2830, Oct. 2011.
    [16] E. J. Humphrey, J. P. Bello, and Y. LeCun, "Moving beyond feature design: Deep architectures and automatic feature learning in music informatics", International Society for Music Information Retrieval Conference (ISMIR), Porto, Portugal, October 2012.
    [17] Honglak Lee, Peter Pham, Yan Largman, and Andrew Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks", Advances in Neural Information Processing Systems, 22, 2009.
    [18] Hamel, Philippe, and Douglas Eck, "Learning Features from Music Audio with Deep Belief Networks", International Society for Music Information Retrieval Conference (ISMIR), 2010.
    [19] Humphrey, Eric J., Juan P. Bello, and Yann LeCun, "Feature learning and deep architectures: new directions for music informatics", Journal of Intelligent Information Systems, pp. 461-481, 2013.
    [20] Y. Kim, H. Lee, and E. M. Provost, "Deep learning for robust feature generation in audiovisual emotion recognition", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3687-3691, Vancouver, BC, 2013.
    [21] Dieleman, Sander, and Benjamin Schrauwen, "Multiscale approaches to music audio feature learning", International Society for Music Information Retrieval Conference (ISMIR), Pontifícia Universidade Católica do Paraná, 2013.
    [22] S. Dieleman and B. Schrauwen, "End-to-end learning for music audio", 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6964-6968, Florence, 2014.
    [23] Coates, Adam, Honglak Lee, and Andrew Y. Ng, "An analysis of single-layer networks in unsupervised feature learning", Ann Arbor, 2010.
