

Author: 戴齊廷 (Chi-ting Day)
Title: 基於多重時間描述之內涵式音樂檢索 (Content-Based Music Retrieval Using Temporal Multi-Descriptors; English title on record: Temporal Multi-Descriptors)
Advisor: 張寶基 (Pao-chi Chang)
Oral defense committee:
Degree: Master
Department: Department of Communication Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2014
Graduation academic year: 102 (2013-2014)
Language: Chinese
Pages: 56
Chinese keywords: 音樂檢索 (music retrieval), 翻唱歌曲 (cover song), 類神經網路 (neural network), 深度學習 (deep learning)
English keywords: Music Retrieval, Cover Song, Neural Network, Deep Learning
    With the rapid development of multimedia compression technology, mobile devices, and mobile networks, sharing and downloading multimedia audio and video through streaming platforms and social networking sites has become part of daily life. For a song heard in passing that catches the listener's interest, Content-Based Music Retrieval (CBMR) can use the song's content itself, such as melody and timbre, as the basis for retrieval, avoiding situations where the user cannot describe suitable keywords or where tags are mislabeled.
    To cope with the long matching time incurred by large retrieval databases, this study proposes using a Sparse Auto-Encoder (SAE) to transform short-time audio chroma features into descriptors of higher information content. The learned representation identifies the comparatively important features, which improves retrieval performance, while the reduced number of features to be compared shortens matching time. Experimental results show that the proposed method not only saves more than 50% of the matching time but also substantially improves the MRR, indicating that features spanning longer time scales better describe the information needed for song retrieval.


    Nowadays, sharing and downloading multimedia resources over the internet has become part of our daily life. However, it is hard to find a particular piece of music in such a tremendous amount of online data when searching with limited information. Content-Based Music Retrieval (CBMR) can directly retrieve the desired music by using features extracted from the content itself as the search keys.
    To deal with massive retrieval databases, we use chroma clips as input to a Sparse Auto-Encoder (SAE), which transforms the features into descriptors before matching. This reduces the number of features to be compared and learns which parts of the input data are most important. Experimental results show that our method provides over 50% reduction in matching time and a higher MRR compared with the traditional approach.
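The descriptor transformation described in the abstract can be sketched as a one-hidden-layer sparse autoencoder that compresses a flattened chroma clip into a shorter descriptor used for matching. The sketch below is illustrative only: the layer sizes, sparsity target, penalty weight, training schedule, and random toy data are assumptions, not the thesis's actual configuration or dataset.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseAutoencoder:
    """Minimal sparse autoencoder: MSE reconstruction loss plus a
    KL-divergence penalty pushing mean hidden activations toward rho."""

    def __init__(self, n_in, n_hidden, rho=0.05, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # encoder weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_in, n_hidden))   # decoder weights
        self.b2 = np.zeros(n_in)
        self.rho, self.beta = rho, beta

    def encode(self, X):
        # hidden activations serve as the compact descriptor
        return sigmoid(X @ self.W1.T + self.b1)

    def fit(self, X, lr=0.1, epochs=200):
        n = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)                          # (n, n_hidden)
            Y = sigmoid(H @ self.W2.T + self.b2)        # reconstruction
            dY = (Y - X) * Y * (1 - Y)                  # output-layer delta
            rho_hat = H.mean(axis=0)                    # mean activation/unit
            # gradient of the KL sparsity penalty w.r.t. hidden activations
            sparse_grad = self.beta * (-self.rho / rho_hat
                                       + (1 - self.rho) / (1 - rho_hat))
            dH = (dY @ self.W2 + sparse_grad) * H * (1 - H)
            self.W2 -= lr * dY.T @ H / n
            self.b2 -= lr * dY.mean(axis=0)
            self.W1 -= lr * dH.T @ X / n
            self.b1 -= lr * dH.mean(axis=0)
        return self

# toy data: 20 "chroma clips", each 12 pitch classes x 8 frames, flattened
rng = np.random.default_rng(1)
X = rng.random((20, 96))
sae = SparseAutoencoder(n_in=96, n_hidden=16).fit(X)
descriptors = sae.encode(X)   # 96-dim clips -> 16-dim descriptors
print(descriptors.shape)      # (20, 16)
```

Matching then compares the 16-dimensional descriptors instead of the 96-dimensional raw clips, which is the source of the reduced comparison cost the abstract reports; the actual retrieval pipeline in the thesis also involves transposition-invariant alignment not shown here.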

    Abstract (Chinese) / Abstract (English) / Acknowledgements / Table of Contents / List of Figures / List of Tables
    Chapter 1  Introduction
      1.1  Research Background
      1.2  Motivation and Objectives
      1.3  Thesis Organization
    Chapter 2  Overview of Content-Based Music Retrieval
      2.1  Music Retrieval and Its Characteristic Elements
        2.1.1  Musical Characteristic Elements
        2.1.2  Cover Song Identification and Related Work
      2.2  Retrieval Features
      2.3  Matching Methods
        2.3.1  Optimal Transposition Index
        2.3.2  Dynamic Time Warping
    Chapter 3  Neural Networks
      3.1  Neural Networks
      3.2  Deep Neural Networks
        3.2.1  Convolutional Neural Networks
        3.2.2  Sparse Auto-Encoders
    Chapter 4  Temporal Multi-Descriptor Analysis
      4.1  System Architecture
      4.2  Experimental Results and Parameter Analysis
        4.2.1  Feature Transformation
        4.2.2  Retrieval System Performance
        4.2.3  Multi-Temporal Retrieval System
    Chapter 5  Conclusions and Future Work
    References

