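| Graduate student: | 戴齊廷 Chi-ting Day |
|---|---|
| Thesis title: | Content-Based Music Retrieval Based on Temporal Multi-Descriptors (基於多重時間描述之內涵式音樂檢索) |
| Advisor: | 張寶基 Pao-chi Chang |
| Oral defense committee: | |
| Degree: | Master (碩士) |
| Department: | College of Electrical Engineering and Computer Science - Department of Communication Engineering (資訊電機學院 - 通訊工程學系) |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | Chinese |
| Number of pages: | 56 |
| Chinese keywords: | 音樂檢索, 翻唱歌曲, 類神經網路, 深度學習 |
| English keywords: | Music Retrieval, Cover Song, Neural Network, Deep Learning |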
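With the rapid development of multimedia compression, mobile devices, and mobile networks, sharing and downloading multimedia content through streaming platforms and social networking sites has become part of daily life. For a song that a listener happens to hear and finds interesting, Content Based Music Retrieval (CBMR) can use the content of the song itself, such as melody and timbre features, as the basis for retrieval. This avoids the cases where the user cannot describe the song with keywords or where the song is labeled incorrectly.
To address the long matching time incurred by large retrieval databases, this study proposes using a Sparse Auto Encoder (SAE) to convert the Chroma features of short audio segments into Descriptors that carry more information. Learning identifies the relatively important features, which improves retrieval performance, and reducing the number of features to be matched shortens the matching time. Experimental results show that the proposed method not only saves more than 50% of the matching time but also substantially raises the mean reciprocal rank (MRR), indicating that features spanning a longer time better describe the information needed for song retrieval.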
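The descriptor idea above can be illustrated with a minimal sketch. It assumes a single sigmoid hidden layer and the KL-divergence sparsity penalty that is commonly used for sparse autoencoders. The layer sizes, the random untrained weights, and the function names `encode` and `kl_sparsity_penalty` are hypothetical and are not taken from the thesis. Training by backpropagation is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode(clip, weights, biases):
    """Map a flattened chroma clip to a descriptor (hidden-layer activations)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, clip)) + b)
        for row, b in zip(weights, biases)
    ]


def kl_sparsity_penalty(descriptors, rho=0.05):
    """Sum over hidden units of KL(rho || mean activation of that unit)."""
    n_hidden = len(descriptors[0])
    penalty = 0.0
    for j in range(n_hidden):
        rho_hat = sum(d[j] for d in descriptors) / len(descriptors)
        penalty += rho * math.log(rho / rho_hat)
        penalty += (1 - rho) * math.log((1 - rho) / (1 - rho_hat))
    return penalty


# Hypothetical sizes: 12 pitch classes x 8 frames per clip, 6 hidden units.
random.seed(0)
N_PITCH, N_FRAMES, N_HIDDEN = 12, 8, 6
n_in = N_PITCH * N_FRAMES

# Untrained random weights, used only to show the data flow.
weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(N_HIDDEN)]
biases = [0.0] * N_HIDDEN

# Four synthetic "chroma clips" standing in for real audio features.
clips = [[random.random() for _ in range(n_in)] for _ in range(4)]

descriptors = [encode(clip, weights, biases) for clip in clips]
penalty = kl_sparsity_penalty(descriptors)
```

In this sketch each clip of 96 chroma values becomes a 6-value descriptor, so a matcher would compare far fewer numbers per segment. During training the sparsity penalty would be added to the reconstruction error so that only a few hidden units respond strongly to any one clip.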
Nowadays, sharing and downloading multimedia resources over the internet has become part of daily life. However, finding a particular piece of music among such a tremendous amount of data is hard when only limited information about it is available. Content Based Music Retrieval (CBMR) can retrieve the desired music directly by using features extracted from the audio content as the search keys.
To deal with massive retrieval databases, we feed Chroma clips into a Sparse Auto Encoder (SAE), which transforms the features into Descriptors before matching. This reduces the number of features to match and lets the network learn which parts of the input data are more important. The experimental results show that our method reduces matching time by over 50% and achieves a higher MRR than the traditional approach.
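For readers unfamiliar with the MRR figure quoted above, the following minimal sketch shows how mean reciprocal rank is usually computed for a retrieval task. The song identifiers and rankings are made-up toy data, and the function name `mean_reciprocal_rank` is illustrative rather than taken from the thesis.

```python
def mean_reciprocal_rank(ranked_lists, relevant_ids):
    """Average of 1/rank of the first correct hit per query (0 if no hit)."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_ids):
        reciprocal = 0.0
        for position, candidate in enumerate(ranking, start=1):
            if candidate in relevant:
                reciprocal = 1.0 / position
                break
        total += reciprocal
    return total / len(ranked_lists)


# Toy example: three queries, each with a ranked list of returned songs.
rankings = [["s3", "s1", "s7"], ["s2", "s9", "s4"], ["s5", "s6", "s8"]]
# Set of correct answers (e.g. cover versions) for each query.
relevant = [{"s1"}, {"s2"}, {"s0"}]

mrr = mean_reciprocal_rank(rankings, relevant)
print(mrr)  # (1/2 + 1/1 + 0) / 3 = 0.5
```

A higher MRR means the first correct song tends to appear nearer the top of the returned list.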