| Graduate student: | 謝佳斌 Chia-Bin Hsieh |
|---|---|
| Thesis title: | AAC壓縮域翻唱歌曲辨識系統 Cover Song Identification in AAC Compression Domain |
| Advisor: | 張寶基 Pao-Chi Chang |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 - 通訊工程學系 Department of Communication Engineering |
| Academic year of graduation: | 100 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 48 |
| Keywords (Chinese): | 翻唱歌曲 (cover song), AAC, 壓縮域 (compressed domain), 音樂檢索 (music retrieval) |
| Keywords (English): | cover song, AAC, compression, music information retrieval |
With the rapid development of multimedia compression technology and the Internet, users can easily download or share all kinds of video and music over the network. However, this content may be protected by copyright, and users may break the law without realizing it. In this thesis, we propose a cover song identification system for the AAC audio compression format whose goal is to quickly retrieve the original version of a song from a large database of coded audio. From a commercial perspective, detecting cover songs is important for music copyright management and licensing. For users, being able to find all versions of a particular song is both interesting and useful.
In the proposed compressed-domain system, the modified discrete cosine transform (MDCT) coefficients are obtained directly from the audio bitstream by partial decoding and mapped onto the twelve-tone equal temperament of Western music theory, giving a 12-dimensional chroma feature without a full decoding process. In addition, we use segmentation to reduce the time dimension of the features and thereby improve matching efficiency. As a result, the overall system saves considerable computational complexity while reaching approximately 60% Top-1 identification accuracy. In practical applications, the system offers a fast and accurate search method for retrieval over large collections of coded multimedia files.
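The feature extraction and segmentation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes a 44.1 kHz sampling rate, AAC long-window frames of 1024 MDCT coefficients, a hypothetical analysis band of 55 Hz to 5 kHz, and fixed-length segment averaging as a stand-in for the segmentation step. The function names `bin_to_pitch_class`, `chroma_from_mdct`, and `average_segments` are invented for this example.

```python
import math

SAMPLE_RATE = 44100           # assumed sampling rate in Hz
NUM_BINS = 1024               # MDCT coefficients in one AAC long-window frame
F_MIN, F_MAX = 55.0, 5000.0   # hypothetical analysis band in Hz


def bin_to_pitch_class(k, sample_rate=SAMPLE_RATE, num_bins=NUM_BINS):
    """Return the pitch class (0 = C, ..., 11 = B) of MDCT bin k, or None if out of band."""
    freq = (k + 0.5) * sample_rate / (2.0 * num_bins)  # bin centre frequency
    if not (F_MIN <= freq <= F_MAX):
        return None
    midi = 69 + 12 * math.log2(freq / 440.0)  # A4 = 440 Hz = MIDI note 69
    return int(round(midi)) % 12


def chroma_from_mdct(coeffs, sample_rate=SAMPLE_RATE):
    """Fold the energy of one frame of MDCT coefficients into a 12-bin chroma vector."""
    chroma = [0.0] * 12
    for k, c in enumerate(coeffs):
        pc = bin_to_pitch_class(k, sample_rate, len(coeffs))
        if pc is not None:
            chroma[pc] += c * c
    total = sum(chroma)
    # Normalise so that the vector sums to 1 (left as all zeros for a silent frame).
    return [v / total for v in chroma] if total > 0 else chroma


def average_segments(frames, seg_len):
    """Average consecutive chroma frames in blocks of seg_len to shorten the time axis."""
    out = []
    for start in range(0, len(frames), seg_len):
        block = frames[start:start + seg_len]
        out.append([sum(f[i] for f in block) / len(block) for i in range(12)])
    return out


# Usage: a frame whose only energy is in bin 20 (about 441 Hz) maps to pitch class A.
frame = [0.0] * NUM_BINS
frame[20] = 2.0
chroma = chroma_from_mdct(frame)
segments = average_segments([chroma] * 10, seg_len=4)
```

With these settings each MDCT bin is about 21.5 Hz wide, which is coarser than a semitone at low frequencies, so a practical system would likely need a more careful treatment of the low end than this simple nearest-note mapping.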