
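Graduate student: Yung-ting Chuang (莊詠婷)
Thesis title: Classical Music Cover Song Retrieval System utilizing AAC Domain Features (利用AAC壓縮域特徵之古典樂翻奏曲檢索系統)
Advisor: Pao-chi Chang (張寶基)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Communication Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 68
Chinese keywords: classical music, AAC, compressed domain, music retrieval
English keywords: classical music cover song, AAC, compression domain, content-based music retrieval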
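  • Internet and multimedia compression technologies are now mature, and demand for network services keeps growing. Downloading and sharing audio and video over the network has become part of daily life, and very large music databases are common, so quickly retrieving the data a user wants from such a database is an important problem. Most search engines take text as input, but mislabeled or ambiguous tags lead to wrong retrieval results, and this happens more often for classical music than for pop music.
    This thesis targets classical music databases and uses features from the AAC compressed domain. The bitstream is only partially decoded to obtain the modified discrete cosine transform (MDCT) coefficients, which saves about 70% of the decoding complexity. The coefficient energies are preprocessed to improve accuracy, the coefficients are remapped onto the pitch names of twelve-tone equal temperament, and a similarity matrix between pieces is computed with inner products. The best accumulated-similarity path is then found, and the weighted mean of its similarity scores gives the final retrieval result. Experiments show that the proposed method achieves a mean reciprocal rank (MRR) of 0.96 and an accuracy of 97%, and saves more than 90% of the matching time compared with conventional retrieval in the original (uncompressed) domain.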
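The feature step described above (energy truncation, then folding MDCT coefficients onto the twelve equal-tempered pitch names) can be illustrated roughly as follows. This is a minimal sketch and not the thesis's implementation: the function names, the fixed frequency range, the `energy_keep` threshold, and the assumption of 1024-coefficient long-window AAC frames at 44.1 kHz are all hypothetical choices made for illustration. The thesis adjusts the frequency range dynamically, and this record does not give those details.

```python
import numpy as np

def mdct_bin_pitch_classes(n_bins=1024, sample_rate=44100.0,
                           f_min=55.0, f_max=3520.0, ref_a4=440.0):
    """Map each MDCT bin to a pitch class (0 = C, ..., 9 = A, 11 = B)."""
    k = np.arange(n_bins)
    # Assumed bin centre frequency for an MDCT with n_bins coefficients.
    freqs = (k + 0.5) * sample_rate / (2.0 * n_bins)
    midi = 69.0 + 12.0 * np.log2(freqs / ref_a4)
    pitch_class = np.round(midi).astype(int) % 12
    # Fixed range here; the thesis uses a dynamic range instead.
    in_range = (freqs >= f_min) & (freqs <= f_max)
    return pitch_class, in_range

def chroma_from_mdct(frames, energy_keep=0.1, **kwargs):
    """frames: (n_frames, n_bins) MDCT coefficients -> (n_frames, 12) chroma."""
    frames = np.asarray(frames, dtype=float)
    pitch_class, in_range = mdct_bin_pitch_classes(n_bins=frames.shape[1], **kwargs)
    energy = frames ** 2
    # Hypothetical truncation rule: drop coefficients whose energy is
    # below a fraction of the frame's peak energy.
    peak = energy.max(axis=1, keepdims=True)
    energy = np.where(energy >= energy_keep * peak, energy, 0.0)
    energy[:, ~in_range] = 0.0
    chroma = np.zeros((frames.shape[0], 12))
    for pc in range(12):
        chroma[:, pc] = energy[:, pitch_class == pc].sum(axis=1)
    # Unit-normalise each frame; silent frames stay all-zero.
    norms = np.linalg.norm(chroma, axis=1, keepdims=True)
    return np.divide(chroma, norms, out=np.zeros_like(chroma), where=norms > 0)
```

Because the chroma vectors are unit-normalised, a dot product between two frames behaves like a cosine similarity, which fits the inner-product matching the abstract mentions.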


    With the rapid development of the Internet and multimedia compression techniques, people can easily download and share multimedia data over networks, so efficient retrieval from huge multimedia databases has become an important issue. Most search engines rely on textual labels. However, labels created by people may be ambiguous or even wrong, and this happens more often when retrieving classical music than pop music.
    Our proposed system focuses on classical music cover song retrieval in the AAC compression domain. The modified discrete cosine transform (MDCT) coefficients are used directly to form a 12-dimensional chroma feature without full decoding, which saves about 70% of the decoding complexity. We truncate low-magnitude MDCT coefficients, adjust the frequency boundary dynamically, and use dot products to build a chroma similarity matrix. We then find the optimal accumulated-similarity path between two songs, compute the weighted arithmetic mean of the similarity along it, and use that value to rank the results.
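The matching stage can be sketched as a dot-product similarity matrix followed by a dynamic-programming search for the path with the highest accumulated similarity. The code below is an illustrative simplification under stated assumptions: it uses the three basic dynamic time warping steps and a plain (unweighted) mean along the path, whereas the thesis uses a weighted mean whose weights are not given in this record. The function names are hypothetical.

```python
import numpy as np

def similarity_matrix(query, reference):
    """Dot product between every query frame and every reference frame."""
    return np.asarray(query, dtype=float) @ np.asarray(reference, dtype=float).T

def accumulated_path_score(sim):
    """Mean similarity along the best monotone path from (0, 0) to (n-1, m-1).

    The path maximises accumulated similarity; among equally good paths
    the shortest one is preferred.
    """
    n, m = sim.shape
    acc = np.full((n, m), -np.inf)         # best accumulated similarity
    length = np.zeros((n, m), dtype=int)   # number of cells on that path
    acc[0, 0] = sim[0, 0]
    length[0, 0] = 1
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = [
                (acc[pi, pj], -length[pi, pj])
                for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
                if pi >= 0 and pj >= 0
            ]
            best_acc, neg_len = max(candidates)
            acc[i, j] = best_acc + sim[i, j]
            length[i, j] = -neg_len + 1
    return acc[-1, -1] / length[-1, -1]
```

To rank a database, one would compute this score between the query and every reference piece and sort in descending order. The double loop costs O(nm) per pair, so a practical system would likely segment or downsample the chroma sequences first.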
    The experimental results show that the proposed method reaches a precision of 97% and saves over 90% of the matching time compared with the traditional approach in the waveform domain.
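The abstract also reports retrieval performance as a mean reciprocal rank (MRR) of 0.96. For readers unfamiliar with the metric, here is a minimal sketch of how MRR is commonly computed. The data layout (dictionaries keyed by query ID) is a hypothetical choice, and this record does not state how the thesis defines its precision figure, so only MRR is shown.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """ranked_lists: {query_id: [item, ...]} ordered best match first.
    relevant: {query_id: set of items that count as correct matches}.
    """
    total = 0.0
    for query_id, ranking in ranked_lists.items():
        targets = relevant[query_id]
        reciprocal = 0.0  # stays 0 if no correct item is retrieved
        for rank, item in enumerate(ranking, start=1):
            if item in targets:
                reciprocal = 1.0 / rank
                break
        total += reciprocal
    return total / len(ranked_lists)
```

For example, if one query finds its cover at rank 1 and another at rank 3, the MRR is (1 + 1/3) / 2, or about 0.67. A value of 0.96 therefore means the correct piece is ranked first for nearly every query.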

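    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Motivation and Objectives
      1.3 Thesis Organization
    Chapter 2 Overview of Music Retrieval and Audio Compression
      2.1 Introduction to Music Retrieval
        2.1.1 Overview and Progress of Content-Based Music Retrieval Features
        2.1.2 Definition and Variability of Cover Songs
        2.1.3 Development and Characteristics of Classical Music
      2.2 Related Work on Content-Based Music Retrieval
        2.2.1 Related Work on Music Retrieval in the Original Domain
        2.2.2 Related Work on Music Retrieval in the Compressed Domain
      2.3 Overview of Audio Compression Techniques
    Chapter 3 Proposed Compressed-Domain Classical Music Cover Song Retrieval Method
      3.1 Partial Decoding
      3.2 Preprocessing
        3.2.1 MDCT Coefficient Energy Truncation
        3.2.2 Dynamic Frequency Range Truncation
      3.3 Audio Feature Extraction
        3.3.1 Feature Extraction
        3.3.2 Segmentation and Normalization
      3.4 Similarity Matching
        3.4.1 Inner-Product Similarity Computation
        3.4.2 Dynamic Time Warping Accumulation
      3.5 Postprocessing
    Chapter 4 Experimental Results and Discussion
      4.1 Experimental Environment and Computational Complexity Evaluation
      4.2 System Performance Evaluation Method
      4.3 Performance Analysis of the Proposed Retrieval System
        4.3.1 Analysis of System Parameters
        4.3.2 Overall System Performance Analysis
    Chapter 5 Conclusions and Future Work
    References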

