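| Graduate student: | 張戴明 Tai-Ming Chang |
|---|---|
| Thesis title: | 音檔壓縮資訊之和弦特徵轉換效能分析 Chord Transformation and Performance Analysis for Compressed Audio |
| Advisor: | 張寶基 Pao-Chi Chang |
| Oral defense committee: | |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Communication Engineering |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 (ROC calendar) |
| Language: | English |
| Number of pages: | 50 |
| Keywords (Chinese, translated): | Advanced Audio Coding, compressed domain, discrete cosine transform, chroma feature, music retrieval system |
| Keywords (English): | AAC, transform domain, chroma feature, MDCT, music information retrieval |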
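With the explosive growth in music album production, managing massive amounts of music data and retrieving music information quickly have become important issues. For a massive music database, extracting the important frequency parameters directly from compressed music files to represent music features greatly improves retrieval speed. In this thesis, we analyze Advanced Audio Coding (AAC) files for the differences in frequency representation between the discrete Fourier transform (computed with the FFT) and the modified discrete cosine transform (MDCT), consider the frequency resolution after transformation to select an appropriate frequency range, and propose a chroma feature transformation method in the AAC compressed domain. When chroma transformation is performed directly on AAC compressed information, the long/short window switching mechanism gives different windows different frequency resolutions. Ignoring this window switching severely degrades the accuracy of the feature mapping, so performing chroma feature transformation on AAC files that use long/short window switching is a challenge.

For short windows, which have poorer frequency resolution, we propose a peak-competition method that merges eight consecutive short windows to strengthen the pitch information. For frame segmentation, we propose a simple dynamic-segmentation method to replace computationally complex beat tracking. Furthermore, to handle AAC files with different sampling rates, we propose a dynamic frequency selection mechanism that automatically selects the frequency range for each sampling rate and window type. Experimental results show that on the Covers80 dataset, the proposed method improves Top-1 music search accuracy by 10% over previous compressed-domain studies, and its retrieval performance is close to that of current search techniques in the uncompressed (waveform) domain. In addition, the proposed dynamic frequency selection method gives stable and reasonably accurate retrieval for AAC files at different sampling rates.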
With the explosive growth in the number of music albums produced, retrieving music information has become a critical aspect of managing music data. Extracting frequency parameters directly from compressed files to represent music greatly improves processing speed on a large database. In this study, we focus on Advanced Audio Coding (AAC) files. We analyze the disparity in frequency representation between the discrete Fourier transform and the modified discrete cosine transform (MDCT), use the frequency resolution to select an appropriate frequency range, and develop a direct chroma feature-transformation method in the AAC transform domain. An added challenge in using AAC files directly is long/short window switching: ignoring it can lead to inaccurate frequency mapping and poor retrieval. For short windows in particular, we propose a peak-competition method that combines eight subframes and enhances the pitch information while excluding ambiguous frequency components. For chroma feature segmentation, we propose a simple dynamic-segmentation method to replace the complex computation of beat tracking. In addition, we propose a dynamic frequency selection method to handle AAC files with various sampling rates. Our experimental results on the Covers80 dataset show that the proposed method increases Top-1 search accuracy by approximately 10% over previously described transform-domain methods and performs nearly as well as state-of-the-art waveform-domain approaches. Furthermore, the proposed dynamic frequency selection method shows stable performance across sampling rates, which supports a comprehensive AAC retrieval system.
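The effect of window switching on frequency resolution can be made concrete with a small sketch. It is not the thesis's algorithm. It assumes the standard AAC filterbank sizes (1024 MDCT coefficients for a long window, 128 for a short one), the usual bin-centre formula, and a reference pitch of A4 = 440 Hz. The function names, the `f_max` cutoff, and the "one semitone per bin" lower bound in `min_resolvable_hz` are illustrative assumptions, standing in for whatever range-selection rule the thesis actually uses.

```python
import math

SEMITONE_RATIO = 2 ** (1 / 12)  # frequency ratio between adjacent semitones


def bin_width_hz(sample_rate, num_coeffs):
    """Spacing between adjacent MDCT bins: the Nyquist band split into num_coeffs bins."""
    return sample_rate / (2 * num_coeffs)


def bin_center_hz(k, sample_rate, num_coeffs):
    """Centre frequency of MDCT bin k (0-based)."""
    return (k + 0.5) * bin_width_hz(sample_rate, num_coeffs)


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch class 0..11 (0 = C, 9 = A) via its nearest MIDI note."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12


def min_resolvable_hz(sample_rate, num_coeffs):
    """ASSUMED criterion: lowest frequency where neighbouring semitones
    are at least one bin width apart."""
    return bin_width_hz(sample_rate, num_coeffs) / (SEMITONE_RATIO - 1)


def chroma_from_mdct(coeffs, sample_rate, f_max=5000.0):
    """Accumulate MDCT energy into 12 chroma bins.

    The usable range [f_min, f_max] is derived from the sampling rate and
    the window length (len(coeffs)); f_max is a hypothetical cutoff.
    """
    n = len(coeffs)
    f_min = min_resolvable_hz(sample_rate, n)
    chroma = [0.0] * 12
    for k, c in enumerate(coeffs):
        f = bin_center_hz(k, sample_rate, n)
        if f_min <= f <= f_max:
            chroma[pitch_class(f)] += c * c
    return chroma


# Example: a single spectral peak near 440 Hz at a 44.1 kHz sampling rate.
long_window = [0.0] * 1024
long_window[20] = 1.0   # bin 20 is centred near 441 Hz in a long window
short_window = [0.0] * 128
short_window[2] = 1.0   # bin 2 is centred near 431 Hz in a short window

long_chroma = chroma_from_mdct(long_window, 44100)
short_chroma = chroma_from_mdct(short_window, 44100)
```

Under these assumptions a long window at 44.1 kHz has bins about 21.5 Hz apart, while a short window has bins about 172 Hz apart. The peak near 440 Hz therefore lands cleanly in pitch class A for the long window but falls below the usable range for the short window, which is why short windows need separate handling.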