| Author | 郭家維 Chia-Wei Kuo |
|---|---|
| Thesis title | 基於總諧訊比的分離音源技術 (Music Source Separation Based on Total Harmonic to Signal Ratio) |
| Advisor | 張寶基 |
| Committee | |
| Degree | Master (碩士) |
| Department | 資訊電機學院 - 通訊工程學系在職專班 (Executive Master of Communication Engineering) |
| Year of publication | 2015 |
| Graduation academic year | 103 |
| Language | Chinese |
| Pages | 66 |
| Keywords | 音源分離 (source separation), 諧波結構 (harmonic structure) |
Separating mixed audio sources is a challenging problem with important practical applications, including instrument simulation, music recording, and voiceprint recognition. Prior work used the Average Harmonic Structure (AHS) method to handle mixed sources, but its repeated extraction procedure is complex. To reduce computational complexity, we propose a simple and practical method. The signal is first transformed with the Fourier transform. Since each instrument has a distinctive harmonic structure in the spectrum, with fixed energy ratios between the harmonics and the fundamental, we introduce a new parameter, the Total Harmonic to Signal Ratio (THSR), which reduces the instrument-specific harmonic structure to a scalar and eliminates the repeated extraction, estimation, and computation steps. Because harmonic-structure errors and limited frequency resolution bias the THSR, we model the THSR statistically and use a Bayes classifier to classify the separated sources. This approach not only reduces complexity but also minimizes the probability of misclassification.

We ran simulations in MATLAB and obtained satisfactory listening results. Cool Edit was used to inspect the processed audio, and EAQUAL was used to evaluate the quality of the separated audio objectively. This thesis provides a new parameter that preserves the quality of the separated sources, together with an architecture that simplifies the system workflow.
The separation of mixed sources is a challenging task. Music separation can be applied to instrument simulation, audio post-production, music retrieval, and so on. Although the average harmonic structure (AHS) method has been applied to separate mixed music, it is complicated and time-consuming. In this thesis, the mixed signal is first analyzed with the fast Fourier transform. Since each instrument has its own specific harmonic structure, that is, the ratio of the power of the harmonics to that of the fundamental is constant, a new factor, the total harmonic to signal ratio (THSR), is proposed to separate the mixed monophonic source. Because the THSR is a scalar, the source-separation procedure is greatly simplified. However, due to harmonic-structure instability and the limited resolution of the frequency analysis, the estimated THSR may be biased. A Bayes classifier is therefore applied to reduce the probability of misclassification.
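As a rough illustration of the idea, the scalar below treats THSR as the power summed over the harmonics (excluding the fundamental, analogous to THD) divided by the total signal power, computed from an FFT power spectrum. This is a minimal sketch under that assumed definition; the thesis's exact formula, window length, and harmonic count are not reproduced here, and the function name `thsr` is our own.

```python
import numpy as np

def thsr(signal, fs, f0, n_harmonics=10):
    """Assumed THSR: power in harmonics 2..n divided by total signal power."""
    windowed = signal * np.hanning(len(signal))    # Hann window to limit leakage
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2  # power spectrum
    bin_width = fs / len(signal)                   # frequency resolution (Hz/bin)
    harmonic_power = 0.0
    for k in range(2, n_harmonics + 1):
        target = k * f0
        if target >= fs / 2:                       # stop at the Nyquist frequency
            break
        idx = int(round(target / bin_width))
        # sum a 3-bin neighbourhood to tolerate resolution error
        harmonic_power += spectrum[max(idx - 1, 0):idx + 2].sum()
    return harmonic_power / spectrum.sum()
```

A harmonically rich tone yields a larger THSR than a pure sine at the same fundamental, which is what lets the scalar stand in for the full harmonic-structure template.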
MATLAB was used for simulation, and the separated music can be heard clearly. Furthermore, using the Cool Edit tool and EAQUAL, we obtain classification results comparable to those of the AHS method. In other words, this thesis provides a simple but effective method for separating mixed sources.
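The classification step described in the abstract can be sketched as a maximum-a-posteriori decision over scalar THSR observations. This assumes Gaussian class-conditional densities for each instrument's THSR, which the thesis does not spell out; the parameter values and the name `bayes_classify` are illustrative only, with the means, standard deviations, and priors normally estimated from training data.

```python
import numpy as np

def bayes_classify(x, means, stds, priors):
    """Assign a scalar observation x to the class with the largest posterior,
    assuming Gaussian class-conditional densities N(means[i], stds[i]**2)."""
    means, stds, priors = map(np.asarray, (means, stds, priors))
    likelihood = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    posterior = likelihood * priors   # unnormalized; argmax is unaffected
    return int(np.argmax(posterior))
```

Choosing the class with the largest posterior is the Bayes decision rule, which minimizes the probability of misclassification when the assumed densities and priors are correct.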