
Graduate Student: Chia-Wei Kuo (郭家維)
Thesis Title: Music Source Separation Based on Total Harmonic to Signal Ratio (基於總諧訊比的分離音源技術)
Advisor: 張寶基
Oral Defense Committee:
Degree: Master
Department: Department of Communication Engineering, Executive Master Program (Executive Master of Communication Engineering), College of Information and Electrical Engineering
Year of Publication: 2015
Graduation Academic Year: 103
Language: Chinese
Number of Pages: 66
Keywords: source separation (音源分離), harmonic structure (諧波結構)
    Separating mixed audio sources is a challenging problem with important practical applications, including instrument modeling, music recording, and voiceprint recognition. Earlier work used the Average Harmonic Structure (AHS) to process mixed sources, but its repeated-extraction procedure is complex. To reduce the computational complexity, we propose a simple and practical method. The signal is first transformed with the Fourier transform. Every instrument has a distinctive harmonic structure in the spectrum, and the power ratios of its harmonics to the fundamental are constant; we therefore propose a new parameter, the Total Harmonic to Signal Ratio (THSR), which condenses the instrument-specific harmonic structure into a single scalar and eliminates the repeated extraction, estimation, and computation steps. However, harmonic-structure errors and limited frequency resolution bias the THSR, so we model its distribution and use a Bayes classifier to classify the separated sources. This approach not only reduces complexity but also minimizes the probability of misclassification.
    We ran simulations in MATLAB and obtained satisfactory auditory results. Cool Edit was used to inspect the processed audio, and EAQUAL was used to evaluate the quality of the separated audio objectively. This thesis contributes a new parameter that preserves separation quality, together with an architecture that simplifies the system workflow.
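The THSR computation described above can be sketched as follows. This is a minimal illustration in Python rather than the thesis's MATLAB implementation, and it makes simplifying assumptions not stated in the abstract: the fundamental is taken to be the strongest FFT bin, harmonics are read at integer multiples of that bin, and THSR is defined as summed harmonic power over total spectral power.

```python
import numpy as np

def thsr(signal, max_harmonics=10):
    """Total Harmonic to Signal Ratio (sketch).

    Assumptions (not from the thesis text): the fundamental is the
    strongest FFT bin, harmonics sit at integer multiples of that bin,
    and the ratio is summed harmonic power over total spectral power.
    """
    # Hanning-windowed FFT, as in the analysis stage of the thesis
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2

    f0_bin = int(np.argmax(spectrum))      # fundamental = strongest peak
    harmonic_power = 0.0
    for k in range(2, max_harmonics + 1):  # 2nd..Nth harmonic bins
        b = k * f0_bin
        if b >= len(spectrum):
            break
        harmonic_power += spectrum[b]
    return harmonic_power / spectrum.sum()
```

A harmonically rich tone (e.g. a square wave) yields a markedly higher THSR than a pure sinusoid, which is what makes a single scalar usable as an instrument signature.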


    The separation of mixed sources is a challenging task. Music separation has applications in instrument simulation, audio post-production, music retrieval, and so on. Although the average harmonic structure (AHS) method has been used to separate mixed music, it is complicated and time-consuming. In this thesis, the mixed signal is first analyzed with the fast Fourier transform. Since each instrument has its own specific harmonic structure, that is, the power ratio between the harmonics and the fundamental is constant, a new factor, the total harmonic to signal ratio (THSR), is proposed to separate mixed monophonic sources. Because THSR is a scalar, the separation procedure is greatly simplified. However, owing to harmonic-structure instability and the limited resolution of the frequency analysis, the THSR may be biased; a Bayes classifier is therefore applied to reduce the probability of misclassification.
    MATLAB was used for the simulations, and the separated music can be heard clearly. Furthermore, with the Cool Edit tool and EAQUAL, we obtain classification results comparable to those of the AHS method. In other words, this thesis provides a simple but effective method for separating mixed sources.
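The Bayes classification step on the scalar THSR can be sketched as below. The Gaussian class-conditional models and the numeric means and variances are hypothetical placeholders for illustration, not values or instrument pairs taken from the thesis.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Gaussian likelihood of a scalar observation."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def classify(thsr_value, models, priors=None):
    """Assign a THSR observation to the class with the largest posterior.

    `models` maps class name -> (mean, variance) of its THSR distribution.
    Equal priors are assumed unless given, so this reduces to maximum
    likelihood over the class-conditional Gaussians.
    """
    names = list(models)
    if priors is None:
        priors = {n: 1.0 / len(names) for n in names}
    scores = {n: priors[n] * gaussian_pdf(thsr_value, *models[n])
              for n in names}
    return max(scores, key=scores.get)

# Hypothetical THSR models, (mean, variance) per instrument
models = {"piccolo": (0.12, 0.001), "oboe": (0.35, 0.004)}
print(classify(0.30, models))  # → oboe
```

Modeling the THSR distribution per source and deciding by maximum posterior is what minimizes the misclassification probability claimed in the abstract.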

    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1: Introduction
      1.1 Research Background
      1.2 Motivation and Objectives
      1.3 Thesis Organization
    Chapter 2: Overview of Source Separation Techniques
      2.1 Audio Sources and Their Features
        2.1.1 Loudness
        2.1.2 Pitch
        2.1.3 Timbre
      2.2 Basic Theory of Source Separation
        2.2.1 Computational Auditory Scene Analysis (CASA)
        2.2.2 Spectral Decomposition
          2.2.2.1 Independent Component Analysis
          2.2.2.2 Non-negative Matrix Factorization
        2.2.3 Model-Based Methods
          2.2.3.1 Hidden Markov Models
          2.2.3.2 Bayesian Models
      2.3 Source Separation Techniques and Literature Review
        2.3.1 Definition of Harmonic Structure
        2.3.2 Separation Procedure
          2.3.2.1 Average Harmonic Structure (AHS) Model Learning
          2.3.2.2 AHS-Based Source Separation
    Chapter 3: Proposed Source Separation Architecture and Method
      3.1 Proposed Separation Architecture
      3.2 THSR Distribution Model
        3.2.1 Fast Fourier Transform
        3.2.2 Peak Detection
        3.2.3 Harmonic Structure Extraction
        3.2.4 THSR Computation
        3.2.5 THSR Distribution Modeling
      3.3 Source Separation
        3.3.1 Bayes Classifier
        3.3.2 Inverse Fast Fourier Transform
        3.3.3 Signal Reconstruction
    Chapter 4: Experimental Results and Analysis
      4.1 Experiment Setup
        4.1.1 Time-Frequency Analysis
      4.2 Experimental Results
        4.2.1 Piccolo & Organ
        4.2.2 Piccolo & Oboe
      4.3 Objective Evaluation
      4.4 Further Experiments
        4.4.1 Saxophone & Violin
        4.4.2 Bassoon & Vocal
    Chapter 5: Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References

    List of Figures
    Fig. 2-1 Distribution of sound intensity and frequency in everyday life [4]
    Fig. 2-2 CASA-based source separation flowchart
    Fig. 2-3 Blind source separation (BSS) block diagram [15]
    Fig. 2-4 Non-negative matrix factorization (NMF) illustration
    Fig. 2-5 Hidden Markov model state-transition diagram
    Fig. 2-6 Bayesian model structure
    Fig. 2-7 AHS models of different instruments and voices [11]
    Fig. 3-1 Proposed mixed-source separation flowchart
    Fig. 3-2 Hanning window, time-domain waveform
    Fig. 3-3 Hanning window, frequency response
    Fig. 3-4 Fast Fourier transform
    Fig. 3-5 Spectral peak marking and recording
    Fig. 3-6 Proposed harmonic structure extraction flowchart
    Fig. 3-7 Statistics extraction flowchart
    Fig. 3-8(a) Cumulative distribution function of source 1
    Fig. 3-8(b) Cumulative distribution function of source 2
    Fig. 3-9 Probability density functions of sources 1 and 2
    Fig. 3-10(a) Inverse Fourier transform of source 1
    Fig. 3-10(b) Inverse Fourier transform of source 2
    Fig. 3-11 Overlap-add convolution
    Fig. 4-1(a) Time-frequency analysis, piccolo source
    Fig. 4-1(b) Time-frequency analysis, organ source
    Fig. 4-1(c) Time-frequency analysis, oboe source
    Fig. 4-2 Mixed time-domain signal, piccolo and organ
    Fig. 4-3 Time-frequency analysis, mixed piccolo and organ signal
    Fig. 4-4 Experiment 1, piccolo: before (left) and after (right) separation
    Fig. 4-5 Experiment 1, organ: before (left) and after (right) separation
    Fig. 4-6 Mixed time-domain signal, piccolo and oboe
    Fig. 4-7 Time-frequency analysis, mixed piccolo and oboe signal
    Fig. 4-8 Experiment 2, piccolo: before (left) and after (right) separation
    Fig. 4-9 Experiment 2, piccolo: before (left) and after (right) separation
    Fig. 4-10 Experiment 3, piccolo: before (left) and after (right) separation
    Fig. 4-11 Experiment 3, piccolo: before (left) and after (right) separation
    Fig. 4-12 Experiment 4, piccolo: before (left) and after (right) separation
    Fig. 4-13 Experiment 4, piccolo: before (left) and after (right) separation

    List of Tables
    Table 3-1 Spectral peak array
    Table 4-1 Objective evaluation, piccolo vs. organ
    Table 4-2 Objective evaluation, piccolo vs. oboe
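The resynthesis step outlined in Section 3.3 (inverse FFT followed by overlap-add, Fig. 3-11) can be sketched as below. This is a generic overlap-add routine for illustration, not the thesis code; the thesis applies it to the per-instrument spectra selected by the classifier.

```python
import numpy as np

def overlap_add(frames, hop):
    """Reassemble windowed time-domain frames (e.g. after an inverse FFT)
    by summing them at hop-sized offsets -- the overlap-add scheme of
    Fig. 3-11."""
    frame_len = len(frames[0])
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out
```

With an analysis window satisfying the constant-overlap-add condition (e.g. a Hanning window at 50% overlap), summing the inverse-transformed frames this way reconstructs the time-domain source signal.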

    [1] A. W. Bronkhorst, “The Cocktail Party Phenomenon: A Review on Speech Intelligibility in Multiple-Talker Conditions,” Acta Acustica united with Acustica, vol. 86, pp. 117–128, 2000.
    [2] A. Hyvärinen, and E. Oja, “Independent component analysis: Algorithms and applications,” Neural Networks, vol. 13, pp. 411–430, 2000.
    [3] G. J. Brown, and M. P. Cooke, “Computational auditory scene analysis,” Comput. Speech Lang., vol. 8, pp. 297–336, 1994.
    [4] J. Eargle, Handbook of Recording Engineering, 4th ed., Kluwer Academic Publishers, 2003.
    [5] E. Zwicker, and H. Fastl, Psychoacoustics: Facts and Models, 2nd updated ed., Springer, 1999.
    [6] D. D. Lee, and H. S. Seung, “Learning the parts of objects by nonnegative matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
    [7] S. T. Roweis, “One microphone source separation,” in Proc. NIPS, pp. 15–19, 2001.
    [8] J. Hershey, and M. Casey, “Audio-visual sound separation via hidden Markov models,” in Proc. NIPS, pp. 1173–1180, 2002.
    [9] E. Vincent, and M. D. Plumbley, “Single-channel mixture decomposition using Bayesian harmonic models,” in Proc. ICA, pp. 722–730, 2006.
    [10] M. Bay, and J. W. Beauchamp, “Harmonic source separation using prestored spectra,” in Proc. ICA, pp. 561–568, 2006.
    [11] Z. Duan, Y. Zhang, C. Zhang, and Z. Shi, “Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 4, May 2008.
    [12] B. Raczynski, and B. Curtis, “Software data violate SPC's underlying assumptions,” IEEE Software, vol. 25, no. 3, pp. 49–51, June 2008.
    [13] C. Cattani, and J. Rushchitsky, “Wavelet and wave analysis as applied to materials with micro or nanostructure,” Series on Advances in Mathematics for Applied Sciences, vol. 47, Sep. 2007.
    [14] H. Viste, and G. Evangelista, “A method for separation of overlapping partials based on similarity of temporal envelopes in multichannel mixtures,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 3, pp. 1051–1061, May 2006.
    [15] G. R. Naik, and D. K. Kumar, “An Overview of Independent Component Analysis and Its Applications,” Informatica, vol. 35, pp. 63–81, 2011.
    [16] S. Anbumalr, P. Rameshbabu, and R. Anandanatarajan, “Chromatograms separation using matrix decomposition,” International Journal of Computer Applications, vol. 27, no. 3, Aug. 2011.
    [17] P. Smaragdis, and J. C. Brown, “Non-negative matrix factorization for polyphonic music transcription,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 19–22, Oct. 2003.
    [18] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, Proceedings of the IEEE, pp. 257–286, February 1989.
    [19] J. Li, A. Najmi, and R. M. Gray, “Image classification by a two dimensional hidden Markov model”, IEEE Transactions on Signal Processing, vol.48, no.2, pp. 517-33, February 2000.
    [20] F. Scholkmann, J. Boss, and M. Wolf, “An efficient algorithm for automatic peak detection in noisy periodic and quasi-periodic signals,” Algorithms, vol. 5, pp. 588–603, 2012.
    [21] J. Barros, “On the use of the Hanning window for harmonic analysis in the standard framework,” IEEE Power & Energy Society, pp. 538–539, Jan. 2006.
    [22] D. Shmilovitz, “On the definition of total harmonic distortion and its effect on measurement interpretation”, IEEE Transactions on power delivery, vol. 20, no. 1, January 2005.
    [23] M. Davy, S. Godsill, and J. Idier, “Bayesian analysis of polyphonic western tonal music,” J. Acoust. Soc. Amer., vol. 119, no. 4, pp. 2498–2517, 2006.
    [24] H. Thornburg, R. J. Leistikow, and J. Berger, “Melody extraction and musical onset detection via probabilistic models of framewise STFT peak data,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1257–1272, May 2007.
    [25] Y. Zhang and C. Zhang, “Separation of music signals by harmonic structure modeling,” in Proc. NIPS, pp. 1617–1624, 2006.
    [26] M. K. I. Molla, and K. Hirose, “Single-mixture audio source separation by subspace decomposition of Hilbert spectrum,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 3, pp. 893–900, Mar. 2007.
    [27] V. Välimäki, J. Pakarinen, C. Erkut, and M. Karjalainen, “Discrete time modelling of musical instruments,” Rep. Progress in Phys., vol. 69, pp. 1–78, 2006.
