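Graduate student: 劉佩昀
Pei-yun Liu
Thesis title: 一個加速時頻域遮罩之盲訊號分離演算法
Blind Source Separation Using a Fast Time Frequency Mask Technique
Advisor: 蔡宗漢
Tsung-han Tsai
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 66
Chinese keywords: 盲訊號分離 (blind source separation), 摺積性混合訊號 (convolutive mixture), 時頻域遮罩 (time-frequency mask)
English keywords: blind separation, convolutive mixture, time frequency mask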
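  • Blind source separation (BSS) mainly addresses the cocktail party problem. At a party where several people speak at once, we can still follow one person's conversation despite the surrounding interference, because the human brain separates the signals naturally; for digital circuits, however, this process is complex. The goal of BSS is to record simultaneously with several microphones placed at different positions in a room and to use the recorded signals to recover the original sound sources.
    Applications are broad, including mobile communications, multiuser communication systems, and speech enhancement in noisy environments.
    BSS here is a signal reconstruction technique based on the assumption of convolutive mixtures. The mixtures are transformed to the frequency domain by the short-time Fourier transform (STFT). Because the sources are sparse, the time-frequency points can be clustered according to their spatial features. The key idea of feature extraction is that two stationary sources each emit sound waves that travel to a pair of microphones; because the microphones are at different distances from each source, the waves arrive at different times. In general, the phase difference and level ratio between the two microphones for each source can serve as spatial features. The spatial features are expressed as complex numbers scattered over the complex plane. The k-means algorithm then divides the feature points into N classes, each representing one source. Next, a binary time-frequency mask marks the classified points: a time-frequency point is set to 1 if it belongs to the target speech and to 0 otherwise. The complete mask is multiplied element-wise with the mixture to obtain the separated signal, which is finally returned to the time domain by the inverse STFT.
    To solve the convolutive BSS problem, this thesis proposes a BSS algorithm with an accelerated time-frequency mask. We first define two features, the level ratio and the phase difference of the signals, and then reduce the variance of the data so that the two features have similar variances, which helps k-means converge. K-means then clusters the data in each frequency band. Finally, the time-frequency mask is computed from the clustered feature points.
    In a real environment, the signals can be separated directly from the sound recorded by the microphones, and the signal quality is then evaluated with SDR (signal-to-distortion ratio) and SIR (signal-to-interference ratio). The method speeds up clustering without lowering signal quality, and the algorithm is simple.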
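The SDR and SIR figures mentioned above are commonly computed as energy ratios in decibels. The sketch below assumes the separated signal has already been decomposed into a target part, an interference part, and an artifact part (the decomposition itself is not shown, and the abstract does not say which evaluation toolkit was used). The function name `energy_ratio_db` and the toy signals are illustrative assumptions.

```python
import numpy as np

def energy_ratio_db(reference, error, eps=1e-12):
    """Return 10*log10(energy(reference) / energy(error)) in dB."""
    num = np.sum(reference ** 2) + eps
    den = np.sum(error ** 2) + eps
    return 10.0 * np.log10(num / den)

# Toy components of one separated output (assumed, for illustration only).
target = np.ones(100)                          # energy 100
interference = 0.1 * np.tile([1.0, -1.0], 50)  # energy 1
artifacts = 0.1 * np.tile([1.0, 1.0, -1.0, -1.0], 25)  # energy 1

# SIR compares the target with the interference only.
sir = energy_ratio_db(target, interference)
# SDR compares the target with all distortion (interference plus artifacts).
sdr = energy_ratio_db(target, interference + artifacts)

print(f"SIR = {sir:.2f} dB, SDR = {sdr:.2f} dB")
```

Because SDR counts every kind of distortion, it can never exceed SIR for the same decomposition when the error terms are orthogonal, as they are in this toy case.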


    The goal of BSS is to solve the cocktail party problem. Imagine a room with a number of people and several microphones for recording. When people speak at the same time, each microphone registers a different mixture of the individual speakers' audio signals, and the task of BSS is to untangle these mixtures into their sources. Applications include mobile telephony, multiuser communication systems, and speech enhancement in noisy environments.
    The mixtures recorded by the microphones are transformed to the frequency domain with the STFT (Short-Time Fourier Transform). Owing to the sparseness and disjointness of the source signals, we can obtain spatial features from the mixtures during the feature extraction step. The features are represented as complex numbers. Afterwards, using the K-means algorithm, we divide the features into N groups, where N is the number of sources. Before transforming the separated signal back to the time domain, we design a mask that labels the target signal: a time-frequency point is labeled one if it belongs to the target speech and zero otherwise.
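A minimal sketch of the feature and mask steps, assuming the STFTs of two microphone signals are already available as complex NumPy arrays and taking the level ratio and phase difference between the microphones as the features. The function names, the synthetic mixing gains, and the simple threshold that stands in for clustering are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def spatial_features(X1, X2, eps=1e-12):
    """Level ratio and phase difference at every time-frequency point.

    X1, X2: complex STFT arrays of shape (n_freq, n_frames).
    Taking microphone 2 relative to microphone 1 is an assumed convention.
    """
    level_ratio = np.abs(X2) / (np.abs(X1) + eps)
    phase_diff = np.angle(X2 * np.conj(X1))
    return level_ratio, phase_diff

def binary_masks(labels, n_sources):
    """Build one 0/1 mask per source from integer labels of shape (n_freq, n_frames)."""
    return np.stack([(labels == k).astype(float) for k in range(n_sources)])

# Synthetic example: each time-frequency point is owned by exactly one source
# (the disjointness assumption), and each source has its own gain and phase shift.
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
owner = rng.integers(0, 2, size=S.shape)
gain = np.where(owner == 0, 0.5 * np.exp(-0.3j), 2.0 * np.exp(0.4j))
X1 = S
X2 = gain * S

ratio, phase = spatial_features(X1, X2)
labels = (ratio > 1.0).astype(int)   # threshold used here in place of clustering
masks = binary_masks(labels, 2)
separated = masks * X1               # shape (2, n_freq, n_frames)
```

Each slice `separated[k]` keeps only the points assigned to source k; an inverse STFT of that slice would give the time-domain estimate.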
    To solve the convolutive blind source separation (BSS) problem, this thesis presents a new method that uses a fast time-frequency mask technique. We first define two features, the normalized level ratio and the phase difference. Next, we reduce the variance of the features so that K-means clustering needs fewer iterations. Afterwards, with the K-means algorithm, we cluster the features by assigning each one to the nearest group. Finally, a time-frequency mask is generated from the clustered features. The method is simple and faster, and it does not reduce the quality of the target signal. In a real environment, we also evaluate the separated signals in terms of SDR (signal-to-distortion ratio) and SIR (signal-to-interference ratio).
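The variance step can be illustrated with a small sketch for one frequency band. It assumes that "reducing the variance" means scaling each feature to unit variance, which is only one plausible reading of the abstract; the farthest-point initialization, the function names, and the synthetic data are likewise assumptions made for illustration.

```python
import numpy as np

def standardize(features):
    """Scale each feature column to zero mean and unit variance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0          # avoid division by zero for constant columns
    return (features - mean) / std

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for it in range(1, n_iter + 1):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)   # assign each point to the nearest center
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers, it

# Synthetic (level ratio, phase difference) pairs for two sources.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 200)
means = np.array([[0.5, -0.3], [2.0, 0.4]])
raw = means[truth] + 0.05 * rng.standard_normal((400, 2))

Z = standardize(raw)
labels, centers, n_used = kmeans(Z, k=2)
agreement = np.mean(labels == truth)
accuracy = max(agreement, 1.0 - agreement)   # cluster indices may be swapped
```

After standardization both features contribute comparably to the Euclidean distance, so neither the level ratio nor the phase difference dominates the cluster assignment.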

    摘要 (Chinese Abstract)
    Abstract
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Thesis Organization
    Chapter 2 Background
      2.1 BSS Based on Independent Component Analysis
      2.2 BSS Based on Sparseness
    Chapter 3 Algorithm Overview
      3.1 Binary Mask Based Approach
      3.2 Non-Binary Mask Based Approach
      3.3 Frequency-Domain Independent Component Analysis
    Chapter 4 Proposed Algorithm
      4.1 Binary Mask Based Approach with Reduction of DOA Variance
      4.2 An Example of Experiment Result
    Chapter 5 Experiment Results
      5.1 Performance Evaluation
      5.2 Experimental Setting
      5.3 Experimental Results of Frequency Domain ICA
      5.4 Experimental Results of Binary/Non-Binary Mask Approach
    Chapter 6 Conclusion
    Reference

