Graduate student: 連哲源
Zhe-Yuan Lian
Thesis title: 行動裝置上運用機器學習與語音分析於帕金森氏症診斷之可行性研究
Feasibility Study of Diagnosis of Parkinson's Diseases Based on Machine Learning and Voice Analysis on Mobile Devices
Advisor: 吳炤民
Oral defense committee:
Degree: Master
Department: College of Information and Electrical Engineering (資訊電機學院) - Department of Electrical Engineering
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar, 2021-2022)
Language: Chinese
Number of pages: 86
Chinese keywords: 帕金森氏症、機器學習、行動裝置、語音分析
English keywords: Parkinson's Disease, Machine Learning, Mobile Devices, Speech Analysis
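  • In recent years, voice analysis has been regarded as an objective and effective means of diagnosing Parkinson's disease (PD). However, most voice analysis tools depend on specialized instruments or computers, which are hard to carry or move, and mobile devices can effectively solve this portability problem. We therefore developed an Android mobile application for voice analysis and tested five classifiers to find one suitable for diagnosing PD.
    The experiments used voice recordings from 74 PD patients and 50 healthy speakers; all samples were the sustained vowel /a/. We tested how strongly acoustic parameters relate to PD, including 19 Multidimensional Voice Program (MDVP) parameters, Normalized Noise Energy (NNE), Cepstral Peak Prominence Smoothed (CPPS), Long-Term Average Spectrum (LTAS), Mel Frequency Cepstral Coefficients (MFCC) and the Tunable Q-Factor Wavelet Transform (TQWT).
    Previous studies that used TQWT to diagnose PD worked with 432 parameters. Such a large parameter set easily leads to classifier overfitting, so the TQWT features must be reduced in dimensionality. We first tested how well Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Hellinger Linear Discriminant Analysis (HLDA) reduce the dimensionality of TQWT; HLDA gave the best results and also resolved the problem that LDA's parameters cannot be adjusted.
    For classification we chose K Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT) and Multi-class Hellinger Linear Discriminant decision tree (MHLDT).
    Five parameter groups were compared. The parameters were divided into three groups, 1) time-domain measures, 2) noise measures and 3) MFCC, and two further groups, 4) all parameters and 5) the 10 parameters selected by Hellinger distance (HD), were added to test the effect of mixing parameters.
    The results show that the noise measures and the MFCC parameters each outperformed the time-domain measures in different classifiers, which is consistent with the fact that the parameters selected by HD were all noise measures and MFCCs. Combining the characteristics of the selected parameters with the findings of previous studies, we found that measuring the breathiness caused by vocal fold impairment can effectively diagnose PD.
    In the comparison of classifiers and parameters, SVM with the HD-selected parameters achieved the highest accuracy, 97.5%. Finally, the selected classifier and parameters were built into an Android app that can record voice and diagnose PD.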


    In recent years, voice analysis has been regarded as objective and effective in the diagnosis of Parkinson's disease (PD). However, most voice analysis tools still depend on specialized equipment or computers, which are inconvenient to carry or move. Mobile devices could effectively solve this portability problem.
    In this study, we developed an Android app for mobile devices to perform voice analysis and tested five classifiers to find one suitable for diagnosing PD.
    The experiments used voice samples from 74 PD patients and 50 healthy speakers; all samples were the sustained vowel /a/. We tested the correlation between PD and various voice parameters, including 19 Multidimensional Voice Program (MDVP) parameters, Normalized Noise Energy (NNE), Cepstral Peak Prominence Smoothed (CPPS), Long-Term Average Spectrum (LTAS), Mel Frequency Cepstral Coefficients (MFCC) and Tunable Q-Factor Wavelet Transform (TQWT).
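As an illustration of the time-domain perturbation measures in the MDVP family, the following minimal sketch computes a jitter-style and a shimmer-style percentage from a sustained vowel. It assumes that pitch-cycle lengths and peak amplitudes have already been extracted; the function name `local_perturbation_percent` and the sample values are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles, relative to the mean, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Made-up glottal cycle lengths (seconds) and peak amplitudes for a sustained /a/.
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

jitter = local_perturbation_percent(periods)      # cycle-to-cycle frequency perturbation
shimmer = local_perturbation_percent(amplitudes)  # cycle-to-cycle amplitude perturbation
print(f"jitter = {jitter:.2f}%, shimmer = {shimmer:.2f}%")
```

The same helper serves both measures because each is a cycle-to-cycle variability ratio; a real analysis tool would additionally need a reliable pitch-period detector, which this sketch leaves out.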
    Past studies that used TQWT to diagnose PD relied on 432 parameters. A large number of parameters easily causes classifier overfitting, so the TQWT features have to be reduced in dimensionality. Two experiments were conducted in this study.
    In the first experiment, we compared the dimensionality reduction performance of Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Hellinger Linear Discriminant Analysis (HLDA) on TQWT; HLDA performed best and resolved the parameter adjustment issue of LDA.
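To make the idea of dimensionality reduction concrete, here is a minimal sketch of PCA reduced to its first component, computed by power iteration on the sample covariance matrix. It is not the thesis implementation and does not cover LDA or HLDA; the function names and the toy data are assumptions chosen for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (feature means, unit vector of largest variance) using power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Starting vector; power iteration assumes it is not orthogonal to the answer.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variance along the starting direction")
        v = [x / norm for x in w]
    return means, v


def project(rows, means, direction):
    """Reduce each row to a single coordinate along the given direction."""
    return [sum((r[j] - means[j]) * direction[j] for j in range(len(means)))
            for r in rows]


# Toy data: three features, but all variation lies along the direction (1, 2, 0).
toy_rows = [[0.0, 0.0, 1.0], [1.0, 2.0, 1.0], [2.0, 4.0, 1.0], [3.0, 6.0, 1.0]]
toy_means, toy_direction = first_principal_component(toy_rows)
toy_scores = project(toy_rows, toy_means, toy_direction)
print(toy_direction, toy_scores)
```

PCA is unsupervised and ignores the PD/healthy labels, whereas LDA-style methods use them; that difference is one reason the methods can rank differently on the same features.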
    Five classifiers, K Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT) and Multi-class Hellinger Linear Discriminant decision tree (MHLDT), were used to determine whether a voice belonged to a PD patient.
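Of the classifiers listed, KNN is the simplest to show in a few lines. The sketch below is a generic majority-vote KNN using Euclidean distance; the feature vectors, labels and the choice of k are invented for illustration and carry no clinical meaning.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-dimensional feature vectors (hypothetical, already scaled).
train_x = [(0.20, 0.10), (0.30, 0.20), (0.25, 0.15),
           (1.00, 0.90), (1.10, 1.00), (0.90, 1.10)]
train_y = ["healthy", "healthy", "healthy", "PD", "PD", "PD"]

print(knn_predict(train_x, train_y, (1.0, 1.0)))
print(knn_predict(train_x, train_y, (0.2, 0.2)))
```

Because KNN relies on raw distances, features measured on different scales would normally be standardized first.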
    A total of five groups of parameters were compared. The parameters were divided into three groups, 1) time-domain measurement, 2) noise measurement, and 3) MFCC, to test the performance of different characteristics. In addition, 4) all the parameters and 5) the 10 parameters selected by Hellinger distance (HD) were used to test the performance of parameter mixing.
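One way to picture feature selection by Hellinger distance is to score each parameter by how far apart its PD and healthy distributions are, then keep the top-scoring ones. The sketch below assumes each class is modeled as a univariate Gaussian, for which the distance has a closed form; the thesis may estimate the distance differently, and the function names and sample values here are hypothetical.

```python
import math
from statistics import mean, pvariance


def hellinger_gaussian(a, b):
    """Hellinger distance between Gaussians fitted to samples a and b (0 = same, 1 = disjoint)."""
    mu1, mu2 = mean(a), mean(b)
    var1, var2 = pvariance(a), pvariance(b)
    if var1 <= 0 or var2 <= 0:
        raise ValueError("each sample needs non-zero variance")
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    # Bhattacharyya coefficient of two univariate Gaussians.
    bc = (math.sqrt(2 * s1 * s2 / (var1 + var2))
          * math.exp(-((mu1 - mu2) ** 2) / (4 * (var1 + var2))))
    return math.sqrt(max(0.0, 1.0 - bc))


def rank_features(pd_samples, healthy_samples, top_n):
    """Return the top_n feature names with the largest class separation."""
    scores = {name: hellinger_gaussian(pd_samples[name], healthy_samples[name])
              for name in pd_samples}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up values for two hypothetical features.
pd_samples = {"f_shifted": [5.0, 6.0, 7.0], "f_same": [1.0, 2.0, 3.0]}
healthy_samples = {"f_shifted": [0.0, 1.0, 2.0], "f_same": [1.0, 2.0, 3.0]}
print(rank_features(pd_samples, healthy_samples, top_n=2))
```

In the setting described above, `top_n` would be 10 and the candidate pool would be the full parameter set.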
    The results showed that the noise measurement and MFCC parameters each outperformed the time-domain measurement parameters in different classifiers. This is consistent with the HD selection, in which all selected parameters were noise measurements or MFCCs.
    Combining the characteristics of the selected parameters with the results of previous studies, we found that measuring the breathy voice caused by vocal cord abnormality can effectively diagnose PD.
    In the comparison of parameters and classifiers, the highest performance was obtained with SVM and the 10 parameters selected by HD, with an accuracy of 97.5%.
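For readers interpreting an accuracy figure on a two-class task, the sketch below shows how accuracy, sensitivity and specificity follow from confusion-matrix counts. The function name `binary_metrics` and the counts in the first example are made up and are not results from the thesis; the second example only uses the class sizes stated above (74 PD, 50 healthy).

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts (PD = positive)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of PD voices correctly flagged
        "specificity": tn / (tn + fp),  # share of healthy voices correctly cleared
    }


# Made-up counts for a small hypothetical test fold.
example = binary_metrics(tp=9, fn=1, tn=8, fp=2)

# Baseline: a trivial classifier that labels every one of the 124 voices as PD.
always_pd = binary_metrics(tp=74, fn=0, tn=0, fp=50)
print(example, always_pd)
```

With 74 PD and 50 healthy speakers, the always-PD baseline already reaches an accuracy of about 0.597 with zero specificity, which is why accuracy is usually read alongside the other two measures on imbalanced data.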
    Finally, the selected classifier and parameters were implemented as an Android app, which could record voice and diagnose PD.

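    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1: Introduction
        1.1 Research Motivation
        1.2 Literature Review
        1.3 Research Objectives
        1.4 Thesis Organization
    Chapter 2: Speech Parameters
        2.1 Multidimensional Voice Program (MDVP)
            2.1.1 Fundamental Frequency Information Measurements
            2.1.2 Long-Term and Short-Term Frequency Perturbation Measurements
            2.1.3 Long-Term and Short-Term Amplitude Perturbation Measurements
            2.1.4 Voice Break Measurements
            2.1.5 Sub-Harmonic Measurements
            2.1.6 Voice Irregularity Measurements
            2.1.7 Noise Measurements
            2.1.8 Tremor Measurements
        2.2 Normalized Noise Energy (NNE)
        2.3 Cepstral Peak Prominence Smoothed (CPPS)
        2.4 Long-Term Average Spectrum Slope (LTAS)
        2.5 Mel Frequency Cepstral Coefficients (MFCC)
        2.6 Tunable Q-Factor Wavelet Transform (TQWT)
            2.6.1 Tunable Q-Factor Wavelet Transform (TQWT)
            2.6.2 Dimensionality Reduction
                2.6.2.1 Linear Discriminant Analysis (LDA)
                2.6.2.2 Principal Component Analysis (PCA)
                2.6.2.3 Hellinger Linear Discriminant Analysis (HLDA)
    Chapter 3: Machine Learning
        3.1 Multi-Layer Perceptron (MLP)
        3.2 K Nearest Neighbor (KNN)
        3.3 Support Vector Machine (SVM)
        3.4 Gradient Boosting Decision Tree (GBDT)
        3.5 Multi-class Hellinger Linear Discriminant Decision Tree (MHLDT)
    Chapter 4: Experimental Methods
        4.1 Database Used in the Experiments
        4.2 Grouping of Parameters in the Experiments
        4.3 Introduction to the Experiments
            4.3.1 Experiment 1: Dimensionality Reduction Test
            4.3.2 Experiment 2: Comparison of Parameters and Classifiers
        4.4 Scoring Method
    Chapter 5: Results and Discussion
        5.1 Experimental Results
            5.1.1 Experiment 1
            5.1.2 Experiment 2
        5.2 Introduction to the Mobile App
        5.3 Discussion
    Chapter 6: Conclusion and Future Work
        6.1 Conclusion
        6.2 Future Work
    References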

