
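Graduate student: 黃紹航 (Shao-Hang Huang)
Thesis title: Evaluating the Applicability of Sentiment Analysis Methods to COVID-19 Epidemic Prediction
Advisor: 蘇坤良
Oral defense committee:
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: Chinese
Number of pages: 71
Keywords: sentiment analysis, word embedding, epidemic prediction, machine learning, deep learning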
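  • COVID-19 continues to threaten public health around the world. Effectively predicting whether confirmed cases and deaths will rise or fall would help researchers and policymakers steer the epidemic in the right direction and lower mortality and infection rates. Existing studies on COVID-19 epidemic prediction all rely on structured data; no study has used unstructured data. Meanwhile, with the rapid growth of social media, researchers in many fields have run prediction experiments on social media text. This study therefore uses social media text about COVID-19 to predict epidemic trends.
    This study applies different sentiment analysis methods to social media text to produce daily sentiment scores, then combines these scores with structured data to predict the epidemic, with the change in confirmed cases and the change in deaths as the prediction targets. It compares several sentiment analysis methods (a lexicon-based method, a sentiment analysis package, and static and dynamic word embedding methods) with three classifiers (SVM, LSTM, and Bi-GRU) to find the combination that predicts epidemic trends most effectively. The experiments show that the dynamic word embedding method RoBERTa paired with Bi-GRU predicts epidemic trends best; when predicting confirmed cases, its precision reaches as high as 75.89%.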


    COVID-19 continues to threaten public health in countries around the world. An efficient way to predict the trend of the COVID-19 epidemic would help researchers and policymakers make the right decisions to reduce the mortality rate and the confirmed case rate. At present, all research on COVID-19 epidemic prediction is based on structured data. However, with the development of social media, using social media text for prediction has become common in various fields. Therefore, this research uses different sentiment analysis methods to generate daily sentiment scores from social media text and combines them with structured data for epidemic prediction.
    This research compares different sentiment analysis methods (a dictionary method, a sentiment analysis package, and static and dynamic word embedding methods) and uses three different classifiers (SVM, LSTM, and Bi-GRU) for epidemic prediction. We found that the dynamic word embedding method RoBERTa combined with the Bi-GRU classifier gives the best prediction of the COVID-19 epidemic trend. In predicting the number of confirmed cases, precision reaches as high as 75.89%.
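    The pipeline summarized in the abstract (per-post sentiment scores averaged into a daily score, trend labels derived from case counts, and precision as the evaluation metric) can be sketched roughly as follows. This is only an illustrative sketch and not the thesis's implementation: the function names, the `lag` and `ratio` parameters, and the labeling rule are assumptions made for this example, and the classifier step (SVM, LSTM, or Bi-GRU) is left out.

```python
from collections import defaultdict
from statistics import mean


def daily_sentiment(posts):
    """Average per-post sentiment scores into one score per day.

    posts: iterable of (date_string, score) pairs. The scores are assumed
    to come from some upstream sentiment analysis method.
    """
    by_day = defaultdict(list)
    for day, score in posts:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def trend_labels(counts, lag=1, ratio=1.0):
    """Label day t as 1 (rising) if counts[t + lag] > ratio * counts[t], else 0.

    This rule, and both parameters, are hypothetical stand-ins for the
    labeling formula used in the thesis, which is not given in this record.
    """
    return [
        1 if counts[t + lag] > ratio * counts[t] else 0
        for t in range(len(counts) - lag)
    ]


def precision(y_true, y_pred):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy data, invented for illustration only.
posts = [("2021-01-01", 0.5), ("2021-01-01", -0.25), ("2021-01-02", 0.75)]
print(daily_sentiment(posts))

confirmed = [100, 120, 110, 150]
labels = trend_labels(confirmed, lag=1, ratio=1.0)
print(labels)
print(precision(labels, [1, 1, 1]))
```

    In a full experiment, the daily scores would be joined with the structured case data by date and passed to a classifier; precision is then computed on the classifier's predicted labels.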

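    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
      1-1 Research Background
      1-2 Research Motivation
      1-3 Research Objectives
    2. Literature Review
      2-1 Research on COVID-19 Sentiment Analysis Tasks on Social Media
      2-2 Types of Word Embedding Methods
        2-2-1 Word2Vec
        2-2-2 GloVe
        2-2-3 BERT
        2-2-4 GRUBERT
        2-2-5 RoBERTa
      2-3 Classifier Models for Predicting COVID-19 Confirmed Cases and Deaths
        2-3-1 SVM
        2-3-2 RNN
        2-3-3 LSTM
        2-3-4 Bi-GRU
    3. Research Method
      3-1 Data Collection
      3-2 Data Preprocessing
        3-2-1 Unstructured Data Preprocessing
        3-2-2 Structured Data Preprocessing
      3-3 Predictive Performance of Word Embedding Methods on the Sentiment Analysis Task
      3-4 Labeling Formula and Lag Date
      3-5 Evaluation Metrics
      3-6 Applicability of Different Sentiment Analysis Methods and Classifiers to Epidemic Prediction
        3-6-1 Day Forward-Chaining
        3-6-2 Lexicon-Based Method
        3-6-3 Sentiment Analysis Package VADER
        3-6-4 Word Embedding Methods
        3-6-5 Classifiers
      3-7 Effect of Merging Text and Structured Data on Epidemic Trend Prediction
    4. Experimental Results and Analysis
      4-1 Performance of Word Embedding Methods on the Sentiment140 Dataset
      4-2 Effect of Different Sentiment Analysis Methods and Classifiers on Epidemic Prediction
        4-2-1 Optimal Case-Count Change Ratio and Lag Day
        4-2-2 Comparing Sentiment Analysis Methods under Different Classifiers
        4-2-2 Summary
      4-3 Effect of Data from Different Countries on Epidemic Prediction
      4-4 Effect of Data Collected with Different Keywords on Epidemic Prediction
        4-4-1 Effect of Datasets Collected with Different Keywords on Accuracy
        4-4-2 Effect of Merging Datasets on Epidemic Trend Accuracy
      4-5 Effect of Merging the Two Data Types on Epidemic Prediction
    5. Conclusion
      5-1 Conclusions and Contributions
      5-2 Research Limitations
      5-3 Future Research and Suggestions
    References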

