Student: 施麗雅 (Isariya Sirivejabandhu)
Thesis title: A Graph-based Approach for PM2.5 Prediction
Advisor: 孫敏德 (Min-Te Sun)
Oral defense committee:
Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar, i.e., 2022-2023)
Language: English
Number of pages: 70
Keywords: PM2.5 Prediction, Data Fusion, Graph-based Model, Air Quality, Deep Learning, Spatio-Temporal Features


    Abstract

    Air pollution is a global issue that poses significant risks to human health and the environment. PM2.5 (fine particulate matter with an aerodynamic diameter of 2.5 µm or less) is a major contributor to respiratory and cardiovascular diseases. Accurate PM2.5 prediction is therefore crucial for understanding pollution patterns, protecting public health, and supporting environmental planning and policy development. In this research, we conducted several experiments to improve PM2.5 prediction and developed a comprehensive PM2.5 prediction system comprising AirBox data preprocessing, EPA (Taiwan Environmental Protection Administration) data preprocessing, data fusion, feature engineering, feature selection, and the proposed prediction model, DCRNN-GS (Diffusion Convolutional Recurrent Neural Network with GraphSAGE). The model is designed for iterative multi-step PM2.5 prediction: it uses data from the past 24 hours to predict the next 24 hours. The results show that the proposed system outperforms state-of-the-art approaches to PM2.5 prediction.
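    The abstract describes iterative multi-step prediction with a 24-hour input window and a 24-hour horizon. A minimal sketch of that general strategy follows, assuming a single station's hourly series: a one-step model predicts the next hour, the prediction is appended to the window, and the oldest hour is dropped, repeated 24 times. This is not DCRNN-GS itself; `naive_one_step` is a hypothetical placeholder (it returns the window mean) standing in for a trained network, and the function names are illustrative assumptions.

```python
from typing import Callable, List, Sequence

WINDOW = 24   # hours of history fed to the model (per the abstract)
HORIZON = 24  # hours to forecast (per the abstract)


def naive_one_step(window: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained one-step model: predicts the window mean."""
    return sum(window) / len(window)


def iterative_forecast(
    history: Sequence[float],
    one_step: Callable[[Sequence[float]], float],
    window: int = WINDOW,
    horizon: int = HORIZON,
) -> List[float]:
    """Recursive multi-step forecast: each prediction is appended to the
    input window and used to predict the following hour."""
    if len(history) < window:
        raise ValueError(f"need at least {window} past values, got {len(history)}")
    buffer = list(history[-window:])  # keep only the most recent `window` observations
    predictions: List[float] = []
    for _ in range(horizon):
        next_value = one_step(buffer)
        predictions.append(next_value)
        buffer = buffer[1:] + [next_value]  # slide the window forward by one hour
    return predictions


if __name__ == "__main__":
    past_24h = [float(v) for v in range(10, 34)]  # synthetic hourly PM2.5 readings (µg/m³)
    forecast = iterative_forecast(past_24h, naive_one_step)
    print(len(forecast), forecast[0])
```

    Feeding predictions back lets one short-horizon model cover the full 24 hours, at the cost of errors compounding over the horizon; scheduled sampling (listed under Section 3.4.5 of the contents) is a training technique that exposes a sequence model to its own predictions to reduce that mismatch.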

    Contents
    1 Introduction
    2 Related Work
      2.1 Machine Learning PM2.5 Prediction
      2.2 Deep Learning PM2.5 Prediction
        2.2.1 Feedforward Neural Network based
        2.2.2 Recurrent Neural Network (RNN) based
        2.2.3 Convolutional Neural Networks (CNN) based
        2.2.4 Graph Neural Networks (GNN) based
      2.3 Hybrid Model PM2.5 Prediction
    3 Preliminary
      3.1 HyperImpute
      3.2 DeepKriging
      3.3 GraphSAGE
      3.4 Diffusion Convolutional Recurrent Neural Network
        3.4.1 Bidirectional Random Walk
        3.4.2 Diffusion Convolution
        3.4.3 Diffusion Convolutional Layer
        3.4.4 Diffusion Convolutional Gated Recurrent Unit
        3.4.5 Scheduled Sampling
        3.4.6 Sequence to Sequence Architecture
    4 Design
      4.1 Motivation
      4.2 Problem Statement
      4.3 Research Challenges
      4.4 Proposed System Architecture
        4.4.1 EPA Data Preprocessing
        4.4.2 AirBox Data Preprocessing
        4.4.3 Data Fusion
        4.4.4 Feature Engineering
        4.4.5 Feature Selection
        4.4.6 Diffusion Convolutional Recurrent Neural Network with GraphSAGE (DCRNN-GS)
    5 Performance
      5.1 Datasets
      5.2 Evaluation Metrics
      5.3 Experimental Setup
      5.4 Experimental Results and Analysis
        5.4.1 Data Imputation
        5.4.2 Data Fusion
        5.4.3 Window Size
        5.4.4 PM2.5 Prediction
      5.5 Ablation Studies
        5.5.1 Model Combination
        5.5.2 Activation-Optimization Combination
        5.5.3 Impact of Additional Features on Model Accuracy
    6 Conclusion

