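| Graduate student: | 高健賓 Jian-Bin Kao |
|---|---|
| Thesis title: | 基於統計與深度學習之單變數時間序列異常檢測 Anomaly Detection for Univariate Time-Series with Statistics and Deep Learning |
| Advisor: | 江振瑞 Jehn-Ruey Jiang |
| Oral defense committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Number of pages: | 56 |
| Keywords (Chinese): | Internet of Things, statistical analysis, anomaly detection, Dickey-Fuller test, fast Fourier transform, Pearson product-moment correlation coefficient, gated recurrent unit (GRU) neural network, deep learning, univariate time series |
| Keywords (English): | big data analysis, Dickey-Fuller test, Pearson product-moment correlation coefficient, GRU neural network |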
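In recent years, owing to the rapid development of Internet of Things (IoT) technology, the many kinds of sensors deployed throughout our surroundings have been accumulating massive amounts of time series data. The demand for analyzing time series data is therefore growing quickly, and anomaly detection is one of the most important of these demands. This thesis proposes an anomaly detection framework for univariate time series data. Based on the characteristics of the data, it first uses the Dickey-Fuller test, the fast Fourier transform (FFT), and the Pearson product-moment correlation coefficient to divide time series data into three categories: (1) stationary, (2) periodic, and (3) non-stationary and non-periodic. It then applies a different statistics-based or deep-learning-based method to each category to detect anomalies.
For stationary time series data, we compute the rate of change between the mean of a larger sliding window and the mean of a smaller one, and set a threshold on that rate to detect anomalies in real time. For periodic time series data, we compute the ratio of the standard deviations of the data in the time windows of the current period and the previous period, and set a threshold on that ratio to detect anomalies. Finally, for non-stationary and non-periodic time series data, we use a gated recurrent unit (GRU) neural network model to predict the time series, and detect anomalies by passing the prediction error through the cumulative distribution function of the normal distribution.
We use four real-world datasets and one artificial dataset, published by the US company Numenta on the NuPIC platform it developed, as experimental data, and compare the framework with related methods: ADSaS, STL, SARIMA, LSTM, and LSTM with STL. The experimental comparison shows that the proposed anomaly detection framework achieves the best F1-score.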
A wide variety of time series data has recently been accumulated from the sensors around us, owing to the rapid development of Internet of Things (IoT) technology. As a result, demand for analyzing time series data is increasing rapidly, and anomaly detection is one of the most important of these analysis tasks. This thesis proposes an anomaly detection framework for univariate time series data. First, the time series data are divided into three categories according to their characteristics, using the Dickey-Fuller test, the fast Fourier transform (FFT), and the Pearson product-moment correlation coefficient: (1) stationary time series data, (2) periodic time series data, and (3) non-stationary and non-periodic time series data. Different schemes based on statistics and deep learning are then applied to the different categories to detect anomalies.
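The three-way split described above can be pictured with the following minimal sketch. It is not the thesis's implementation, and several details are assumptions made for illustration: the unit-root check is a simplified Dickey-Fuller regression without lag terms (a fuller version would more likely call an augmented routine such as `statsmodels.tsa.stattools.adfuller`), the critical value of about -2.86 is the approximate 5% level for a regression with a constant, the period is taken from the strongest bin of a naive discrete Fourier transform, and the order of the checks, the 0.8 correlation cutoff, and all function names are hypothetical.

```python
import cmath
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def dominant_period(series):
    """Period of the strongest non-zero frequency (naive O(n^2) DFT)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return round(n / best_k) if best_k else None


def dickey_fuller_stat(series):
    """t-statistic of gamma in dy_t = alpha + gamma * y_(t-1) + e_t (no lags)."""
    x = series[:-1]
    dy = [b - a for a, b in zip(series, series[1:])]
    n = len(x)
    mx, mdy = sum(x) / n, sum(dy) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0  # degenerate input: treat as "cannot reject a unit root"
    gamma = sum((a - mx) * (d - mdy) for a, d in zip(x, dy)) / sxx
    alpha = mdy - gamma * mx
    rss = sum((d - alpha - gamma * a) ** 2 for a, d in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se if se > 0 else 0.0


def classify(series, df_critical=-2.86, min_corr=0.8):
    """Assumed order: stationarity first, then periodicity, else 'neither'."""
    if dickey_fuller_stat(series) < df_critical:
        return "stationary"
    period = dominant_period(series)
    if period and 2 <= period <= len(series) // 2:
        # Confirm the FFT candidate: the series should match itself one period later.
        if pearson(series[:-period], series[period:]) >= min_corr:
            return "periodic"
    return "non-stationary, non-periodic"


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(200)]                # white noise
wave = [math.sin(2 * math.pi * t / 50) for t in range(200)]  # period of 50 samples
trend = [t * t / 100 for t in range(200)]                    # growing trend
labels = [classify(s) for s in (noise, wave, trend)]
```

Here `labels` holds one category per synthetic series. Using the Pearson coefficient to confirm the FFT candidate guards against a spectral peak that does not correspond to a repeating pattern.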
For stationary time series data, the ratio of the mean of a large sliding time window to the mean of a small one is calculated, and an anomaly is assumed to occur if the ratio exceeds a threshold value. For periodic time series data, the period of the data is first derived; the ratio of the standard deviations of the data in two consecutive periods is then calculated, and an anomaly is assumed to occur if this ratio exceeds a threshold value. For non-stationary and non-periodic time series data, a gated recurrent unit (GRU) neural network model is used to predict the time series values, and anomalies are detected by evaluating the cumulative distribution function of the normal distribution on the prediction error.
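The three detection rules can be sketched as follows. This is an illustrative reading of the abstract rather than the thesis's code: the abstract does not give the exact form of the window-mean ratio, so a relative change between the two means is assumed here; the window sizes, thresholds, and function names are hypothetical; and the GRU model itself is not reproduced, so its predictions are simply passed in as a list.

```python
from statistics import NormalDist, fmean, pstdev


def mean_change_anomalies(series, long_win=20, short_win=5, threshold=0.2):
    """Stationary data: flag a sample when the short-window mean deviates
    from the long-window mean by more than `threshold` (relative change)."""
    flagged = []
    for t in range(long_win, len(series) + 1):
        long_mean = fmean(series[t - long_win:t])
        short_mean = fmean(series[t - short_win:t])
        if long_mean == 0:
            continue  # relative change is undefined
        if abs(short_mean - long_mean) / abs(long_mean) > threshold:
            flagged.append(t - 1)  # index of the newest sample in both windows
    return flagged


def std_ratio_anomalies(series, period, threshold=2.0):
    """Periodic data: flag a period whose standard deviation exceeds the
    previous period's by more than a factor of `threshold`."""
    flagged = []
    for start in range(period, len(series) - period + 1, period):
        prev_std = pstdev(series[start - period:start])
        cur_std = pstdev(series[start:start + period])
        if prev_std == 0:
            continue  # ratio is undefined
        if cur_std / prev_std > threshold:
            flagged.append(start)  # first index of the anomalous period
    return flagged


def error_cdf_anomalies(actual, predicted, train_errors, alpha=0.01):
    """Non-stationary, non-periodic data: fit a normal distribution to past
    prediction errors and flag errors that land in either tail."""
    dist = NormalDist.from_samples(train_errors)
    flagged = []
    for i, (a, p) in enumerate(zip(actual, predicted)):
        cdf = dist.cdf(a - p)
        if cdf < alpha / 2 or cdf > 1 - alpha / 2:
            flagged.append(i)
    return flagged
```

Each function returns the indices it flags, so the three rules can be swapped in behind one interface once a series has been assigned to a category. The first rule only needs the most recent `long_win` samples, which is what makes it usable in real time.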
Four open real-world datasets and one artificial dataset, released on the NuPIC platform maintained by Numenta, are used to evaluate the performance of the proposed framework. The results are compared with those of related methods, namely ADSaS, STL, SARIMA, LSTM, and LSTM with STL. The comparisons show that the proposed framework achieves the best F1 score for anomaly detection.
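For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it point-wise from sets of labeled and detected anomaly indices; the abstract does not state how detections were matched to labels in the evaluation, so the point-wise matching and the function name are assumptions.

```python
def f1_score(true_idx, detected_idx):
    """Point-wise F1 score from labeled and detected anomaly indices."""
    true_set, detected = set(true_idx), set(detected_idx)
    tp = len(true_set & detected)  # detections that hit a labeled anomaly
    if tp == 0:
        return 0.0
    precision = tp / len(detected)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Two labeled anomalies, three detections, two of them correct:
# precision = 2/3, recall = 1, so F1 = 0.8.
score = f1_score([30, 50], [30, 31, 50])
```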