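| Graduate student: | 林子晏 Zi-Yan Lin |
|---|---|
| Thesis title: | 植基於Spark系統之分散式粒化運算決策產生演算法 A Distributed Decision Generation Algorithm based on Granular Computing Using Spark |
| Advisor: | 王尉任 Wei-Jen Wang |
| Committee members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | Chinese |
| Pages: | 53 |
| Chinese keywords: | classification algorithm (分類演算法), distributed decision generation algorithm based on granular computing (分散式粒化運算決策產生演算法) |
| Foreign-language keywords: | DGAGC |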
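A classification algorithm works in two phases. The first phase is training, which uses already-labeled data to map the features of each record to its class. The second phase is classification, which assigns classes to unlabeled data based on their features. DGAGC is a classification algorithm suited to discrete data; continuous data require additional processing. Our earlier work made DGAGC support the Hadoop MapReduce computing model, but the Hadoop MapReduce version covers only DGAGC training, and classification exists only as a single-machine version. Of the two phases, training takes the most time. This thesis proposes Spark versions of DGAGC training and classification to improve on the Hadoop version's execution efficiency when the computational load of the data set is not large. In addition, the single-machine version of DGAGC classification cannot perform prediction when the prediction model is too large, so we propose a Spark version of DGAGC classification to address this problem.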
The DGAGC algorithm, developed at National Central University, is a classification algorithm based on association-rule mining and searching. It also specifies a distributed approach to model training, implemented on top of Hadoop MapReduce. In this study, we propose a new distributed computing approach for DGAGC based on Apache Spark. With Spark's in-memory computing, the new distributed DGAGC algorithm achieves a lower average execution time for model training than the Hadoop MapReduce implementation on four different training data sets. In addition, we propose a distributed version of DGAGC for data classification.
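The two-phase structure described in the abstract (training on labeled discrete data, then classifying unlabeled records) can be illustrated with a minimal sketch. This is not the DGAGC algorithm, whose rule-mining details are not given here. It is a simple vote-counting classifier, written in plain Python, in which training is expressed as a map step followed by a reduce step, the shape that Spark's `flatMap` and `reduceByKey` operations would distribute. The function names `map_record`, `train`, and `classify`, and the toy data, are hypothetical.

```python
from collections import Counter, defaultdict


def map_record(record):
    """Map step: emit ((attribute, value, label), 1) per discrete feature."""
    features, label = record
    return [((attr, value, label), 1) for attr, value in features.items()]


def train(records):
    """Training phase: aggregate co-occurrence counts into a rule table.

    Hypothetical stand-in for model training; not the DGAGC procedure.
    """
    counts = Counter()
    for record in records:
        for key, one in map_record(record):
            counts[key] += one  # reduce step: sum the counts per key
    model = defaultdict(Counter)
    for (attr, value, label), n in counts.items():
        model[(attr, value)][label] = n
    return dict(model)


def classify(model, features, default=None):
    """Classification phase: sum the votes of all rules matching the record."""
    votes = Counter()
    for attr, value in features.items():
        votes.update(model.get((attr, value), {}))
    if not votes:
        return default  # no rule matches any feature of this record
    # Highest vote wins; ties are broken by label name for determinism.
    return min(votes, key=lambda label: (-votes[label], label))


# Toy labeled data set (assumed for illustration only).
records = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]
model = train(records)
prediction = classify(model, {"outlook": "sunny", "windy": "yes"})
```

In this sketch the model is an in-memory dictionary, which mirrors the limitation the abstract attributes to the single-machine classifier: once the model no longer fits on one machine, the lookup in `classify` has to be distributed as well.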