A Distributed Decision Generation Algorithm based on Granular Computing Using Spark

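Student:	Zi-Yan Lin (林子晏)
Thesis title:	A Distributed Decision Generation Algorithm based on Granular Computing Using Spark (植基於Spark系統之分散式粒化運算決策產生演算法)
Advisor:	Wei-Jen Wang (王尉任)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication:	2017
Academic year of graduation:	105 (ROC calendar)
Language:	Chinese
Number of pages:	53
Chinese keywords:	classification algorithm, distributed decision generation algorithm based on granular computing
English keywords:	DGAGC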
相關次數：	點閱：12 下載：0
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

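A classification algorithm works in two phases. In the training phase, it uses data that have already been labeled and learns from their features which class each feature pattern corresponds to. In the classification phase, it assigns classes to unlabeled data based on their features. DGAGC is a classification algorithm designed for discrete data; continuous data require additional preprocessing. Our previous work extended DGAGC to the Hadoop MapReduce programming model. However, the Hadoop MapReduce version covers only DGAGC training, which is the most time-consuming part, and classification is available only as a single-machine version. This thesis proposes Spark versions of both DGAGC training and DGAGC classification. The Spark training version aims to run more efficiently than the Hadoop version when the computational workload of the data set is not very large. For classification, the single-machine version cannot make predictions once the prediction model becomes too large, so we propose a Spark version of DGAGC classification to address this problem.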
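
The two-phase structure described above can be illustrated with a small, self-contained Python sketch. It is not DGAGC itself. It is a generic association-rule-style classifier assumed for illustration. A "rule" here is a combination of up to `max_len` discrete (attribute, value) pairs whose majority class reaches a confidence threshold `min_conf`. Equal-width binning (`discretize`) is one possible choice for the extra preprocessing that continuous attributes need. All function names, parameters, and the rule-ranking order are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def discretize(value, lo, hi, bins):
    """Equal-width binning (assumed preprocessing): map a value to a bin in [0, bins - 1]."""
    if value <= lo:
        return 0
    if value >= hi:
        return bins - 1
    width = (hi - lo) / bins
    return min(int((value - lo) / width), bins - 1)


def train_rules(records, labels, max_len=2, min_conf=1.0):
    """Training phase: count each combination of up to max_len (attribute, value)
    pairs per class, and keep combinations whose majority class reaches min_conf."""
    class_counts = defaultdict(Counter)
    for record, label in zip(records, labels):
        items = sorted(record.items())  # attribute names are unique, so this sorts by name
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                class_counts[combo][label] += 1

    rules = {}
    for combo, counts in class_counts.items():
        label, hits = counts.most_common(1)[0]
        support = sum(counts.values())
        confidence = hits / support
        if confidence >= min_conf:
            rules[combo] = (label, confidence, support)
    return rules


def classify(record, rules, max_len=2, default=None):
    """Classification phase: among the rules the record matches, prefer more
    attributes, then higher confidence, then higher support (hypothetical order)."""
    items = sorted(record.items())
    best = None
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            if combo in rules:
                label, confidence, support = rules[combo]
                score = (k, confidence, support)
                if best is None or score > best[0]:
                    best = (score, label)
    return best[1] if best is not None else default


if __name__ == "__main__":
    temps = [25, 28, 12, 8]  # continuous attribute, binned before training
    records = [
        {"outlook": "sunny", "windy": "no", "temp": discretize(temps[0], 0, 40, 4)},
        {"outlook": "sunny", "windy": "yes", "temp": discretize(temps[1], 0, 40, 4)},
        {"outlook": "rain", "windy": "no", "temp": discretize(temps[2], 0, 40, 4)},
        {"outlook": "rain", "windy": "yes", "temp": discretize(temps[3], 0, 40, 4)},
    ]
    labels = ["play", "stay", "play", "stay"]
    rules = train_rules(records, labels)
    query = {"outlook": "sunny", "windy": "no", "temp": discretize(27, 0, 40, 4)}
    print(classify(query, rules))  # play
```

Ranking matched rules by the number of attributes first favors the most specific evidence. DGAGC's actual granule construction and decision criteria may differ.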

The DGAGC algorithm, developed at National Central University, is a classification algorithm based on association-rule mining and searching. It also specifies a distributed computing approach for model training, implemented on top of Hadoop MapReduce. In this study, we propose a new distributed computing approach for DGAGC based on Apache Spark. Benefiting from Spark's in-memory computing, the new distributed DGAGC algorithm achieves a lower average model-training execution time than the Hadoop MapReduce version on the four training data sets evaluated. In addition, we propose a distributed version of DGAGC for data classification.
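
The data-parallel pattern behind distributed training and classification can be sketched in plain Python, without a Spark cluster. The sketch rests on several assumptions. Each partition counts (rule condition, class) pairs locally, which is the role `RDD.mapPartitions` or `RDD.flatMap` would play in PySpark. The partial counts are then summed key by key, which is the role `RDD.reduceByKey` would play. For classification, a rule set too large for one machine is split into shards. Each shard reports only its best local match, and the global winner is chosen from those. This is an illustrative emulation, not the thesis's implementation. The rule format and all function names (`local_counts`, `distributed_train`, `shard_rules`, `distributed_classify`) are hypothetical.

```python
from collections import Counter, defaultdict
from functools import reduce
from itertools import combinations


def local_counts(partition, max_len=2):
    """Map side (cf. RDD.mapPartitions): count (combination, label) pairs in one partition."""
    counts = Counter()
    for record, label in partition:
        items = sorted(record.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[(combo, label)] += 1
    return counts


def merge_counts(a, b):
    """Reduce side (cf. RDD.reduceByKey): add partial counts key by key."""
    merged = Counter(a)
    merged.update(b)
    return merged


def distributed_train(partitions, max_len=2, min_conf=1.0):
    # On a cluster, each local_counts call would run on a different executor.
    partials = [local_counts(p, max_len) for p in partitions]
    totals = reduce(merge_counts, partials, Counter())

    by_combo = defaultdict(Counter)
    for (combo, label), n in totals.items():
        by_combo[combo][label] += n

    rules = {}
    for combo, counts in by_combo.items():
        label, hits = counts.most_common(1)[0]
        support = sum(counts.values())
        if hits / support >= min_conf:
            rules[combo] = (label, hits / support, support)
    return rules


def shard_rules(rules, n_shards):
    """Split a rule set that is too large for one machine into n_shards pieces (round robin)."""
    shards = [{} for _ in range(n_shards)]
    for i, (combo, rule) in enumerate(rules.items()):
        shards[i % n_shards][combo] = rule
    return shards


def best_local_match(record, shard, max_len=2):
    """Best rule within one shard, scored by (#attributes, confidence, support)."""
    items = sorted(record.items())
    best = None
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            if combo in shard:
                label, confidence, support = shard[combo]
                candidate = ((k, confidence, support), label)
                if best is None or candidate[0] > best[0]:
                    best = candidate
    return best


def distributed_classify(record, shards, max_len=2, default=None):
    # Each shard is searched independently; only the per-shard winners are combined.
    candidates = [best_local_match(record, s, max_len) for s in shards]
    candidates = [c for c in candidates if c is not None]
    if not candidates:
        return default
    return max(candidates, key=lambda c: c[0])[1]
```

Summing counts is associative and commutative, so the rules do not depend on how the training data are partitioned. The maximum over per-shard maxima also equals the maximum over the whole rule set. These two properties are what make this style of training and classification safe to distribute.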

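Chapter 1 Introduction
1.1 Problem Definition
1.2 Research Objectives and Expected Contributions
1.3 Thesis Organization
Chapter 2 Background and Related Work
2.1 Association Rule
2.2 Granular Computing
2.3 DGAGC
2.4 Spark
Chapter 3 System Architecture
3.1 DGAGC Training and Optimization
3.2 DGAGC Classification
Chapter 4 Experimental Results
Chapter 5 Conclusions and Future Research Directions
References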