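Author: 林子晏
Zi-Yan Lin
Thesis title: 植基於Spark系統之分散式粒化運算決策產生演算法
A Distributed Decision Generation Algorithm based on Granular Computing Using Spark
Advisor: 王尉任
Wei-Jen Wang
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: Chinese
Number of pages: 53
Keywords (translated from Chinese): classification algorithm; distributed decision generation algorithm based on granular computing
Keywords (English): DGAGC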
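  • A classification algorithm works in two phases. The first phase is training, which uses already-classified data to map the features of the data to their corresponding classes. The second phase is classification, which assigns classes to other, unclassified data based on its features. DGAGC is a classification algorithm suited to discrete data; continuous data requires additional processing. Our previous research enabled DGAGC to run on the Hadoop MapReduce computing model, but the Hadoop MapReduce version covers only DGAGC training, while classification exists only as a single-machine version. Of the two phases, training takes the most time. This thesis proposes Spark versions of both DGAGC training and classification. The Spark version of training improves on the execution efficiency of the Hadoop version when the computational load of the data set is not large. As for DGAGC classification, the single-machine version cannot perform prediction when the prediction model is too large, so a Spark version of DGAGC classification is proposed to address this problem.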


    The DGAGC algorithm, developed at National Central University, is a classification algorithm based on association-rule mining and searching. It also specifies a distributed computing approach for model training, which is implemented on top of Hadoop MapReduce. In this study, we propose a new distributed computing approach for the DGAGC algorithm based on Apache Spark. With Spark's support for in-memory computing, the new distributed DGAGC algorithm achieves a lower average execution time for model training on four different training data sets. In addition, we propose a distributed version of DGAGC for data classification.

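    Chapter 1: Introduction
      1.1 Problem Definition
      1.2 Research Objectives and Expected Contributions
      1.3 Thesis Structure
    Chapter 2: Background and Related Work
      2.1 Association Rule
      2.2 Granular Computing
      2.3 DGAGC
      2.4 Spark
    Chapter 3: System Architecture
      3.1 DGAGC Training and Optimization
      3.2 DGAGC Classification
    Chapter 4: Experimental Results
    Chapter 5: Conclusions and Future Work
    References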

