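Incorporating Astronomical Catalog by using Cross-Matching Algorithm with MapReduce | National Central University Theses and Dissertations System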

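Graduate student:	謝佳昕 Jia-Shin Shie
Thesis title:	以MapReduce進行交叉驗證整合大量天文資料 (Incorporating Astronomical Catalog by using Cross-Matching Algorithm with MapReduce)
Advisor:	蔡孟峰 Meng-Feng Tsai
Committee members:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication:	2016
Academic year of graduation:	104 (ROC calendar)
Language:	Chinese
Number of pages:	65
Keywords (Chinese):	大量資料 (big data), 雲端運算 (cloud computing), 分散式系統 (distributed system), 交叉驗證 (cross-matching)
Keywords (English):	Big Data, Cloud computing, Distributed system, Cross-Matching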

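As technology advances, the telescopes used for astronomical observation have become increasingly powerful, capturing more information and producing larger volumes of data. This makes research harder for astronomers. This thesis therefore proposes using cross-matching to integrate large volumes of astronomical data, so that researchers can quickly find the data they need.
Cross-matching is a common method for extracting useful information from large volumes of astronomical data. In the past, performing cross-matching on a single machine was very inefficient. This thesis therefore takes cross-matching as its basis and uses OpenStack and Hadoop to build a distributed cloud environment, then implements cross-matching as a distributed algorithm so that large volumes of astronomical data can be integrated more efficiently. A distributed file system and a distributed database serve as storage, which makes the whole system more reliable and scalable.
The experiments are designed around two storage methods, HDFS and HBase. They compare the execution speed of the single-machine program with that of the distributed program; the computation time of physical machines with that of virtual nodes in the cloud environment for the same number of nodes; the computation time for different numbers of nodes; and the computation time for the two storage methods. The thesis also provides a visual user interface for quickly finding the required data.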

Cross-matching is a common way to find useful information across different star catalogs. Hardware today is more powerful than before, and the data obtained through astronomical telescopes are becoming much larger. As a result, a single machine can no longer handle the astronomical data. In this thesis, we use OpenStack to build a cloud computing environment, Hadoop as the distributed system, and HDFS and HBase as distributed storage, and we implement cross-matching with the MapReduce framework. In addition, because HBase supports random access, we build an incremental mechanism that lets users add new astronomical data whenever they want. In the experiments, we use transient data as test data to compare the running time of a single machine with that of the distributed system, and the running time of physical machines with that of virtual machines for the same number of nodes. The results show that the virtual machines are faster than the physical machines. Furthermore, we create 12 physical nodes in the cloud environment to observe the running time for different numbers of nodes. In theory, running the program on more nodes should make it much faster. In practice, the speeds with 10 nodes and with 12 nodes are very similar.
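The abstract does not spell out the algorithm, so the following is only a minimal sketch of how cross-matching can be phrased as map, shuffle, and reduce steps. It is plain Python that simulates the three phases in memory and does not use the Hadoop API. The grid-cell partitioning, the cell size, the match radius, and every name in it (`map_phase`, `reduce_phase`, `cross_match`, `CELL_DEG`, `RADIUS_ARCSEC`) are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import defaultdict
from itertools import product

CELL_DEG = 1.0        # assumed grid cell size, in degrees
RADIUS_ARCSEC = 2.0   # assumed match radius, in arcseconds


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def map_phase(catalog, rows, replicate=False):
    """Emit (cell, record) pairs keyed by the sky grid cell of each source.

    With replicate=True the record is also sent to the 8 neighbouring cells,
    so pairs that straddle a cell border still meet in one reducer.
    Simplification: RA wrap-around at 360 degrees and the poles are ignored.
    """
    for source_id, ra, dec in rows:
        cx, cy = int(ra // CELL_DEG), int(dec // CELL_DEG)
        offsets = product((-1, 0, 1), repeat=2) if replicate else [(0, 0)]
        for dx, dy in offsets:
            yield (cx + dx, cy + dy), (catalog, source_id, ra, dec)


def shuffle(pairs):
    """Group values by key, as the MapReduce framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(records):
    """Within one cell, pair every catalog A source with nearby B sources."""
    cat_a = [r for r in records if r[0] == "A"]
    cat_b = [r for r in records if r[0] == "B"]
    for _, id_a, ra_a, dec_a in cat_a:
        for _, id_b, ra_b, dec_b in cat_b:
            if angular_sep_arcsec(ra_a, dec_a, ra_b, dec_b) <= RADIUS_ARCSEC:
                yield id_a, id_b


def cross_match(rows_a, rows_b):
    """Run map, shuffle, and reduce in memory; return sorted (id_a, id_b)."""
    pairs = list(map_phase("A", rows_a))
    pairs += list(map_phase("B", rows_b, replicate=True))
    matches = set()
    for records in shuffle(pairs).values():
        matches.update(reduce_phase(records))
    return sorted(matches)


if __name__ == "__main__":
    rows_a = [("a1", 10.0000, 20.0000), ("a2", 150.5, -30.25)]
    rows_b = [("b1", 10.0003, 20.0002), ("b2", 150.6, -30.25),
              ("b3", 9.9999, 20.0001)]
    print(cross_match(rows_a, rows_b))  # [('a1', 'b1'), ('a1', 'b3')]
```

Only catalog B is replicated to neighbouring cells, so each matching pair is produced by exactly one reducer (the one for the catalog A source's cell). In the sample data, b3 lies in a different cell from a1 but is still matched because of that replication.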

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
1. Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Chapter Overview
2. Literature Review
2-1 Transient astronomical event
2-2 OpenStack
2-3 Hadoop
2-4 MapReduce
2-5 NoSQL
3. System Architecture
3-1 Cloud Computing Platform
3-2 HDFS File System
3-3 HBase Database
3-4 Cross-Matching
3-5 Clustering Stage
3-6 Cross Matching Stage
4. Research Method
4-1 Clustering Stage
4-1-1 Data Reduction and Clustering
4-2 Cross Matching Stage
4-2-1 Cross-Matching
4-3 Adding New Observation Data
4-4 Visual Query Interface
5. Experiments
5-1 Clustering Stage Execution Time
5-1-1 Clustering Stage on HDFS
5-1-2 Clustering Stage on HBase
5-1-3 Comparison of Clustering Stage on HDFS and HBase
5-2 Cross Matching Stage Execution Time
5-2-1 Cross Matching Stage on HDFS
5-2-2 Cross Matching Stage on HBase
5-2-3 Comparison of Cross Matching Stage on HDFS and HBase
5-3 Adding New Observation Data
6. Conclusion
References


                                

