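A Fault Tolerance Mechanism for Semiconductor Equipment Monitoring | National Central University Electronic Theses and Dissertations System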


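Graduate student:	Hsueh-Wen Liu (劉學文)
Thesis title:	A Fault Tolerance Mechanism for Semiconductor Equipment Monitoring (適用於半導體機台監控系統的容錯機制)
Advisor:	Wei-Jen Wang (王尉任)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Executive Master of Computer Science & Information Engineering
Year of publication:	2016
Academic year of graduation:	104 (ROC calendar)
Language:	Chinese
Number of pages:	40
Keywords (Chinese):	fault tolerance, system monitoring, semiconductor manufacturing
Keywords (English):	SECS/GEM, Checkpointing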

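As semiconductor processes advance, wafers grow larger and line widths shrink, so a single wafer yields more chips, but the cost of each wafer rises as well. This progress comes at a price: the tolerance for errors during production is smaller than before, so process parameters and environmental variables must be controlled precisely to reduce defects. Today's semiconductor production equipment provides a standardized communication protocol, SECS/GEM, which lets external systems obtain the environmental status and parameters during production over a TCP/IP connection. Many semiconductor fabs have therefore begun to deploy equipment monitoring systems that sense the production environment in real time. The collected data is sent to a process alarm system, which notifies staff for further handling when it detects an anomaly. In the monitoring architecture widely used today, a failure of the monitoring system always requires human intervention to restart it. Because no equipment data is received at all during the restart, the process alarm system may misjudge the situation and raise a false alarm, or may fail to detect a real anomaly. Both cases reduce capacity and increase production cost. To solve this problem, this study proposes a monitoring mechanism that supports fault tolerance (FT). We use server redundancy and checkpointing to achieve uninterrupted collection and reporting of equipment data, that is, near-zero downtime. The resulting technique safeguards the quality of the monitoring data, which in turn improves both the ability to control the process environment precisely and the process yield.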

As semiconductor manufacturing technology advances, wafers grow larger and critical dimensions shrink, so a single wafer can produce more chips. However, manufacturing with today's technology is costly: any defect on the wafer may ruin the final product and cause a large business loss. To reduce the chance of defects, the parameters of the manufacturing environment must be controlled precisely. A monitoring system is therefore typically used to collect real-time information, which shortens the time needed to decide on parameter changes. Most semiconductor manufacturing equipment now supports the SECS/GEM standard, which defines how to obtain equipment monitoring data over TCP/IP. The problem is that the existing monitoring approach does not support failover and requires human intervention when the system crashes, which implies a long recovery time. The failure can also cause further problems. For example, a manufacturing alarm system may raise a false alarm or overlook an important abnormality during the outage, because the monitoring system feeds it no data. To solve this problem, this thesis introduces a new fault-tolerant monitoring architecture based on server redundancy and checkpointing. With the proposed architecture, the monitoring system achieves very low downtime, which in turn benefits the manufacturing process and the yield rate.
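The combination of server redundancy and checkpointing described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the architecture from the thesis: the names `CheckpointStore`, `collect`, and `run_with_failover` are hypothetical, the equipment readings are made up, and the samples are assumed to be replayable by sequence number (for example from a queue), which the abstract does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckpointStore:
    """Stand-in for storage that survives a server crash (hypothetical)."""
    last_seq: int = -1  # sequence number of the last sample forwarded

    def save(self, seq: int) -> None:
        self.last_seq = seq

    def load(self) -> int:
        return self.last_seq


def collect(samples: List[float], store: CheckpointStore, sink: List[float],
            crash_at: Optional[int] = None) -> bool:
    """Forward samples to the sink, resuming after the last checkpoint.

    Returns False if the simulated crash fired, True if the run completed.
    """
    for seq in range(store.load() + 1, len(samples)):
        if seq == crash_at:
            return False           # simulated crash before handling this sample
        sink.append(samples[seq])  # report the sample to the alarm system
        store.save(seq)            # checkpoint only after the sample is reported
    return True


def run_with_failover(samples: List[float], store: CheckpointStore,
                      sink: List[float], crash_at: Optional[int] = None) -> str:
    """Run the primary; if it crashes, let the standby resume from the checkpoint."""
    if collect(samples, store, sink, crash_at=crash_at):
        return "primary"
    collect(samples, store, sink)  # standby takes over without human intervention
    return "standby"


samples = [20.1, 20.3, 20.2, 25.9, 20.4]  # made-up equipment readings
sink: List[float] = []
store = CheckpointStore()
finished_by = run_with_failover(samples, store, sink, crash_at=2)
print(finished_by, sink)
```

Because the checkpoint is written only after a sample has been forwarded, the standby never skips a sample. A crash between forwarding and checkpointing would cause that one sample to be sent twice, so a real system following this pattern would need the receiver to tolerate duplicates.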

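Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Objectives
1-3 Research Contributions
1-4 Thesis Organization
Chapter 2 Background Knowledge
2-1 SECS/GEM
2-2 High Availability
2-3 Fault Tolerance
2-3-1 Continuous Checkpointing
2-3-2 Lock-Stepping
2-4 Docker
2-5 Statistical Process Control
Chapter 3 Related Work
3-1 VMware High Availability
3-2 VMware Fault Tolerance
3-3 Dell High Availability Solutions for Hyper-V
3-4 Comparison of Related Work
Chapter 4 System Architecture
4-1 Main Architecture
4-2 System Initialization Flow
4-3 System Operation Flow
4-4 Primary System Service Restart Flow
4-5 Backup System Service Restart Flow
Chapter 5 Experimental Results
5-1 Experimental Environment and Setup
5-2 Experimental Results and Analysis
5-2-1 Performance Analysis of Batch Insertion into the DB
5-2-2 Performance of the System Queue Capacity
5-2-3 Performance Analysis of the System Checkpointing Mechanism
5-2-4 Comparison with the Traditional Architecture
Chapter 6 Conclusion and Future Work
Chapter 7 References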


                                

