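| Graduate student | 劉學文 Hsueh-Wen Liu |
|---|---|
| Thesis title | 適用於半導體機台監控系統的容錯機制 (A Fault Tolerance Mechanism for Semiconductor Equipment Monitoring) |
| Advisor | 王尉任 Wei-Jen Wang |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Executive Master of Computer Science & Information Engineering |
| Year of publication | 2016 |
| Academic year of graduation | 104 (ROC calendar, i.e., 2015-2016) |
| Language | Chinese |
| Number of pages | 40 |
| Keywords (Chinese) | 容錯、系統監控、半導體製造 (fault tolerance, system monitoring, semiconductor manufacturing) |
| Keywords (English) | SECS/GEM, Checkpointing |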
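As semiconductor processes advance, wafers grow larger and line widths shrink, so a single wafer can yield more chips, but the cost of each wafer rises as well. These advances come at a price: the tolerance for errors during production is smaller than before, so process parameters and environmental variables must be controlled precisely to reduce manufacturing defects. Today's semiconductor production equipment provides a standardized communication protocol, SECS/GEM, which lets external systems obtain the environmental state and parameters during production over a TCP/IP connection. Many semiconductor fabs have therefore introduced equipment monitoring systems to detect the production environment in real time. The collected data are sent to a process alarm system, which notifies personnel for further handling when it detects an anomaly. Under the widely used monitoring architecture, any failure of the monitoring system requires human intervention to restart it. Because no equipment data at all are received during the restart, the process alarm system may misjudge the situation and raise a false alarm, or miss a real anomaly. Both cases reduce capacity and increase production costs. To solve this problem, this study proposes a monitoring mechanism that supports fault tolerance (FT). Using server redundancy and checkpointing, it achieves uninterrupted collection and reporting of equipment data, i.e., near zero downtime. The resulting technique safeguards the quality of monitoring data, thereby improving the ability to control the semiconductor process environment precisely and raising process yield.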
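The abstract notes that SECS/GEM lets an external system read equipment data over TCP/IP. The TCP/IP transport for SECS/GEM is HSMS (SEMI E37), in which every message is a 4-byte length prefix followed by a 10-byte header and an optional SECS-II body. The sketch below is not taken from the thesis; it only illustrates that framing, and the function names `encode_data_message` and `decode_message` are hypothetical.

```python
import struct

# Hypothetical sketch of HSMS (SEMI E37) framing; not the thesis's code.
# Header layout: session ID (2 bytes), header byte 2, header byte 3,
# PType (1 byte), SType (1 byte), system bytes (4 bytes) = 10 bytes, big-endian.
HEADER = struct.Struct(">HBBBBI")


def encode_data_message(device_id, stream, function, system_bytes,
                        wait_reply=True, text=b""):
    """Frame a SECS-II data message (PType 0, SType 0) for sending over TCP."""
    # Header byte 2 carries the W-bit (reply expected) and the 7-bit stream number.
    byte2 = (0x80 if wait_reply else 0x00) | (stream & 0x7F)
    header = HEADER.pack(device_id, byte2, function, 0, 0, system_bytes)
    # The 4-byte length prefix counts the header and text, not itself.
    return struct.pack(">I", len(header) + len(text)) + header + text


def decode_message(frame):
    """Split a received HSMS frame back into its header fields and body."""
    (length,) = struct.unpack_from(">I", frame, 0)
    session_id, byte2, byte3, ptype, stype, system_bytes = HEADER.unpack_from(frame, 4)
    return {
        "session_id": session_id,
        "wait_reply": bool(byte2 & 0x80),
        "stream": byte2 & 0x7F,
        "function": byte3,
        "ptype": ptype,
        "stype": stype,
        "system_bytes": system_bytes,
        "text": frame[4 + HEADER.size:4 + length],
    }
```

For example, `encode_data_message(1, 1, 1, 42)` frames an S1F1 "Are You There" request with no body: 14 bytes in total, of which the length prefix reports 10. The system bytes let a collector match each reply to its request.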
As semiconductor manufacturing technology advances, wafers become larger and critical dimensions become smaller, so a single wafer can yield more chips. However, manufacturing chips with today's technology is costly: any defect on a wafer may cause the final product to fail and lead to a large business loss. To reduce the chance of defects, the parameters of the manufacturing environment must be precisely controlled. To this end, a monitoring system is usually deployed to collect real-time information, which shortens the time needed to decide when to change those parameters. Most semiconductor manufacturing machines now support the SECS/GEM standard, which defines how to obtain monitoring data from the machines via TCP/IP. The problem is that the existing monitoring approach does not support failover: when the system crashes, human intervention is needed to restart it, which implies a long recovery time. The failure may also cause further problems. For example, because the monitoring system feeds no data to the manufacturing alarm system during the outage, the alarm system may raise a false alarm or overlook an important abnormality. To solve this problem, this thesis introduces a fault-tolerant monitoring architecture based on server redundancy and checkpointing. With the proposed architecture, the monitoring system achieves near-zero downtime, which in turn helps keep the manufacturing process under precise control and improves the yield rate.
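The abstract names server redundancy and checkpointing but does not detail how they combine. The following is a minimal Python sketch of one way they could fit together, not the thesis's implementation. It assumes, hypothetically, that each equipment sample carries an increasing sequence number, that the data source can replay samples after a given sequence number, that the primary and standby collectors share a checkpoint file, and that the standby detects a failed primary by heartbeat timeout. The names `Checkpointer`, `Collector`, and `primary_failed` are illustrative.

```python
import json
import os
import tempfile

# Hypothetical sketch: names, checkpoint format, and the replay assumption
# are illustrative, not taken from the thesis.


class Checkpointer:
    """Persists collector state to a shared JSON checkpoint file."""

    def __init__(self, path):
        self.path = path

    def save(self, state):
        # Write to a temp file in the same directory, then atomically replace,
        # so a crash mid-write never leaves a half-written checkpoint.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def load(self):
        # No checkpoint yet: start before the first sequence number.
        try:
            with open(self.path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {"last_seq": -1, "count": 0}


class Collector:
    """Forwards equipment samples to a report callback and checkpoints progress."""

    def __init__(self, checkpointer, report, interval=10):
        self.checkpointer = checkpointer
        self.report = report
        self.interval = interval  # checkpoint after every `interval` new samples
        self.state = checkpointer.load()
        self.pending = 0

    def handle(self, seq, value):
        # Samples at or below last_seq were already reported; skip them.
        if seq <= self.state["last_seq"]:
            return False
        self.report(value)
        self.state["last_seq"] = seq
        self.state["count"] += 1
        self.pending += 1
        if self.pending >= self.interval:
            self.checkpointer.save(self.state)
            self.pending = 0
        return True


def primary_failed(last_heartbeat, now, timeout):
    """Standby's failure detector: primary is presumed dead after `timeout` seconds of silence."""
    return now - last_heartbeat > timeout
```

On takeover, the standby builds a `Collector` from the same checkpoint file and asks the source to replay from `last_seq + 1`. Samples the primary reported after its last checkpoint are reported again, so `interval` trades checkpoint cost against the size of that duplicate window. A downstream alarm system could deduplicate by sequence number if exactly-once delivery matters.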