
Author: Kai-Yun Deng (鄧凱云)
Title: Built-In Self-Repair Scheme for SRAMs in Deep Neural Network Accelerators
Advisor: Jin-Fu Li (李進福)
Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication Year: 2020
Academic Year: 108
Language: English
Pages: 97
Keywords: built-in self-repair, repair rate, built-in redundancy analysis


    Deep neural networks (DNNs) have been widely used for artificial intelligence applications. An accelerator in a DNN system typically has static random access memories (SRAMs) for data buffering. In this thesis, we propose an efficient built-in self-repair (BISR) scheme for enhancing the yield of SRAMs in the accelerator of a DNN system. In the first part of the thesis, a swapping mechanism is proposed to increase the yield under a constraint on inference accuracy reduction. The swapping mechanism can be integrated into existing built-in redundancy analysis (BIRA) algorithms; a local-repair-most (LRM) algorithm and an exhaustive BIRA algorithm are modified to include it. Simulation results show that, for a 256 KB memory with a 2D redundancy configuration and faults injected with a Poisson distribution mean of 0.2~1.0 (1.0~3.0), the modified LRM and exhaustive BIRA schemes improve the repair rate by about 3.4% (30.7%) and 3.5% (27.3%), respectively, while sacrificing at most 0.10% (0.73%) and 0.12% (0.95%) inference accuracy for MobileNet and ResNet-50. In the second part of the thesis, we present an automation, evaluation, and verification platform for the proposed BIRA schemes. In the platform, a BIRA compiler generates the RTL of the proposed BIRA designs, and an evaluation tool estimates the repair rate and inference accuracy for the SRAMs in a given accelerator executing a given DNN model. Finally, the platform can generate Verilog testbenches for verifying the RTL designs of the BIRAs.
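    The experimental setup described above (Poisson-distributed fault counts injected into a memory with 2D spare rows/columns, repair rate measured over many trials) can be sketched as a small Monte Carlo experiment. This is an illustrative sketch only: the greedy repair-most allocation below is a generic stand-in, not the thesis's LRM or exhaustive BIRA algorithm, and all dimensions, spare counts, and function names are assumptions for demonstration.

    ```python
    import math
    import random
    from collections import Counter

    def inject_faults(rows, cols, lam, rng):
        """Draw a Poisson(lam) fault count (Knuth's algorithm, no numpy
        needed) and place that many faults at random cells; colliding
        draws collapse, which slightly thins large fault counts."""
        threshold = math.exp(-lam)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= threshold:
                break
            k += 1
        return {(rng.randrange(rows), rng.randrange(cols)) for _ in range(k)}

    def greedy_repair(faults, spare_rows, spare_cols):
        """Repair-most heuristic: repeatedly retire the row or column
        covering the most remaining faults. Returns True if all faults
        are covered by the available 2D redundancy."""
        faults = set(faults)
        while faults and (spare_rows or spare_cols):
            row_cnt = Counter(r for r, _ in faults)
            col_cnt = Counter(c for _, c in faults)
            best_row = row_cnt.most_common(1)[0] if spare_rows else (None, -1)
            best_col = col_cnt.most_common(1)[0] if spare_cols else (None, -1)
            if best_row[1] >= best_col[1]:
                faults = {f for f in faults if f[0] != best_row[0]}
                spare_rows -= 1
            else:
                faults = {f for f in faults if f[1] != best_col[0]}
                spare_cols -= 1
        return not faults

    def repair_rate(trials=2000, rows=64, cols=64, lam=1.0,
                    spare_rows=2, spare_cols=2, seed=0):
        """Fraction of fault-injected memories the heuristic repairs."""
        rng = random.Random(seed)
        repaired = sum(
            greedy_repair(inject_faults(rows, cols, lam, rng),
                          spare_rows, spare_cols)
            for _ in range(trials)
        )
        return repaired / trials
    ```

    As in the thesis's experiments, sweeping `lam` over a low range versus a high range shows the repair rate falling as the mean fault count grows, which is the gap a better redundancy-analysis algorithm (or the proposed swap mechanism) tries to close.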

    1 Introduction
      1.1 Deep Neural Network System
        1.1.1 Deep Neural Network
        1.1.2 Neural Network Acceleration System
      1.2 Memory Built-In Self-Repair Techniques
        1.2.1 BISR Architecture
        1.2.2 Built-In Redundancy Analysis
      1.3 Error Tolerance
      1.4 Thesis Motivation and Contribution
      1.5 Thesis Organization
    2 Proposed BISR Scheme for Memories in DNN Systems
      2.1 Concept of Swap Mechanism
        2.1.1 Swap Mechanism Variability
        2.1.2 RA Schemes with Swap Mechanism
      2.2 Proposed Heuristic BIRA Algorithm with Swap Mechanism
        2.2.1 Local Bitmap
        2.2.2 Built-In Redundancy Analysis Algorithm
        2.2.3 Design of the BIRA Circuit
      2.3 Proposed Exhaustive BIRA Algorithm with Swap Mechanism
        2.3.1 CRESTA
        2.3.2 Built-In Redundancy Analysis Algorithm
        2.3.3 Design of the BIRA Circuit
      2.4 Summary
    3 Evaluation and Verification Platform
      3.1 Overall Flow
      3.2 Evaluation Platform
        3.2.1 Fault Map Generator
        3.2.2 Redundancy Analysis Categories in DNN Accelerator
        3.2.3 Evaluation Outcome
      3.3 Verification Platform
        3.3.1 Verify Simulation and Design Result
        3.3.2 Automatic RTL Generation of Proposed BIRA
    4 Experimental Result and Analysis
      4.1 Repair Rate Analysis
        4.1.1 Repair Rate of HRA-SW Algorithm
        4.1.2 Repair Rate of ERA-SW Algorithm
        4.1.3 Repair Rate of Comparison Results
      4.2 Inference Accuracy Simulation
        4.2.1 Number of Swap Words and Inference Accuracy
        4.2.2 Different Size of Memory and Inference Accuracy
        4.2.3 Two RA Schemes and Inference Accuracy
      4.3 Area Overhead
        4.3.1 Area Overhead of HRA-SW Scheme
        4.3.2 Area Overhead of ERA-SW Scheme
    5 Conclusion and Future Work

