Graduate Student: Tsung-Fu Hsieh (謝宗甫)
Thesis Title: ECC-Based Refresh Power Reduction Technique for DRAMs of Deep Neural Network Systems
Advisor: Jin-Fu Li (李進福)
Committee Members:
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2018
Graduation Academic Year: 107
Language: English
Number of Pages: 89
Keywords: Deep Neural Network, DRAM, refresh power, data compression, BIST


Abstract:

The deep neural network (DNN) is considered a practical and effective artificial intelligence technique. A DNN system typically needs a dynamic random access memory (DRAM) to store its data. However, DRAM is a power-hungry component, so effective techniques for reducing the power consumption of the DRAM in a DNN system are needed.
In this thesis, a hybrid voting and error-correction code (VECC) technique is proposed to reduce the refresh power of DRAMs in DNN systems by extending the DRAM refresh period. The VECC technique exploits a characteristic of the weights of DNN models to reduce the cost of check bits: most weights of a DNN model are close to zero. It therefore extends the refresh period of DRAMs by using a voting mechanism to protect near-zero weights from retention faults and an error-correction code (ECC) to protect the remaining weights, as sketched below. To realize the VECC technique, a software-hardware-cooperated built-in self-test (SHC-BIST) scheme is proposed to identify the DRAM cells that suffer data retention faults under different refresh periods. A decoding and remapping unit is also proposed to decode and remap the encoded weights.
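To make the hybrid protection idea concrete, the following Python sketch shows one way the two correction paths could look. Everything here is an illustrative assumption rather than the thesis's actual design: 8-bit two's-complement weights, a near-zero band of [-8, 8) whose five identical sign-extension bits are majority-voted at zero check-bit cost, a Hamming(12,8) single-error-correcting code for the remaining weights, and a per-word tag that stands in for however the real VECC encoding distinguishes the two cases (the thesis's actual encoding flow is in Section 2.3.2).

    # Illustrative sketch of hybrid voting + ECC protection, NOT the thesis's
    # exact VECC design. Assumed: 8-bit two's-complement weights; a weight in
    # [-8, 8) is "near zero", so bits 7..3 are identical sign-bit copies.

    DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two codeword positions
    PARITY_POS = [1, 2, 4, 8]                # power-of-two codeword positions

    def is_near_zero(w: int) -> bool:
        """True if the 8-bit pattern w encodes a value in [-8, 8)."""
        top = (w >> 3) & 0x1F                # bits 7..3
        return top in (0x00, 0x1F)           # all-zero or all-one sign extension

    def vote_correct(w: int) -> int:
        """Majority-vote the five replicated sign bits of a near-zero weight.
        Corrects up to two retention faults among bits 7..3; the low bits
        2..0 are left unprotected in this toy."""
        ones = bin((w >> 3) & 0x1F).count("1")
        top = 0x1F if ones >= 3 else 0x00
        return (top << 3) | (w & 0x07)

    def hamming_encode(byte: int) -> list:
        """Encode 8 data bits into a 12-bit Hamming codeword (1-indexed list)."""
        code = [0] * 13                      # index 0 unused
        for i, pos in enumerate(DATA_POS):
            code[pos] = (byte >> i) & 1
        for p in PARITY_POS:                 # even parity over covered positions
            code[p] = 0
            for pos in range(1, 13):
                if pos != p and (pos & p):
                    code[p] ^= code[pos]
        return code

    def hamming_decode(code: list) -> int:
        """Correct a single-bit error and return the 8 data bits."""
        syndrome = 0
        for p in PARITY_POS:
            parity = 0
            for pos in range(1, 13):
                if pos & p:
                    parity ^= code[pos]
            if parity:
                syndrome |= p
        if syndrome:                         # syndrome = faulty bit position
            code[syndrome] ^= 1
        byte = 0
        for i, pos in enumerate(DATA_POS):
            byte |= code[pos] << i
        return byte

    def protect(w: int):
        """Write path: near-zero weights are stored as-is (voting over the
        sign-extension bits needs no check bits); others get a codeword."""
        if is_near_zero(w):
            return ("vote", w)
        return ("ecc", hamming_encode(w))

    def recover(tag, stored) -> int:
        """Read path: apply the correction matching the write-path tag."""
        return vote_correct(stored) if tag == "vote" else hamming_decode(stored)

The design point this sketch mirrors is the one the abstract states: because near-zero weights already carry internal redundancy, voting covers the bulk of the model at no storage cost, and only the minority of large-magnitude weights pay for ECC check bits.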
Simulation results show that the proposed VECC technique achieves up to 93.7% refresh power saving for four typical DNN models, with an inference accuracy loss of less than 0.5% and a check-bit overhead of less than 1% of the original data.
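The power savings above rest on running the DRAM at the longest refresh period for which every protected word stays correctable. A minimal sketch of that selection policy follows, assuming the SHC-BIST sweep reports the worst-case retention-fault count per protected word at each candidate period; the function names and the sample data are hypothetical, and the thesis's actual flow is in Section 2.3.4.

    def select_refresh_period(periods_ms, worst_faults_per_word):
        """Pick the longest refresh period whose worst-case per-codeword
        retention-fault count is still correctable (here: at most one fault
        per word, mirroring a single-error-correcting code).
        worst_faults_per_word maps period -> worst observed fault count,
        standing in for the SHC-BIST measurement sweep."""
        MAX_CORRECTABLE = 1
        usable = [p for p in periods_ms if worst_faults_per_word[p] <= MAX_CORRECTABLE]
        return max(usable) if usable else min(periods_ms)

    # Hypothetical BIST sweep: longer periods let more charge leak,
    # so more cells exhibit retention faults.
    sweep = {64: 0, 128: 0, 256: 1, 512: 1, 1024: 3}
    print(select_refresh_period(sweep.keys(), sweep))   # -> 512

Since refresh power scales roughly with refresh frequency, extending the period from the nominal 64 ms to 512 ms in this toy example would cut refresh power by about a factor of eight.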

Table of Contents:
1 Introduction
  1.1 Deep Neural Network System
    1.1.1 Deep Neural Network
    1.1.2 Neural Network Acceleration System
  1.2 Dynamic Random Access Memory
    1.2.1 Organization of DRAM
    1.2.2 DRAM Refresh
  1.3 Previous Work
    1.3.1 Block-Based Multiperiod Refresh Power Reduction
    1.3.2 ECC-Based Refresh Period Extending
  1.4 Motivation
  1.5 Thesis Contribution
  1.6 Thesis Organization
2 Proposed VECC Technique for DRAM Refresh Period Extension
  2.1 Characteristic of Weights in DNN Systems
  2.2 VECC Technique
  2.3 En/Decoding & Refresh Period Extending Process
    2.3.1 Overall Flow
    2.3.2 Encoding Flow
    2.3.3 Decoding Flow
    2.3.4 Refresh Period Selection Flow
3 Hardware Design
  3.1 VECC Decoder and Read Address Controller
  3.2 Programmable BIST
4 Simulation Result
  4.1 Accuracy Loss Analysis
  4.2 Power Saving Analysis
  4.3 Bits Overhead
  4.4 Read Latency Overhead
  4.5 Area Overhead Analysis
5 Conclusion

