
Graduate Student: 王彥翔 (Yen-Hsiang Wang)
Thesis Title: 基於知識提煉的單頭式終身學習 (Single-head lifelong learning based on distilling knowledge)
Advisor: 施國琛 (Timothy K. Shih)
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering & Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2021
Academic Year of Graduation: 109 (2020-2021)
Language: English
Pages: 64
Chinese Keywords: 終身式學習 (lifelong learning); 知識蒸餾 (knowledge distillation)
Foreign Keywords: Continuous learning
Access: Views: 13; Downloads: 0


Compared with fields such as object detection and object tracking, Lifelong Learning is a relatively new and less-explored field. Its main purpose is to enable neural networks to learn continuously, as humans do, and to use knowledge learned in the past so that future tasks are easier to learn and are performed better.
At present, Lifelong Learning can be divided into two types: single-head and multi-head. The difference between them is whether task identity information is available at the testing stage. Single-head Lifelong Learning provides no task identity at test time, so the model must be designed to handle the imbalanced data that causes a classification bias. In multi-head Lifelong Learning, by contrast, the task identity is accessible at the testing stage, so there is no classification-bias problem to deal with; researchers in this subfield mainly study how to optimize the model so that it uses the least memory while achieving the best performance.
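The single-head versus multi-head distinction above can be made concrete with a small sketch. This is an illustrative example with hypothetical task sizes and random logits, not code from the thesis: under both settings the classifier produces logits over all classes seen so far, but only the multi-head setting may restrict the prediction to the known task's own classes.

```python
import random

random.seed(0)

# Hypothetical setting: two tasks of five classes each have already been
# learned, so the shared classifier produces logits over all ten classes.
num_tasks, classes_per_task = 2, 5
logits = [random.gauss(0, 1) for _ in range(num_tasks * classes_per_task)]

# Single-head evaluation: no task identity at test time, so the model
# must choose among every class seen so far.
single_head_pred = max(range(len(logits)), key=logits.__getitem__)

# Multi-head evaluation: the task identity (say, task 1) is given, so the
# model only needs to choose among that task's own classes.
task_id = 1
start = task_id * classes_per_task
task_slice = range(start, start + classes_per_task)
multi_head_pred = max(task_slice, key=logits.__getitem__)
```

Because the single-head prediction ranges over every class ever seen, classes from newer tasks (which have more training data available) tend to dominate, which is exactly the classification-bias problem described above.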
In this thesis, we focus on the single-head Lifelong Learning problem and adopt a knowledge distillation strategy to avoid catastrophic forgetting. Previous distillation strategies typically distill knowledge from the distribution of the output layer, or compute the distillation loss as the Euclidean distance between features from intermediate layers. We instead propose a branch distillation method: the features obtained from intermediate layers are passed through an average pooling layer, which reduces their complexity and thereby avoids the training difficulties that overly complex features can cause, followed by a fully connected layer that serves as an output layer. Together these form a branch network, and its output distribution, combined with the output of the main network, is used to distill features at different scales. This novel knowledge distillation method improves the model's performance.
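The branch-distillation idea can be sketched as follows. This is a minimal pure-Python illustration, not the thesis implementation: all shapes, weights, and the temperature value are hypothetical. An intermediate feature map is spatially average-pooled, a fully connected layer turns the pooled vector into branch logits, and the old (teacher) and new (student) branch distributions are compared with a temperature-softened cross-entropy in the style of Hinton et al.'s distillation.

```python
import math
import random

def softmax(logits, t=1.0):
    """Temperature-softened softmax; a higher t gives softer targets."""
    z = [v / t for v in logits]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def branch_logits(feature_map, weights, bias):
    """Branch network: global average pooling over each channel's spatial
    grid, then one fully connected layer producing class logits.
    feature_map: list of C channels, each a flat list of H*W activations."""
    pooled = [sum(ch) / len(ch) for ch in feature_map]   # average pooling
    return [sum(w * p for w, p in zip(row, pooled)) + b  # fully connected
            for row, b in zip(weights, bias)]

def distill_loss(teacher_logits, student_logits, t=2.0):
    """Cross-entropy between the softened teacher and student outputs."""
    p = softmax(teacher_logits, t)   # frozen old model (teacher)
    q = softmax(student_logits, t)   # current model (student)
    return -sum(pi * math.log(qi + 1e-12) for pi, qi in zip(p, q))

random.seed(0)
channels, spatial, num_classes = 8, 16, 10
feat_old = [[random.gauss(0, 1) for _ in range(spatial)]
            for _ in range(channels)]
# The student's intermediate features drift slightly during new-task training.
feat_new = [[v + 0.05 * random.gauss(0, 1) for v in ch] for ch in feat_old]
weights = [[random.gauss(0, 1) for _ in range(channels)]
           for _ in range(num_classes)]
bias = [0.0] * num_classes

loss = distill_loss(branch_logits(feat_old, weights, bias),
                    branch_logits(feat_new, weights, bias))
```

Distilling the branch distribution rather than the raw feature map is the key design choice: the pooling collapses the spatial dimensions, so the loss penalizes changes in the channel statistics of intermediate features without forcing the student to match every activation, and attaching one such branch per stage transfers information at several feature scales.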

Table of Contents:
1 Introduction
2 Related work
  2.1 Features Extraction Methods
    2.1.1 AlexNet
    2.1.2 VGGNet
    2.1.3 Residual Neural Network
  2.2 Metric Learning
    2.2.1 Siamese Neural Network
    2.2.2 Triplet Network
  2.3 Knowledge Distillation
    2.3.1 Distilling the Knowledge in a Neural Network
    2.3.2 Paraphrasing Complex Network
    2.3.3 Feature-level Ensemble for Knowledge Distillation
  2.4 Lifelong Learning
    2.4.1 Multi-head
      2.4.1.1 Elastic Weight Consolidation
      2.4.1.2 Progressive Neural Networks
      2.4.1.3 Adversarial Continual Learning
    2.4.2 Single-head
      2.4.2.1 Learning without Forgetting
      2.4.2.2 Incremental Classifier and Representation Learning
      2.4.2.3 Bias Correction
      2.4.2.4 Pooled Outputs Distillation for Small-Tasks Incremental Learning
3 Preliminary
  3.1 Notation
  3.2 Problem Definition
4 Proposed Method
  4.1 Feature extraction
  4.2 Classifier
    4.2.1 Metric Learning
    4.2.2 Bias fully connected layers
  4.3 Knowledge distillation
  4.4 Branch network knowledge distillation
    4.4.1 Single-layer distillation
    4.4.2 Multi-layer distillation
  4.5 Distillation loss
  4.6 Training process
5 Experimental Results
  5.1 CIFAR-100
  5.2 Implementation
  5.3 Result
  5.4 Performance analysis of Single-Layer and Multi-Layers distillation
6 Conclusion
7 Reference

