| Graduate Student: | 王彥翔 Yen-Hsiang Wang |
|---|---|
| Thesis Title: | 基於知識提煉的單頭式終身學習 Single-head Lifelong Learning Based on Distilling Knowledge |
| Advisor: | 施國琛 Timothy K. Shih |
| Committee Members: | |
| Degree: | 碩士 Master |
| Department: | 資訊電機學院 College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Year of Publication: | 2021 |
| Academic Year of Graduation: | 109 |
| Language: | English |
| Number of Pages: | 64 |
| Chinese Keywords: | 終身式學習 (lifelong learning), 知識蒸餾 (knowledge distillation) |
| Foreign Keywords: | Continuous learning |
Compared with other fields, such as object detection and object tracking, Lifelong Learning is a relatively new and less-explored field. Its goal is to enable neural networks to learn continuously, as humans do, and to use knowledge learned in the past so that future tasks are easier to learn and performed better.
At present, Lifelong Learning can be divided into two types: single-head and multi-head. The difference between them is whether task-identity information is available at the testing stage. Single-head Lifelong Learning provides no task identity at test time, so the model must be designed to handle the classification preference caused by imbalanced data. In multi-head Lifelong Learning, by contrast, the task identity is accessible at the testing stage, so there is no classification-preference problem to deal with; researchers in this subfield mainly study how to optimize the model so that it uses the least memory while achieving the best performance.
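The single-head versus multi-head distinction above can be sketched in a few lines of plain Python. This is only an illustration of the two evaluation settings, not code from the thesis; the logits, class counts, and task layout are assumed for the example.

```python
# Minimal sketch contrasting single-head and multi-head evaluation.
# All numbers and task layouts below are illustrative assumptions.

def single_head_predict(logits_all_classes):
    """Single-head: one classifier over every class seen so far.
    No task id is given at test time, so old and new classes compete
    directly -- class imbalance can bias this argmax toward new classes."""
    return max(range(len(logits_all_classes)), key=lambda i: logits_all_classes[i])

def multi_head_predict(task_id, per_task_logits):
    """Multi-head: the task id is known at test time, so prediction is
    restricted to that task's own output head."""
    logits = per_task_logits[task_id]
    local_class = max(range(len(logits)), key=lambda i: logits[i])
    return task_id, local_class

# Example: two tasks of two classes each.
logits = [0.2, 0.1, 1.5, 0.9]       # single head over all 4 classes
heads = [[0.2, 0.1], [1.5, 0.9]]    # one separate head per task
print(single_head_predict(logits))  # -> 2 (global class index)
print(multi_head_predict(0, heads)) # -> (0, 0), i.e. class 0 of task 0
```

The single-head setting is harder precisely because the argmax runs over all classes at once, which is why the classification-preference problem only arises there.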
In this thesis, we focus on the single-head Lifelong Learning problem and adopt a knowledge-distillation strategy to avoid catastrophic forgetting. Previous distillation strategies typically distill knowledge from the distribution of the output layer, or compute a distillation loss as the Euclidean distance between intermediate-layer features. We improve on the Euclidean-distance approach with a branch-distillation method: after obtaining the features from intermediate layers, we pass them through an average-pooling layer, which avoids the poor training caused by overly complex features, followed by a fully connected layer serving as an output layer; together these form a branch network. The distribution from this branch network, combined with the output of the main network, distills features at different scales, and this distillation method improves the model's performance.
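The branch-distillation idea above can be sketched in plain Python, with no deep-learning framework: an intermediate C x H x W feature map is reduced by global average pooling, mapped through one fully connected layer, and the resulting branch distribution is matched against the old model's branch distribution using the softened-softmax KL loss of output-layer distillation. The shapes, weights, temperature, and feature values below are illustrative assumptions, not the thesis's actual configuration.

```python
import math

def global_avg_pool(feature_map):
    """feature_map: list of C channels, each an H x W grid -> C-vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]

def linear(vec, weights, bias):
    """One fully connected layer: logits[j] = sum_i vec[i] * W[j][i] + b[j]."""
    return [sum(v * w for v, w in zip(vec, ws)) + b
            for ws, b in zip(weights, bias)]

def softmax(logits, T=2.0):
    """Temperature-softened distribution used for distillation."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Branch distillation: pool old/new intermediate features, map them through
# the branch's fully connected layer, then distill the branch distributions
# exactly like ordinary output-layer distributions.
W = [[0.5, -0.2], [0.1, 0.3]]   # assumed 2-unit branch head over 2 channels
b = [0.0, 0.0]
old_feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 2.0]]]  # 2x2x2
new_feat = [[[1.1, 2.1], [2.9, 4.2]], [[0.2, 0.9], [1.1, 1.8]]]

branch_loss = kd_loss(linear(global_avg_pool(old_feat), W, b),
                      linear(global_avg_pool(new_feat), W, b))
print(f"branch distillation loss: {branch_loss:.6f}")
```

Pooling first keeps the distilled signal low-dimensional (one value per channel), which is the point of routing the features through the branch rather than matching raw H x W maps by Euclidean distance.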