| Graduate student: | 黃文城 Dick Hansel Ryan |
|---|---|
| Thesis title: | Enhance Few-Shot Learning with Transformer Architectures |
| Advisor: | 孫敏德 Peter Sun |
| Oral defense committee: | |
| Degree: | 碩士 Master |
| Department: | College of Information and Electrical Engineering, International Master's Degree Program in Artificial Intelligence |
| Year of publication: | 2024 |
| Graduation academic year: | 113 |
| Language: | English |
| Number of pages: | 45 |
| Keywords: | few-shot learning, deep learning, dense networks, multi-head attention |
隨著對深度學習模型在小數據集上高效表現需求的提升，小樣本學習（few-shot learning）逐漸成為一個熱門研究領域。其目標是在每個類別只有少量標註樣本的情況下訓練模型，並根據測試數據的處理方式分為歸納式（inductive）與轉導式（transductive）方法。本研究提出了一種基於 Transformer 架構的歸納式小樣本學習模型 DAPNet。該模型結合了密集網路（Dense Networks）與多頭注意力機制（Multi-Head Attention），並改進了激活函數，實現了 Ranger 優化器的應用，有效提升了準確性和訓練效率。我們在 MiniImageNet 和 TieredImageNet 這兩個知名的小樣本學習基準數據集上對 DAPNet 進行了評估。結果顯示，DAPNet 在準確性方面優於或媲美當前的先進模型。
With the growing demand for deep learning models to excel on limited datasets, few-shot learning has gained prominence as a promising area of research. Its goal is to train models using only a few labeled examples per class. Depending on
how test data is processed, few-shot learning methods are classified into inductive and transductive approaches. In this work, we present DAPNet, an inductive few-shot learning model based on the Transformer architecture. Our model incorporates Dense Networks and Multi-Head Attention, alongside a modified activation function and the Ranger optimizer, which together improve accuracy and training efficiency. We evaluate DAPNet on two widely recognized few-shot learning benchmarks: MiniImageNet and TieredImageNet. The experimental results show that DAPNet matches or exceeds the accuracy of state-of-the-art models.
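The abstract names DAPNet's ingredients (dense connectivity, multi-head attention, a modified activation) without detailing how they combine. The sketch below is only a toy illustration of how such pieces can compose in a prototypical-style few-shot episode, not the authors' implementation: the function names, dimensions, and random stand-in weights are assumptions, and SiLU is used merely as an example of a smooth ReLU alternative.

```python
import numpy as np

def silu(x):
    # SiLU/Swish activation, a smooth ReLU alternative: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    # x: (seq_len, d_model). Random matrices stand in for learned projections.
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    out = np.zeros_like(x)
    for h in range(num_heads):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
                      for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        att = softmax(q @ k.T / np.sqrt(d_head))      # (seq_len, seq_len)
        out[:, h * d_head:(h + 1) * d_head] = att @ v
    return out

def dense_attention_block(x, num_heads, rng):
    # DenseNet-style connectivity: concatenate the block's output with its
    # input so downstream layers see all earlier features.
    attended = multi_head_attention(x, num_heads, rng)
    return np.concatenate([x, silu(attended)], axis=-1)

# Toy 5-way episode: 5 embedded support examples (one per class), 3 queries.
rng = np.random.default_rng(0)
support = rng.standard_normal((5, 16))
query = rng.standard_normal((3, 16))
feats = dense_attention_block(np.vstack([support, query]), num_heads=4, rng=rng)
protos, q = feats[:5], feats[5:]
# Classify each query by softmax over negative squared distances to prototypes.
logits = -((q[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
probs = softmax(logits, axis=-1)
print(probs.shape)  # -> (3, 5)
```

Concatenating rather than summing the attended features (the dense-connectivity choice) doubles the feature width per block but preserves the raw input path, which is the property DenseNet-style designs rely on.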