| Graduate student: | 王承凱 Cheng-Kai Wang |
|---|---|
| Thesis title: | 利用 SCPL 分解端到端倒傳遞演算法 (Decomposing End-to-End Backpropagation Based on SCPL) |
| Advisor: | 陳弘軒 Hung-Hsuan Chen |
| Oral defense committee: | |
| Degree: | Master (碩士) |
| Department: | College of Electrical Engineering and Computer Science, Graduate Institute of Software Engineering |
| Year of publication: | 2022 |
| Graduation academic year: | 110 |
| Language: | Chinese |
| Number of pages: | 42 |
| Keywords (Chinese): | 倒傳遞、反向鎖定、監督對比損失函數、平行化訓練、監督式對比平行學習 |
| Keywords (English): | Backpropagation, backward locking, supervised contrastive loss, parallel learning, supervised contrastive parallel learning |
Backpropagation (BP) is the cornerstone of the algorithms used to update the weights of today's deep neural networks, but backpropagation is inefficient because of the backward locking problem. This study attempts to solve the backward locking problem with a new method named Supervised Contrastive Parallel Learning (SCPL). SCPL uses the supervised contrastive loss as the local objective function of each convolutional layer; because the local objective functions of the individual layers are isolated from one another, SCPL can learn the weights of different convolutional layers in parallel.
This thesis also compares the proposed method with previous research on neural network parallelization, examines the respective strengths and limitations of existing methods, and discusses future research directions for this topic.
Backpropagation (BP) is the cornerstone of the algorithms that update the weights of today's deep neural networks, but it is inefficient, partly because of the backward locking problem. This thesis proposes Supervised Contrastive Parallel Learning (SCPL) to address the issue of backward locking. SCPL uses the supervised contrastive loss as the local objective function for each layer. Because the local objective functions in different layers are isolated, SCPL can learn the weights of different layers in parallel. We compare SCPL with recent work on neural network parallelization, discuss the advantages and limitations of the existing methods, and suggest future research directions on neural network parallelization.
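To make the layer-wise training scheme described above concrete, the sketch below shows one way gradient-isolated, per-layer training with a supervised contrastive local objective could be wired up in PyTorch. It is a minimal illustration under assumptions, not the implementation evaluated in this thesis: the block structure, the simplified supervised contrastive loss (in the spirit of Khosla et al.), the helper names `LocalBlock`, `sup_con_loss`, and `train_step`, and all hyperparameters are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sup_con_loss(features, labels, temperature=0.1):
    """Simplified supervised contrastive loss (assumed form, in the spirit of
    Khosla et al., 2020): pull same-class embeddings together, push others apart."""
    features = F.normalize(features, dim=1)
    sim = features @ features.t() / temperature              # (N, N) similarities
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    # log-softmax over all other samples, then average over each anchor's positives
    log_prob = sim - torch.logsumexp(sim.masked_fill(~not_self, float("-inf")),
                                     dim=1, keepdim=True)
    mean_log_prob_pos = (log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()


class LocalBlock(nn.Module):
    """One convolutional block trained only by its own local objective."""

    def __init__(self, in_ch, out_ch, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.ReLU(), nn.MaxPool2d(2))
        # Local projection head: maps the block output to the embedding space
        # on which the supervised contrastive loss is computed.
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(out_ch, feat_dim))

    def forward(self, x):
        h = self.conv(x)
        return h, self.head(h)


def train_step(blocks, block_opts, classifier, clf_opt, x, y):
    """One update of every block and of the final classifier.

    Each block sees a detached copy of the previous block's output, so no
    gradient crosses block boundaries; the per-block updates are therefore
    independent of one another."""
    inp = x
    for block, opt in zip(blocks, block_opts):
        h, z = block(inp)
        loss = sup_con_loss(z, y)      # local objective for this block only
        opt.zero_grad()
        loss.backward()
        opt.step()
        inp = h.detach()               # cut the gradient path to later blocks
    # A small classifier, trained with cross-entropy on the detached features,
    # produces the final class predictions.
    logits = classifier(inp)
    clf_loss = F.cross_entropy(logits, y)
    clf_opt.zero_grad()
    clf_loss.backward()
    clf_opt.step()


# Example usage on random data with CIFAR-10-like shapes.
blocks = nn.ModuleList([LocalBlock(3, 64), LocalBlock(64, 128)])
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 10))
block_opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]
clf_opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
x, y = torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,))
train_step(blocks, block_opts, classifier, clf_opt, x, y)
```

The blocks in this sketch still run sequentially; the point is that the `detach()` between blocks removes the backward dependency, which is what allows an SCPL-style scheme to dispatch each block's update to its own worker and update the layers in parallel.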