Decomposing End-to-End Backpropagation Based on SCPL | National Central University Electronic Theses and Dissertations System


Author:	Cheng-Kai Wang (王承凱)
Thesis title:	Decomposing End-to-End Backpropagation Based on SCPL (利用 SCPL 分解端到端倒傳遞演算法)
Advisor:	Hung-Hsuan Chen (陳弘軒)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Graduate Institute of Software Engineering
Year of publication:	2022
Academic year of graduation:	110 (ROC calendar)
Language:	Chinese
Number of pages:	42
Keywords:	Backpropagation, backward locking, supervised contrastive loss, parallel learning, supervised contrastive parallel learning

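Abstract (translated from the Chinese):
Backpropagation (BP) is the cornerstone of the weight-update algorithms used in today's deep neural networks, but it is inefficient because of the backward locking problem. This study attempts to solve the backward locking problem and names the proposed method Supervised Contrastive Parallel Learning (SCPL). SCPL uses the supervised contrastive loss as the local objective function of each convolutional layer. Because the local objective functions of different layers are isolated from one another, SCPL can learn the weights of different convolutional layers in parallel.
The thesis also compares SCPL with previous work on neural network parallelization, examines the strengths and limitations of existing methods, and discusses future research directions on this topic.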

Backpropagation (BP) is the cornerstone of the algorithms that update the weights of today's deep neural networks, but it is inefficient, partly because of the backward locking problem: a layer cannot update its weights until the error signal has been propagated back to it from the layers above. This thesis proposes Supervised Contrastive Parallel Learning (SCPL) to address backward locking. SCPL uses the supervised contrastive loss as the local objective function for each layer. Because the local objective functions of different layers are isolated from one another, SCPL can learn the weights of different layers in parallel. We compare SCPL with recent work on neural network parallelization, discuss the advantages and limitations of the existing methods, and suggest future research directions on neural network parallelization.
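The local objective described in the abstract can be illustrated with a minimal sketch of a supervised contrastive loss, following the general form in Khosla et al. [8]. This record does not show the thesis's exact formulation, so the function names `l2_normalize` and `supervised_contrastive_loss`, the temperature value, and the averaging over anchors are assumptions made for illustration only. In an SCPL-style setup, each layer would presumably compute such a loss on its own output, with gradients stopped at the layer boundary so that no error signal has to travel between layers.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Illustrative supervised contrastive loss over one batch.

    features: list of non-zero feature vectors (one per sample)
    labels:   list of class labels, same length as features
    Anchors without any same-class partner in the batch are skipped
    (an assumed convention, not taken from the thesis).
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Two tight clusters: the loss is low when labels follow the clusters
# and high when labels cut across them.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(feats, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(feats, [0, 1, 0, 1])
```

The loss depends only on one layer's features and the batch labels, which is why such objectives can in principle be evaluated independently for each layer.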

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
1. Introduction
2. Related Work
3. Model and Methods
3.1 Mechanism of Contrastive Learning
3.2 Supervised Contrastive Loss Function
3.3 Learning Mechanism and Network Architecture
3.4 Inference Function and Hypothesis Space
3.5 Comparison with Other Methods
3.6 Model Pseudocode
4. Experimental Results and Analysis
4.1 Experimental Setup and Implementation Details
4.1.1 Experimental Setup
4.1.2 Implementation Details
4.2 Classification Accuracy
4.2.1 CIFAR-10
4.2.2 CIFAR-100
4.2.3 TinyImageNet-val
4.3 Generalization Test
4.4 Ablation Study
4.4.1 Data Augmentation
4.4.2 Batch Size
4.4.3 Projection Head
4.5 Discussion
4.5.1 Comparison and Analysis of Methods
4.5.2 Open Issues
5. Conclusion
5.1 Conclusions
5.2 Future Work
References
Appendix A Experiment Code

[1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[2] M. Jaderberg, W. M. Czarnecki, S. Osindero, et al., “Decoupled neural interfaces using synthetic gradients,” in International Conference on Machine Learning, PMLR, 2017, pp. 1627–1635.
[3] Y.-W. Kao and H.-H. Chen, “Associated learning: Decomposing end-to-end backpropagation based on autoencoders and target propagation,” Neural Computation, vol. 33, no. 1, pp. 174–193, 2021.
[4] D. Y. Wu, D. Lin, V. Chen, and H.-H. Chen, “Associated learning: An alternative to end-to-end backpropagation that works on CNN, RNN, and transformer,” in International Conference on Learning Representations, 2021.
[5] A. Nøkland and L. H. Eidnes, “Training neural networks with local error signals,” in International Conference on Machine Learning, PMLR, 2019, pp. 4839–4850.
[6] S. Teerapittayanon, B. McDanel, and H.-T. Kung, “BranchyNet: Fast inference via early exiting from deep neural networks,” in 2016 23rd International Conference on Pattern Recognition (ICPR), IEEE, 2016, pp. 2464–2469.
[7] H. Mostafa, V. Ramesh, and G. Cauwenberghs, “Deep supervised learning using local errors,” Frontiers in Neuroscience, p. 608, 2018.
[8] P. Khosla, P. Teterwak, C. Wang, et al., “Supervised contrastive learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 18661–18673, 2020.
[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[11] C. J. Shallue, J. Lee, J. Antognini, J. Sohl-Dickstein, R. Frostig, and G. E. Dahl, “Measuring the effects of data parallelism on neural network training,” arXiv preprint arXiv:1811.03600, 2018.
[12] T. Vogels, S. P. Karimireddy, and M. Jaggi, “PowerSGD: Practical low-rank gradient compression for distributed optimization,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[13] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9729–9738.
[14] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning, PMLR, 2020, pp. 1597–1607.
[15] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[16] A. Krizhevsky, G. Hinton, et al., “Learning multiple layers of features from tiny images,” 2009.
[17] Y. Le and X. Yang, “Tiny ImageNet visual recognition challenge,” CS 231N, vol. 7, no. 7, p. 3, 2015.
[18] S. Garg, S. Balakrishnan, Z. Kolter, and Z. Lipton, “RATT: Leveraging unlabeled data to guarantee generalization,” in International Conference on Machine Learning, PMLR, 2021, pp. 3598–3609.
[19] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[20] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in International Conference on Machine Learning, PMLR, 2019, pp. 6105–6114.
