| Graduate Student: | 楊緣智 Yuan-Chih Yang |
|---|---|
| Thesis Title: | 藉由權重之梯度大小調整DropConnect的捨棄機率來訓練神經網路 (Training a neural network by adjusting the drop probability in DropConnect based on the magnitude of the gradient) |
| Advisor: | 陳弘軒 Hung-Hsuan Chen |
| Thesis Committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering |
| Publication Year: | 2022 |
| Graduation Academic Year: | 110 |
| Language: | Chinese |
| Pages: | 69 |
| Keywords: | Overfitting, Regularization, Dropout, DropConnect, Generalization |
Dropout and DropConnect are regularization techniques commonly used to address overfitting in deep learning. During training, Dropout randomly discards neurons (together with the links into and out of them) with a fixed probability, while DropConnect randomly discards individual links; in both cases, no neuron can rely too heavily on any other, which improves the model's generalization ability.
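For illustration, here is a minimal NumPy sketch of the fixed-probability DropConnect step described above. The inverted 1/(1 - p) scaling and the toy layer dimensions are standard conventions and illustrative choices, not details taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_forward(x, W, b, p=0.5, training=True):
    """One DropConnect layer: each link (weight) is dropped i.i.d.
    with the same fixed probability p during training."""
    if training:
        mask = rng.random(W.shape) >= p   # keep each link with probability 1 - p
        W_eff = W * mask / (1.0 - p)      # inverted scaling keeps E[W x] unchanged
    else:
        W_eff = W                         # full weights at inference
    return x @ W_eff.T + b

# Toy usage: a batch of two 4-dimensional inputs through an 8-unit layer.
x = rng.standard_normal((2, 4))
W = rng.standard_normal((8, 4))
b = np.zeros(8)
h = dropconnect_forward(x, W, b, p=0.5, training=True)
```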
This thesis proposes a new model, Gradient DropConnect, which leverages the gradient of each weight and bias to determine its drop probability during training. We conducted thorough experiments to validate that this approach effectively mitigates overfitting.
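The abstract does not state the exact rule that maps a gradient magnitude to a drop probability, so the sketch below assumes one plausible choice: min-max-normalize each weight's gradient magnitude and drop small-gradient weights more often. The function names, the normalization, and the direction of the mapping are all hypothetical, not the thesis's actual method:

```python
import numpy as np

rng = np.random.default_rng(1)

def gradient_drop_probs(grad_W, base_p=0.5, eps=1e-12):
    """Assumed mapping: min-max-normalize |grad| to [0, 1], then give
    small-gradient weights a drop probability up to 2 * base_p (capped),
    while large-gradient weights are kept more often. The same rule
    could be applied to the bias gradient."""
    g = np.abs(grad_W)
    g_norm = (g - g.min()) / (g.max() - g.min() + eps)  # in [0, 1]
    return np.clip(2.0 * base_p * (1.0 - g_norm), 0.0, 0.95)

def gradient_dropconnect_step(x, W, b, grad_W):
    """Forward pass where each link's drop probability comes from its
    own gradient magnitude instead of a single fixed p."""
    p = gradient_drop_probs(grad_W)
    mask = rng.random(W.shape) >= p   # per-weight keep decisions
    W_eff = W * mask / (1.0 - p)      # per-weight inverted scaling
    return x @ W_eff.T + b

# Toy usage with a synthetic gradient (in practice grad_W comes from backprop).
x = rng.standard_normal((2, 4))
W = rng.standard_normal((8, 4))
b = np.zeros(8)
grad_W = rng.standard_normal(W.shape)
h = gradient_dropconnect_step(x, W, b, grad_W)
```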