Graduate Student: Yu-Wei Kao (高聿緯)
Title: 關聯式學習:利用自動編碼器與目標傳遞法分解端到端倒傳遞演算法 (Associated Learning: Decomposing End-to-end Backpropagation based on Auto-encoders and Target Propagation)
Advisor: Hung-Hsuan Chen (陳弘軒)
Committee Members:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduating Academic Year: 108 (ROC calendar, 2019-2020)
Language: English
Pages: 48
Keywords (Chinese): 生物合理性演算法、深度學習、平行運算、模組化
Keywords (English): Biologically plausible algorithm, Deep learning, Parallel computing, Modularization


Abstract: Backpropagation has been widely used in deep learning approaches, but it is inefficient and sometimes unstable because of backward locking and the vanishing/exploding gradient problems, especially when the gradient flow is long. Additionally, updating all edge weights based on a single objective seems biologically implausible. In this paper, we introduce a novel biologically motivated learning structure called Associated Learning, which modularizes the network into smaller components, each of which has a local objective. Because the objectives are mutually independent, Associated Learning can learn the parameters independently and simultaneously when these parameters belong to different components. Surprisingly, training deep models by Associated Learning yields accuracies comparable to those of models trained by typical backpropagation, which fits the target variable directly. Moreover, probably because the gradient flow within each component is short, deep networks can still be trained with Associated Learning even when some of the activation functions are sigmoid, a situation that usually causes the vanishing gradient problem under typical backpropagation. We also found that Associated Learning generates better metafeatures, which we demonstrate both quantitatively (via inter-class and intra-class distance comparisons in the hidden layers) and qualitatively (by visualizing the hidden layers using t-SNE).
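The decomposition the abstract describes can be made concrete with a small sketch. The PyTorch fragment below is a hypothetical illustration, not the thesis's implementation (the actual source code is linked in Appendix A): each Component owns a feature transform, a bridge, and a target encoder, and is updated only by its own local loss, so no gradient crosses component boundaries. The class name Component, the layer sizes, and the use of a plain MSE objective are assumptions made for illustration; the full method additionally trains an inverse (auto-encoding) loss on the target pathway, per Sections 2.3-2.5 of the table of contents, which is omitted here.

# Minimal, hypothetical sketch of component-wise training with local
# objectives. Names and dimensions are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Component(nn.Module):
    """One unit of the decomposed network: a feature transform `f`, a
    bridge that maps the feature toward a local target code, and a
    target encoder that produces that code from the (encoded) label."""
    def __init__(self, in_dim, out_dim, y_dim, code_dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, out_dim), nn.Sigmoid())
        self.bridge = nn.Linear(out_dim, code_dim)
        self.target_enc = nn.Linear(y_dim, code_dim)

    def local_step(self, x, y):
        s = self.f(x)                         # forward feature
        t = self.target_enc(y)                # local target code
        loss = F.mse_loss(self.bridge(s), t)  # local (associated) objective
        # Detach both pathways: no gradient crosses the component boundary.
        return s.detach(), t.detach(), loss

# Toy data: 64 samples, 20 input features, 10 one-hot classes.
x = torch.randn(64, 20)
y = F.one_hot(torch.randint(0, 10, (64,)), num_classes=10).float()

components = [Component(20, 32, 10, 16), Component(32, 32, 16, 16)]
optimizers = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in components]

for step in range(200):
    s, t = x, y
    for comp, opt in zip(components, optimizers):
        s, t, loss = comp.local_step(s, t)
        opt.zero_grad()
        loss.backward()   # gradient flow stays inside this component
        opt.step()

Because each loss.backward() call touches only one component's parameters, the per-component updates are mutually independent and could in principle run concurrently, which is the source of the parallelism mentioned in the abstract.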

Table of Contents:
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
1 Introduction
2 Methodology
  2.1 Preliminaries
    2.1.1 Artificial Neural Network
    2.1.2 Backpropagation
    2.1.3 Models
  2.2 Motivation
  2.3 Associated Loss of Associated Learning
  2.4 Inverse Loss of Associated Learning
  2.5 Bridge of Associated Learning
  2.6 Effective Parameters and Hypothesis Space
3 Experiments
  3.1 Datasets
    3.1.1 MNIST
    3.1.2 CIFAR
  3.2 Testing Accuracy
    3.2.1 MNIST
    3.2.2 CIFAR-10
    3.2.3 CIFAR-100
  3.3 Metafeature Visualization and Quantification
4 Related Work
5 Discussion and Future Works
Bibliography
A Source Code
  A.1 Code Link
  A.2 Usage

