Graduate Student: Yu-Wei Kao (高聿緯)
Title: 關聯式學習:利用自動編碼器與目標傳遞法分解端到端倒傳遞演算法 (Associated Learning: Decomposing End-to-end Backpropagation based on Auto-encoders and Target Propagation)
Advisor: Hung-Hsuan Chen (陳弘軒)
Committee Members:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduating Academic Year: 108 (ROC calendar, 2019-2020)
Language: English
Pages: 48
Keywords (Chinese): 生物合理性演算法、深度學習、平行運算、模組化
Keywords (English): Biologically plausible algorithm, Deep learning, Parallel computing, Modularization


Abstract: Backpropagation has been widely used in deep learning approaches, but it is inefficient and sometimes unstable because of backward locking and the vanishing/exploding gradient problems, especially when the gradient flow is long. Additionally, updating all edge weights based on a single objective seems biologically implausible. In this paper, we introduce a novel biologically motivated learning structure called Associated Learning, which modularizes the network into smaller components, each of which has a local objective. Because the objectives are mutually independent, Associated Learning can learn the parameters independently and simultaneously when these parameters belong to different components. Surprisingly, training deep models by Associated Learning yields accuracies comparable to those of models trained by typical backpropagation, which fits the target variable directly. Moreover, probably because the gradient flow within each component is short, deep networks can still be trained with Associated Learning even when some of the activation functions are sigmoid, a situation that usually causes the vanishing gradient problem under typical backpropagation. We also found that Associated Learning generates better metafeatures, which we demonstrate both quantitatively (via inter-class and intra-class distance comparisons in the hidden layers) and qualitatively (by visualizing the hidden layers using t-SNE).
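The decomposition the abstract describes can be made concrete with a small sketch. The PyTorch fragment below is a hypothetical illustration, not the thesis's implementation (the actual source code is linked in Appendix A): each Component owns a feature transform, a bridge, and a target encoder, and is updated only by its own local loss, so no gradient crosses component boundaries. The class name Component, the layer sizes, and the use of a plain MSE objective are assumptions made for illustration; the full method additionally trains an inverse (auto-encoding) loss on the target pathway, per Sections 2.3-2.5 of the table of contents, which is omitted here.

# Minimal, hypothetical sketch of component-wise training with local
# objectives. Names and dimensions are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Component(nn.Module):
    """One unit of the decomposed network: a feature transform `f`, a
    bridge that maps the feature toward a local target code, and a
    target encoder that produces that code from the (encoded) label."""
    def __init__(self, in_dim, out_dim, y_dim, code_dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, out_dim), nn.Sigmoid())
        self.bridge = nn.Linear(out_dim, code_dim)
        self.target_enc = nn.Linear(y_dim, code_dim)

    def local_step(self, x, y):
        s = self.f(x)                         # forward feature
        t = self.target_enc(y)                # local target code
        loss = F.mse_loss(self.bridge(s), t)  # local (associated) objective
        # Detach both pathways: no gradient crosses the component boundary.
        return s.detach(), t.detach(), loss

# Toy data: 64 samples, 20 input features, 10 one-hot classes.
x = torch.randn(64, 20)
y = F.one_hot(torch.randint(0, 10, (64,)), num_classes=10).float()

components = [Component(20, 32, 10, 16), Component(32, 32, 16, 16)]
optimizers = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in components]

for step in range(200):
    s, t = x, y
    for comp, opt in zip(components, optimizers):
        s, t, loss = comp.local_step(s, t)
        opt.zero_grad()
        loss.backward()   # gradient flow stays inside this component
        opt.step()

Because each loss.backward() call touches only one component's parameters, the per-component updates are mutually independent and could in principle run concurrently, which is the source of the parallelism mentioned in the abstract.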

Table of Contents:
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
1 Introduction
2 Methodology
  2.1 Preliminaries
    2.1.1 Artificial Neural Network
    2.1.2 Backpropagation
    2.1.3 Models
  2.2 Motivation
  2.3 Associated Loss of Associated Learning
  2.4 Inverse Loss of Associated Learning
  2.5 Bridge of Associated Learning
  2.6 Effective Parameters and Hypothesis Space
3 Experiments
  3.1 Datasets
    3.1.1 MNIST
    3.1.2 CIFAR
  3.2 Testing Accuracy
    3.2.1 MNIST
    3.2.2 CIFAR-10
    3.2.3 CIFAR-100
  3.3 Metafeature Visualization and Quantification
4 Related Work
5 Discussion and Future Works
Bibliography
A Source Code
  A.1 Code Link
  A.2 Usage

