National Central University Master's and Doctoral Thesis System


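Graduate student:	施庫瑪 Sipun Kumar Pradhan
Thesis title:	A Rapid Deep Learning Model for Goal-Oriented Dialog
Advisor:	陳慶瀚 Ching-Han Chen
Oral examination committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication:	2016
Academic year of graduation:	104
Language:	English
Number of pages:	91
Keywords (translated from Chinese):	Question answering, memory neural networks, long-term memory component
Keywords (English):	Question Answering, Memory neural networks, Long-term memory component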

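Abstract (Chinese version, translated)

Open-domain question answering (QA) systems aim to provide exact answers to questions posed in natural language, without any restriction of domain. The goal of this research is to develop a learning model that can automatically generate new features without retraining; in particular, the system targets multiple open-domain QA tasks involving structure and meaning. The main advantage of this framework is that it requires only a small amount of feature engineering and domain knowledge while matching or exceeding current state-of-the-art results. In addition, it can easily be trained for any kind of open-domain QA.
We study a new class of learning models called memory neural networks. Memory neural networks combine reasoning and inference with a long-term memory component, and multiple memory neural networks can learn to use one another's resources jointly. The long-term memory can be read from and written to, and we use it for prediction. This thesis applies it to question answering (QA), where the long-term memory serves as the basis of a dynamic database and results are output as text. We propose a system based on end-to-end neural networks that reaches our intended goal and learns to perform special operations. At the end of the thesis, the system is compared on several benchmark datasets, and future work is discussed.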

Open-domain Question Answering (QA) systems aim to provide exact answers to questions formulated in natural language, without restriction of domain. My research goal in this thesis is to develop learning models that can automatically induce new facts, in particular their structure and meaning, without having to be re-trained, in order to solve multiple open-domain QA tasks. The main advantage of this framework is that it requires little feature engineering and little domain-specific knowledge while matching or surpassing state-of-the-art results. Furthermore, it can easily be trained for use with any kind of open-domain QA.

I investigate a new class of learning models called memory neural networks. Memory neural networks combine inference components with a long-term memory component and learn how to use the two jointly. The long-term memory can be read from and written to, with the goal of using it for prediction. I investigate these models in the context of question answering (QA), where the long-term memory effectively acts as a (dynamic) knowledge base and the output is a textual response. Finally, I show that an end-to-end dialog system based on memory neural networks can reach promising results and learn to perform non-trivial operations. I confirm these results by comparing my system against various well-crafted baselines and datasets, and I discuss future work.
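The idea of a long-term memory that is written to, then read softly at question time, can be illustrated with a minimal sketch. This is not the thesis implementation: the class `LongTermMemory`, the helper `bag_of_words`, the toy vocabulary, and the fixed bag-of-words vectors are all assumptions made for illustration, whereas a trained memory neural network would learn its embeddings from data.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def bag_of_words(sentence, vocab):
    """Hypothetical fixed embedding: word counts over a small vocabulary.
    A real model would use learned embedding matrices instead."""
    words = sentence.lower().split()
    return [float(words.count(w)) for w in vocab]

class LongTermMemory:
    """Toy long-term memory: a growing list of (vector, text) slots."""

    def __init__(self):
        self.slots = []

    def write(self, vector, text):
        # Writing only appends a slot; no parameters are re-trained.
        self.slots.append((vector, text))

    def read(self, query):
        """Soft read: softmax attention over all slots, scored by the
        inner product with the query. Returns the attention weights and
        the text of the most strongly attended slot."""
        if not self.slots:
            raise ValueError("memory is empty")
        weights = softmax([dot(query, vec) for vec, _ in self.slots])
        best = max(range(len(weights)), key=weights.__getitem__)
        return weights, self.slots[best][1]

# Illustrative vocabulary and facts (assumed, not taken from the thesis).
VOCAB = ["mary", "john", "sandra", "kitchen", "garden", "office"]

memory = LongTermMemory()
for fact in ["mary went to the kitchen", "john went to the garden"]:
    memory.write(bag_of_words(fact, VOCAB), fact)

weights, answer = memory.read(bag_of_words("where is mary", VOCAB))
print(answer)

# A new fact becomes answerable as soon as it is written to memory.
new_fact = "sandra went to the office"
memory.write(bag_of_words(new_fact, VOCAB), new_fact)
weights2, answer2 = memory.read(bag_of_words("where is sandra", VOCAB))
print(answer2)
```

The second query shows the property the abstract emphasizes: the new fact is used for prediction right after it is written, with no retraining step. The sketch does not model write time, so two facts about the same person would tie; the thesis outline lists temporal encoding as the mechanism for that case.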

Contents
Chapter 1    Introduction
1    Overview
Out-of-order access
Long-term dependency
Unordered set
2    Motivation
3    Brief Literature
4    Contributions
5    Thesis Organization
Chapter 2    Deep Learning Background
1    Deep Learning and Artificial Intelligence
2    Why Deep Learning?
2.1    Learning Representations
2.2    Distributed Representations
2.3    Learning Multiple Levels of Inference
3    Neural Networks: Definitions and Basics
3.1    Word Vector Representations
4    Recurrent Neural Network
4.1    Adaptive Context Features
4.2    Forward Pass
4.3    Backward Pass
5    Memory Networks
5.1    Long Short Term Memory
5.2    The LSTM Architecture
5.3    Influence of Preprocessing
5.4    Gradient Calculation
5.5    Architectural Enhancements
5.6    LSTM Equations
6    Hashing Function
Chapter 3    Memory Neural Network
1    Memory Network Implementation
1.1    Memory Neural Network Model
1.2    Training a Memory Neural Network
1.3    Word Sequences as Input
1.4    Efficient Memory Via Hashing
1.5    Modelling Write Time
1.6    Modelling Previously Unseen Words
1.7    Exact Matches And Unseen Words
Chapter 4    Implementation
1    Single Layer
1.1    Memory Representation
1.2    Generating the Final Prediction
2    Multiple Layers
3    Synthetic Question and Answering Experiments
3.1    Sentence Representation
3.2    Temporal Encoding
3.3    Learning Time Invariance by Injecting Random Noise
Chapter 5    Experimental Results
1    Dataset
2    Preprocessing
3    Baselines
4    Results
5    QA With Previously Unseen Words
6    Combining Simulated Data and Large-Scale QA
7    Language Modeling Experiments
Chapter 6    Conclusion
1    Main Contributions
2    Future Work
References


                                

