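Application of Reinforcement Learning in Multi-Faceted Story Chatbot Response Action Selection | National Central University Theses and Dissertations System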


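Student: Yi-Hsuan Chen (陳臆玄)
Thesis title: Application of Reinforcement Learning in Multi-Faceted Story Chatbot Response Action Selection (應用強化式學習於多面向對話回應模組之研究)
Advisor: Chia-Hui Chang (張嘉惠)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering
Publication year: 2022
Academic year of graduation: 110 (ROC calendar, i.e., the 2021-2022 academic year)
Language: Chinese
Number of pages: 40
Keywords (Chinese): educational chatbot, reinforcement learning
Keywords (English): Educational chatbot, Reinforcement learning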

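We hope to spark students' interest in English through English reading, letting them connect English with their own lives and develop their language ability within a social context. However, such a sociocultural construction process requires a large number of teachers, which is not feasible with the limited human resources currently available.
Our dialogue system therefore centers on chatting about stories, using a story to establish a common topic with the user and open a conversation. Our team wants student-chatbot interaction to go beyond discussing the story: it should also support question-and-answer exchanges in everyday conversation and, where appropriate, let the student take the lead. This remains a challenge, because once multiple facet modules are integrated, the chatbot needs sufficient natural language understanding and dialogue-strategy selection ability to provide, automatically and efficiently, responses that fit the current situation.
The main task of this thesis is to describe how we train an educational chatbot model that perceives the student's situation from multiple states and then explores the appropriate reply for each combination of states. We train this model within a reinforcement learning framework to achieve the thesis's ultimate goal: building a relationship with the user and sustaining the conversation over a long period.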

We hope that reading in English will spark students' interest in the language, help them connect English with their own lives, and develop their language ability in a social context. However, such a sociocultural construction process requires a large number of teachers, which is not feasible with the limited human resources currently available.
Therefore, our dialogue system uses stories as its main axis to establish a common topic and start a dialogue with users. Our team hopes that the interaction between students and the chatbot will go beyond simple story discussion to include question-and-answer exchanges in everyday conversation and, where appropriate, let students take the lead. However, this remains a challenge: once multi-faceted modules are integrated, the chatbot needs sufficient natural language understanding and dialogue-strategy selection capabilities to provide, automatically and efficiently, responses that fit the current situation.
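To make the selection problem concrete, here is a minimal Python sketch of a hand-written (rule-based) selector over a multi-faceted dialogue state. The three state facets and the three response modules (`Module`, `DialogueState`, `rule_based_select`, `encode`) are hypothetical illustrations, not the thesis's actual feature set or module inventory; the abstract only says that several modules are integrated and that a strategy must be chosen for each situation.

```python
from dataclasses import dataclass
from enum import Enum


class Module(Enum):
    """Hypothetical response modules; the thesis's real module set is not listed here."""
    STORY_DISCUSSION = "story_discussion"
    DAILY_CHAT = "daily_chat"
    STUDENT_LEADS = "student_leads"


@dataclass(frozen=True)
class DialogueState:
    """Hypothetical multi-faceted state derived from NLU on the latest student turn."""
    asked_question: bool   # the student's turn was a question
    on_story_topic: bool   # the turn refers to the current story
    engaged: bool          # e.g. the reply was longer than a few words


def rule_based_select(state: DialogueState) -> Module:
    """Hand-written policy: a fixed priority order over the state facets."""
    if state.asked_question and not state.on_story_topic:
        return Module.DAILY_CHAT        # answer off-topic questions directly
    if state.engaged and not state.on_story_topic:
        return Module.STUDENT_LEADS     # follow the topic the student brought up
    return Module.STORY_DISCUSSION      # default: keep discussing the story


def encode(state: DialogueState) -> int:
    """Pack the boolean facets into one integer index (0..7) for table lookups."""
    return (int(state.asked_question)
            | (int(state.on_story_topic) << 1)
            | (int(state.engaged) << 2))
```

With three boolean facets there are only 2**3 = 8 combinations, so every rule can be enumerated by hand; each additional facet doubles that number, which is the practical argument for learning the state-to-module mapping instead of hand-writing it.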
The main task of this thesis is to describe how we train an educational chatbot model that detects the student's situation from various states and then explores the corresponding reply for each combination of states. We train the model within a reinforcement learning framework to achieve the ultimate goal of this thesis: to establish a relationship with the user and sustain the dialogue over the long term.
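The abstract does not state which reinforcement learning algorithm, reward, or state features the thesis uses, so the following is only a minimal sketch under explicit assumptions: tabular Q-learning over an integer index of state combinations (for example, three boolean facets give 2**3 = 8 indices), choosing among three hypothetical response modules, with a toy simulated student that yields a reward of +1 for every additional turn. That reward mirrors the stated goal of keeping the conversation going, but `SimulatedStudent`, its 0.9/0.3 continuation probabilities, and all hyperparameters are invented for illustration.

```python
import random

N_STATES = 8    # hypothetical: 3 boolean state facets -> 2**3 combinations
N_ACTIONS = 3   # hypothetical: story discussion / daily chat / student leads


class SimulatedStudent:
    """Toy user simulator (an assumption, not the thesis's data): each state has one
    'preferred' response module, and choosing it keeps the student talking more often."""

    def __init__(self, rng: random.Random):
        self.rng = rng
        self.preferred = [rng.randrange(N_ACTIONS) for _ in range(N_STATES)]

    def reset(self) -> int:
        return self.rng.randrange(N_STATES)

    def step(self, state: int, action: int) -> tuple[int, float, bool]:
        p_continue = 0.9 if action == self.preferred[state] else 0.3
        if self.rng.random() < p_continue:
            return self.rng.randrange(N_STATES), 1.0, False  # +1 per extra turn
        return state, 0.0, True                               # student leaves


def greedy(q: list[list[float]], s: int) -> int:
    """Action with the highest Q-value (ties go to the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: q[s][a])


def train_q_learning(env: SimulatedStudent, episodes: int = 3000, alpha: float = 0.1,
                     gamma: float = 0.9, epsilon: float = 0.1, max_turns: int = 50,
                     seed: int = 0) -> list[list[float]]:
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_turns):
            # epsilon-greedy exploration over response modules
            a = rng.randrange(N_ACTIONS) if rng.random() < epsilon else greedy(q, s)
            s_next, r, done = env.step(s, a)
            # Q-learning target r + gamma * max_a' Q(s', a'); no bootstrap when the episode ends
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s = s_next
    return q


if __name__ == "__main__":
    env = SimulatedStudent(random.Random(42))
    q = train_q_learning(env)
    print("learned:  ", [greedy(q, s) for s in range(N_STATES)])
    print("preferred:", env.preferred)
```

Because each toy state has one module that keeps the simulated student talking far more often, the learned greedy policy should recover `env.preferred`. A table is workable here only because the number of state combinations is small; richer state features (for example, dialogue-history embeddings) would call for function approximation such as a deep Q-network.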

Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Problem Description
1.2 Motivation
1.3 Research Objectives
2. Related Work
2.1 Dialogue Systems
2.2 Dialogue Manager
2.3 Educational Chatbots
2.4 Deep Reinforcement Learning
2.5 Combining Reinforcement Learning with Chatbots
3. Method
3.1 Task Definition
3.2 Feature Extraction for the State Set
3.3 Feature Extraction for Dialogue History
3.4 Method and Model
3.4.1 Training Method and Algorithm
3.5 Rule-Based Response Module
4. Data Preparation and Datasets
4.1 Data Sources
4.2 Annotation Process
4.3 Annotated Dialogue Data
4.3.1 Data Statistics and Analysis
4.3.2 Response Selection and Similarity Analysis
4.3.3 Response Rating Statistics
4.3.4 Dialogue State Statistics
5. Experiments
5.1 Experimental Analysis
5.2 Model Learning-Curve Performance
5.3 Comparison of Reinforcement Learning Methods
5.4 Reinforcement Learning Methods vs. the Rule-Based Approach
5.5 Summary
6. Conclusion and Future Work
References
                                

