
Graduate Student: Yu-Cheng Hsiao (蕭又誠)
Thesis Title: Improving Dialogue State Tracking by Incorporating Multiple Automatic Speech Recognition Results (藉由加入多重語音辨識結果來改善對話狀態追蹤)
Advisor: Tzong-Han Tsai (蔡宗翰)
Oral Defense Committee:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Academic Year of Graduation: 106
Language: Chinese
Number of Pages: 39
Keywords (Chinese): 對話系統、自動語音辨識、狀態追蹤、深度學習、強化學習
Keywords (English): Dialogue System, Automatic Speech Recognition, State Tracking, Deep Learning, Reinforcement Learning
Abstract (Chinese):
In recent years, the development of dialogue systems has changed the way people interact with computers. In the past, a user had to issue specific commands or actions to make a computer perform a task; today the goal is for the computer to understand the user's intent from the conversation and help accomplish the user's goal. Compared with pure chit-chat bots, task-oriented dialogue bots focus on completing the user's task and therefore face several challenges: first, the system must understand the user's intent through natural language understanding; second, the system must perform dialogue management to decide the current dialogue state and the next step; third, the system must generate natural language responses to the user.
Among these, dialogue management is arguably the most difficult problem in a dialogue system, and whether the dialogue state can be tracked accurately greatly affects the system's results. The automatic speech recognition (ASR) results in our data have a word error rate of roughly 30%. Although most work simply takes the single best ASR result as the input for dialogue state tracking, our goal is to improve tracking accuracy by taking multiple ASR results as input, which also makes the system more tolerant of erroneous recognition results.
We take multiple ASR results as input and use reinforcement learning to decide which ASR hypotheses to consider in each dialogue turn; we then aggregate the multiple results and choose the most probable one as the state of the current turn. Our method achieves an accuracy of 59.98% on the test set, better than a system that uses only the best ASR result.
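The aggregation step described above can be pictured with a short Python sketch. This is only an illustration under assumed data structures (per-hypothesis ASR confidences and per-slot value distributions from the tracker), not the thesis implementation; the function and variable names are hypothetical.

```python
from collections import defaultdict

def aggregate_slot_values(hypotheses):
    """Combine tracker outputs from several ASR hypotheses into one state.

    hypotheses: list of (asr_confidence, {slot: {value: prob}}) pairs,
    one entry per ASR hypothesis selected for this turn (illustrative format).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for conf, slot_dists in hypotheses:
        for slot, dist in slot_dists.items():
            for value, prob in dist.items():
                totals[slot][value] += conf * prob   # confidence-weighted vote
    # the highest-scoring value per slot becomes this turn's dialogue state
    return {slot: max(values, key=values.get) for slot, values in totals.items()}

# toy example: two ASR hypotheses for one turn in the restaurant domain
state = aggregate_slot_values([
    (0.7, {"food": {"thai": 0.8, "indian": 0.2}}),
    (0.3, {"food": {"indian": 0.9, "thai": 0.1}}),
])
print(state)  # -> {'food': 'thai'}
```

Weighting by ASR confidence is just one plausible way to realize "choose the most probable state by probability"; the thesis may score and combine the per-hypothesis states differently.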


Abstract (English):
Nowadays, the development of dialogue systems has changed the communication between humans and computers. In the past, people used commands or instructions to ask computers to do tasks; now we expect the computer to understand the user's intent in the dialogue and accomplish the user's goal. Unlike chit-chat bots, the purpose of task-oriented dialogue systems (TDSs) is to accomplish specific tasks, such as booking restaurants, so a TDS is more complex than a chit-chat bot. First, a TDS needs to understand the user intent through language understanding (LU). Second, a TDS requires dialogue management to perform dialogue state tracking (DST) and dialogue policy selection. Finally, the system generates a natural language response to the user.
Dialogue management is the most difficult part of the task-oriented dialogue system structure, and our research focuses on dialogue state tracking. We use the Dialog State Tracking Challenge 2 (DSTC2) dataset in our experiments; according to its statistics, the word error rate of automatic speech recognition (ASR) is about 30%.
Most studies use only the top ASR result as the input of their models for DST. We propose to use multiple ASR results instead: reinforcement learning selects which ranked ASR results to use in addition to the top-1, a DST model predicts the dialogue state for each selected ASR result, and finally all the predicted dialogue states are aggregated into the system's output. Our method achieves an accuracy of 59.98% on the test set, better than the baseline that uses only the top ASR result as input. In the future, we plan to incorporate language understanding information from the ASR results into our method.
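As a rough, hypothetical sketch of the per-turn loop described in the abstract: an epsilon-greedy policy over a learned Q-function decides which ranked ASR hypotheses (beyond the always-kept top-1) to pass to the tracker, the selected hypotheses are tracked and aggregated, and the outcome is stored in an experience replay buffer for DQN-style training. The `q_function` and `dst_model` callables, the reward definition, and all names are placeholders, not the thesis code.

```python
import random
from collections import deque

replay_buffer = deque(maxlen=10_000)        # experience replay memory

def select_hypotheses(turn_features, nbest, q_function, epsilon=0.1):
    """Pick which n-best ASR hypotheses to track; the top-1 is always kept."""
    selected = [nbest[0]]
    for rank, hyp in enumerate(nbest[1:], start=1):
        if random.random() < epsilon:                       # explore
            include = random.random() < 0.5
        else:                                               # exploit Q-values
            include = (q_function(turn_features, rank, True)
                       > q_function(turn_features, rank, False))
        if include:
            selected.append(hyp)
    return selected

def run_turn(turn_features, nbest, q_function, dst_model, gold_state=None):
    selected = select_hypotheses(turn_features, nbest, q_function)
    predicted = dst_model(selected)                         # track and aggregate
    if gold_state is not None:                              # reward: state correct?
        reward = 1.0 if predicted == gold_state else -1.0
        replay_buffer.append((turn_features, selected, reward))
    return predicted

# toy usage with stub models, just to show the interfaces
prediction = run_turn(
    turn_features={"turn": 3},
    nbest=["cheap thai food", "cheap type food"],
    q_function=lambda feats, rank, include: 0.6 if include else 0.4,
    dst_model=lambda hyps: {"food": "thai", "pricerange": "cheap"},
    gold_state={"food": "thai", "pricerange": "cheap"},
)
print(prediction)
```

A real DQN trainer would additionally sample mini-batches from `replay_buffer` to update the Q-network; that update step is omitted here.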

Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1  Dialogue Systems and Dialogue State Tracking
  1.2  Motivation and Objectives
  1.3  Thesis Organization
Chapter 2  Related Work
  2.1  Research on Dialogue State Tracking
  2.2  Research on Reinforcement Learning
Chapter 3  Analysis of Experimental Data
  3.1  Ontology of the Restaurant Domain
  3.2  Dialogue Management in the Dataset
  3.3  Analysis of Speech Recognition Results
  4.1  Reinforcement Learning Module
    4.1.1  State
    4.1.2  Action
    4.1.3  Reward
    4.1.4  Experience Replay
    4.1.5  Deep Q Network
  4.2  Dialogue Tracking Module
Chapter 5  Experimental Results and Discussion
  5.1  Results and Discussion
  5.2  Error Analysis
Chapter 6  Conclusion and Future Work
References

References
    1. Henderson, M., B. Thomson, and J.D. Williams. The Second Dialog State Tracking Challenge. in SIGDIAL Conference. 2014.
    2. Ren, H., et al. Dialog State Tracking using Conditional Random Fields. in SIGDIAL Conference. 2013.
    3. Henderson, M., B. Thomson, and S. Young. Word-based dialog state tracking with recurrent neural networks. in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). 2014.
    4. Mrkšić, N., et al., Multi-domain dialog state tracking using recurrent neural networks. arXiv preprint arXiv:1506.07190, 2015.
    5. Henderson, M., B. Thomson, and J.D. Williams. The third dialog state tracking challenge. in Spoken Language Technology Workshop (SLT), 2014 IEEE. 2014. IEEE.
    6. Kim, S., et al., The fourth dialog state tracking challenge, in Dialogues with Social Robots. 2017, Springer. p. 435-449.
    7. Kim, S., et al. The fifth dialog state tracking challenge. in Spoken Language Technology Workshop (SLT), 2016 IEEE. 2016. IEEE.
    8. Watkins, C.J. and P. Dayan, Q-learning. Machine learning, 1992. 8(3-4): p. 279-292.
    9. Rummery, G.A. and M. Niranjan, On-line Q-learning using connectionist systems. Vol. 37. 1994: University of Cambridge, Department of Engineering.
    10. Peters, J. and S. Schaal. Policy gradient methods for robotics. in Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on. 2006. IEEE.
    11. Peters, J. and S. Schaal, Natural actor-critic. Neurocomputing, 2008. 71(7): p. 1180-1190.
    12. Mnih, V., et al., Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
    13. Henderson, M., et al. Discriminative spoken language understanding using word confusion networks. in Spoken Language Technology Workshop (SLT), 2012 IEEE. 2012. IEEE.
    14. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural computation, 1997. 9(8): p. 1735-1780.
    15. Kingma, D. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
    16. Plátek, O., et al., Recurrent Neural Networks for Dialogue State Tracking. arXiv preprint arXiv:1606.08733, 2016.
    17. Schaul, T., et al., Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
    18. Van Hasselt, H., A. Guez, and D. Silver. Deep Reinforcement Learning with Double Q-Learning. in AAAI. 2016.
    19. Wang, Z., et al., Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581, 2015.
