| Graduate Student: | 郭恩淳 En-Chun Kuo |
|---|---|
| Thesis Title: | 開發與評估大型語言模型驅動之臺灣電影產業問答系統:應對專業領域知識散佈之挑戰 Developing and Evaluating a Large Language Model-Powered QA System for Taiwan's Film Industry: Addressing the Challenge of Dispersed Knowledge |
| Advisors: | 陳毓鐸 Yu-To Chen; 蘇雅惠 Yea-Huey Su |
| Oral Defense Committee: | |
| Degree: | Master |
| Department: | College of Management, Department of Information Management |
| Year of Publication: | 2024 |
| Academic Year of Graduation: | 113 |
| Language: | English |
| Pages: | 134 |
| Chinese Keywords (translated): | Generative AI, Natural Language Processing, Retrieval-Augmented Generation, LangChain open-source framework, Taiwan film industry, domain-specific QA system |
| English Keywords: | Generative AI, Retrieval-Augmented Generation (RAG), Large Language Model (LLM), LangChain, Question Answering System, Taiwan Movie Industry |
Across professional domains worldwide, knowledge has long been scattered across disparate sources, making information searches time-consuming and prone to surfacing inaccurate results. To address this fragmentation of domain knowledge, this study takes Taiwan's film industry as its case. To help film-industry practitioners and the general public find information on Taiwanese cinema faster and more conveniently, this research developed a question answering (QA) system for Taiwan's film industry, built on mature natural language processing (NLP) and retrieval-augmented generation (RAG) techniques and implemented with the open-source LangChain framework. The system aims to help stakeholders extract relevant information efficiently while reducing the data-leakage risks associated with widely used general-purpose generative AI chatbots such as ChatGPT.
This study conducted four experiments comparing the system's performance against general-purpose generative AI chatbots and a commercial, paid RAG tool. The results show that the system holds a clear advantage in answer accuracy on Taiwan film-industry questions, exceeding 60% accuracy. The significance of this research is threefold: (1) it delivers Taiwan's first prototype of a dedicated QA system for the film industry in Traditional Chinese; (2) it enables industry professionals to obtain reliable answers through simple queries, improving decision-making efficiency while lowering data-leakage risk; and (3) it offers a reference approach that people exploring other professional domains can follow to ease the difficulty of information search. The study also discusses the importance of domain-specific QA systems and their integration with rapidly evolving generative AI technology, with the hope of extending the approach to other professional domains or incorporating more diverse data sources to enrich and strengthen this Taiwan-cinema QA system.
To address the challenges posed by dispersed domain knowledge in the Taiwanese movie industry, this research introduces a specialized Question Answering (QA) system. The system integrates advances in Natural Language Processing (NLP) and Retrieval-Augmented Generation (RAG) technology through the open-source LangChain framework. It is designed to help industry professionals efficiently extract relevant information while minimizing the risk of data leakage associated with general-purpose chatbots.
This study conducted four experiments to evaluate the system's performance against widely used AI chatbots and a commercial RAG tool. The results showed that our system achieved superior accuracy, exceeding 60% on domain-specific queries and surpassing both general-purpose generative AI chatbots and the commercial RAG tool. The significance of this research includes: (1) developing Taiwan's first specialized question-answering system for the film industry in Traditional Chinese, (2) enabling film professionals to quickly access reliable information, enhancing decision-making and reducing data-leakage risks, and (3) offering a reference for individuals in other fields seeking to ease information searches. The study also highlights the value of domain-specific QA systems and their integration with evolving generative AI technologies. Future work may extend the approach to other fields or incorporate more diverse data sources, further strengthening this system for the Taiwanese film industry.
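The retrieve-then-generate (RAG) pattern the abstract describes can be sketched in a few lines without the LangChain dependency: retrieve the passages most relevant to a query, then assemble a prompt that grounds the LLM's answer in them. The keyword-overlap scorer below stands in for the system's embedding-based retriever, and the corpus, queries, and function names are illustrative stand-ins, not the thesis's actual implementation.

```python
import re

# Illustrative sketch of a RAG pipeline's two pre-generation steps:
# (1) retrieval, here via keyword overlap instead of dense embeddings,
# (2) prompt assembly that grounds the LLM in the retrieved passages.
# The corpus is hypothetical, not the thesis's actual data.

def tokenize(text: str) -> set[str]:
    """Lowercase and split into word tokens, dropping punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank passages by word overlap with the query; return the top k."""
    q = tokenize(query)
    return sorted(corpus, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    """Assemble the grounded prompt an LLM would receive in the final step."""
    ctx = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{ctx}\n"
        f"Question: {query}"
    )

corpus = [
    "A City of Sadness (1989) was directed by Hou Hsiao-hsien.",
    "The Golden Horse Film Festival is held annually in Taipei.",
    "Ang Lee's Life of Pi was released in 2012.",
]
top = retrieve("Who directed A City of Sadness?", corpus)
prompt = build_prompt("Who directed A City of Sadness?", top)
```

Constraining the answer to the retrieved context is what lets a domain-specific system like this outperform a general-purpose chatbot on narrow queries while keeping proprietary documents out of third-party training data.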