| Graduate Student: | 呂宗祐 Zong-You Lyu |
|---|---|
| Thesis Title: | 使用大型語言模型構建自動化問答系統 (Automated Question-Answering System Using Large Language Models) |
| Advisor: | 蘇木春 Mu-Chun Su |
| Oral Defense Committee: | |
| Degree: | 碩士 Master |
| Department: | College of Electrical Engineering and Computer Science - Graduate Institute of Software Engineering |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 |
| Language: | Chinese |
| Number of Pages: | 76 |
| Chinese Keywords: | 大型語言模型、自動資料集生成、開源模型、提示工程、問答系統、思維鏈 |
| English Keywords: | Large Language Model, Automatic Dataset Generation, Open-Source Model, Prompt Engineering, Question-Answering System, Chain of Thought |
Manual generation of datasets for language models has long been a labor-intensive task. However, with the rapid evolution of open-source large language models in recent years, more and more researchers have begun leveraging LLMs to assist in dataset generation. Therefore, this study proposes a fully open-source architecture that leverages the Gemma 2-27B model as the core language model. The primary goal is to automate the generation of training datasets for large language models, thereby reducing human effort and improving performance on quantitative evaluation metrics.
This study examines which combination of fine-tuning and retrieval-augmented generation (RAG) yields the highest quantitative scores, and whether incorporating chain-of-thought (CoT) reasoning during generation improves the results. Evaluation is conducted using cosine similarity and LLM-as-a-judge metrics, and the results are compared against existing public datasets.
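The cosine-similarity metric used above compares the embedding of a generated answer with that of a reference answer. A minimal sketch follows; the toy vectors stand in for sentence embeddings, which in the actual system would be produced by an embedding model (the specific model and vectors here are illustrative assumptions, not the thesis's configuration).

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the embeddings of a generated answer
# and a reference answer; real embeddings would be high-dimensional.
generated = [0.20, 0.80, 0.10]
reference = [0.25, 0.75, 0.05]

score = cosine_similarity(generated, reference)
assert -1.0 <= score <= 1.0  # cosine similarity is bounded
```

A score near 1 indicates the generated answer is semantically close to the reference; the LLM-as-a-judge metric complements this by scoring qualities, such as factual correctness, that embedding distance alone cannot capture.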
Finally, the system is deployed as a LINE Bot via an ngrok tunnel, providing a human-machine interaction interface, a lightweight MCP-style tool-calling mechanism implemented through prompting, and a UI that supports switching between models on the fly.
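The prompt-based tool calling described above can be sketched as follows. The idea is that the model is instructed, via its prompt, to emit a structured marker line whenever a tool is needed, and the bot backend parses that line and dispatches to a registered function. The `TOOL_CALL:` marker, the JSON shape, and the tool names (`get_time`, `search_docs`) are illustrative assumptions for this sketch, not the thesis's actual protocol.

```python
import json
import re

# Illustrative tool registry; the real system's tools are not
# specified here, so these entries are placeholders.
TOOLS = {
    "get_time": lambda args: "2025-01-01 12:00",
    "search_docs": lambda args: f"results for {args.get('query', '')}",
}

def dispatch_tool_call(model_output):
    """Extract a tool call of the assumed form
        TOOL_CALL: {"tool": "...", "args": {...}}
    from the model's reply and run the matching tool.
    Returns the tool's result, or None for a plain answer."""
    match = re.search(r'TOOL_CALL:\s*(\{.*\})', model_output, re.DOTALL)
    if not match:
        return None  # no tool requested
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # malformed call; fall back to the plain reply
    if not isinstance(call, dict):
        return None
    tool = TOOLS.get(call.get("tool"))
    return tool(call.get("args", {})) if tool else None
```

For example, a model reply containing `TOOL_CALL: {"tool": "search_docs", "args": {"query": "civil code"}}` would be routed to the `search_docs` placeholder, while an ordinary answer passes through untouched. A full MCP implementation would instead exchange these calls over the protocol's standardized messages; the prompt-based variant trades that rigor for simplicity.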