| Graduate Student: | 賴宜慧 Yi-Hui Lai |
|---|---|
| Thesis Title: | Design Review Using LLM and RAG Techniques: A Quality and Benefit Analysis of Architecture Refactoring Suggestions |
| Advisor: | 鄭永斌 |
| Oral Defense Committee: | |
| Degree: | 碩士 Master |
| Department: | College of Information and Electrical Engineering - Department of Computer Science & Information Engineering |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 |
| Language: | Chinese |
| Pages: | 47 |
| Chinese Keywords: | 設計審查、軟體架構評估、大型語言模型、程式碼重構 |
| English Keywords: | Design Review, Software Architecture Evaluation, Large Language Models (LLMs), Code Refactoring |
Software design review is a key activity for ensuring architectural quality and system maintainability, yet in practice it relies heavily on developers' experience and subjective judgment, lacking consistency and standardization. With the growing adoption of large language models (LLMs) as programming assistants, prior studies have applied them to code quality analysis, but most focus on function-level code smells; architecture-level design review, particularly adaptability analysis and the generation of improvement suggestions under requirement-extension scenarios, remains largely unexplored.

This study takes change-driven, architecture-level design review as its application scenario and evaluates whether an LLM combined with Retrieval-Augmented Generation (RAG) can produce quality refactoring suggestions when reviewing how well an existing system architecture accommodates new requirements. The experiment cross-tests three language models of differing reasoning capability against three reasoning-workflow architectures. Results show that RAG significantly improves the generation quality of the low-reasoning-capability model, and also helps the medium-capability model bridge knowledge gaps, leading to more extensible and better-structured refactoring suggestions.

This study demonstrates the applicability of RAG to architecture-level design-suggestion tasks, addresses the limitation of existing research that focuses mostly on syntax-level issues, and proposes a feasible design review workflow combining LLM and RAG, laying a foundation for more intelligent and automated software development processes.
Software design review is essential for ensuring architectural quality and system maintainability. However, in practice, it often relies on developers’ experience and subjective judgment, lacking consistency and standardization. With the growing application of large language models (LLMs) in software development, existing studies have explored their use in code quality analysis, yet primarily focus on function-level code smells. Little attention has been paid to architecture-level review, especially in adaptive scenarios involving evolving requirements.
This study investigates whether LLMs, combined with Retrieval-Augmented Generation (RAG), can produce high-quality refactoring suggestions for adapting system architectures to new demands. A cross-evaluation was conducted using three LLMs with different reasoning capabilities and three prompting workflows. Results show that RAG significantly improves output quality for lower-capacity models and helps medium-capacity models bridge knowledge gaps, enabling more structured and extensible recommendations.
The findings validate the potential of RAG in architecture-level design review and address the limitations of prior research that focuses mainly on syntax-level issues. A practical LLM+RAG-based design review workflow is proposed, laying the groundwork for intelligent and automated support in software architecture evaluation.
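The core idea of the workflow described above can be sketched as follows: retrieve design knowledge relevant to a change request, then assemble an augmented prompt that asks the model to review the architecture against the new requirement. This is a minimal, hypothetical illustration only — the toy corpus, the keyword-overlap retriever (a stand-in for vector-similarity search), and all function names are assumptions, not the thesis's actual implementation.

```python
# Minimal sketch of a RAG-augmented design review prompt builder.
# All names and the toy knowledge base are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    text: str

# Toy knowledge base of design/refactoring notes (illustrative only).
KNOWLEDGE_BASE = [
    Doc("Strategy pattern",
        "Encapsulate each interchangeable algorithm behind a common interface "
        "so new variants can be added without editing existing code."),
    Doc("Open-Closed Principle",
        "Modules should be open for extension but closed for modification."),
    Doc("Extract Class",
        "Split a class that has accumulated too many responsibilities."),
]

def retrieve(query: str, k: int = 2) -> list[Doc]:
    """Rank docs by naive keyword overlap with the query
    (a stand-in for an embedding-based retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda d: len(q_words & set((d.title + " " + d.text).lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_review_prompt(change_request: str, architecture_summary: str) -> str:
    """Assemble the retrieval-augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {d.title}: {d.text}" for d in retrieve(change_request))
    return (
        "You are reviewing a software architecture against a new requirement.\n"
        f"Architecture:\n{architecture_summary}\n"
        f"New requirement:\n{change_request}\n"
        "Relevant design knowledge:\n"
        f"{context}\n"
        "Suggest refactorings that let the architecture absorb the change."
    )

prompt = build_review_prompt(
    change_request="Add a new item type whose quality updates with a different algorithm",
    architecture_summary="A single update() method with nested conditionals per item type",
)
print(prompt)
```

In a real pipeline, the retrieved context would come from an indexed corpus of design documents, and the prompt would be sent to the model under one of the study's reasoning workflows; the point here is only the structure: retrieval output is spliced into the prompt between the problem statement and the task instruction.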