
Graduate Student: 賴宜慧 (Yi-Hui Lai)
Thesis Title: Design Review Using LLM and RAG Techniques: A Quality and Benefit Analysis of Architecture Refactoring Suggestions (運用LLM 與RAG 技術進行設計審查:架構重構建議之品質與效益分析)
Advisor: 鄭永斌
Oral Defense Committee:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Information and Electrical Engineering
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar)
Language: Chinese
Pages: 47
Chinese Keywords: 設計審查, 軟體架構評估, 大型語言模型, 程式碼重構
Foreign Keywords: Design Review, Software Architecture Evaluation, Large Language Models (LLMs), Code Refactoring
Views: 33 / Downloads: 0
  • Software design review is a key step in ensuring architectural quality and system maintainability, yet in practice it relies largely on developer experience and subjective judgment, and lacks consistency and standardization. With the growing use of large language models (LLMs) as development aids, studies have begun applying them to code quality analysis, but most focus on function-level code smells; architecture-level design review — in particular, adaptability analysis and improvement-suggestion generation under requirement-extension scenarios — remains little explored.
    This study takes change-driven, architecture-level design review as its application scenario and evaluates whether an LLM combined with Retrieval-Augmented Generation (RAG) can produce quality refactoring suggestions when reviewing how an existing system architecture accommodates new requirements. The experiment cross-tests three language models of differing reasoning capability against three reasoning-workflow designs. The results show that RAG significantly improves the generation quality of low-reasoning-capability models, and also helps medium-capability models close knowledge gaps, leading them to produce more extensible and better-structured refactoring suggestions.
    This study demonstrates the potential of RAG for architecture-level design-suggestion tasks, addresses the limitation of prior work that focuses mostly on syntax-level issues, and proposes a feasible LLM+RAG design-review workflow, laying a foundation for more intelligent and automated software development processes.


    Software design review is essential for ensuring architectural quality and system maintainability. However, in practice, it often relies on developers’ experience and subjective judgment, lacking consistency and standardization. With the growing application of large language models (LLMs) in software development, existing studies have explored their use in code quality analysis, yet primarily focus on function-level code smells. Little attention has been paid to architecture-level review, especially in adaptive scenarios involving evolving requirements.
    This study investigates whether LLMs, combined with Retrieval-Augmented Generation (RAG), can produce high-quality refactoring suggestions for adapting system architectures to new demands. A cross-evaluation was conducted using three LLMs with different reasoning capabilities and three prompting workflows. Results show that RAG significantly improves output quality for lower-capacity models and helps medium-capacity models bridge knowledge gaps, enabling more structured and extensible recommendations.
    The findings validate the potential of RAG in architecture-level design review and address the limitations of prior research that focuses mainly on syntax-level issues. A practical LLM+RAG-based design review workflow is proposed, laying the groundwork for intelligent and automated support in software architecture evaluation.
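    The workflow the abstract describes — retrieve design knowledge relevant to a change request, then prompt a model to suggest refactorings — can be sketched minimally as below. This is an illustrative stand-in, not the thesis's actual system: the snippet base, the word-overlap retriever, and all names are hypothetical, with naive overlap ranking standing in for real embedding similarity search against a vector database.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Snippet:
        title: str
        text: str

    # Toy knowledge base standing in for a vector store of refactoring /
    # design-pattern references (e.g. Fowler's catalog, GoF patterns).
    KNOWLEDGE = [
        Snippet("Strategy pattern",
                "Encapsulate interchangeable pricing algorithms behind one interface."),
        Snippet("Extract Class",
                "Split a class that has grown several responsibilities into smaller classes."),
        Snippet("Observer pattern",
                "Let dependents subscribe to state changes instead of being called directly."),
    ]

    def retrieve(query: str, k: int = 2) -> list[Snippet]:
        """Rank snippets by naive word overlap with the query — a stand-in
        for embedding similarity search in a real RAG pipeline."""
        q = set(query.lower().split())
        scored = sorted(KNOWLEDGE,
                        key=lambda s: len(q & set(s.text.lower().split())),
                        reverse=True)
        return scored[:k]

    def build_review_prompt(architecture: str, new_requirement: str) -> str:
        """Combine the current architecture, the new requirement, and the
        retrieved design knowledge into one design-review prompt."""
        context = "\n".join(f"- {s.title}: {s.text}"
                            for s in retrieve(new_requirement))
        return ("You are reviewing a software architecture.\n"
                f"Current design:\n{architecture}\n"
                f"New requirement:\n{new_requirement}\n"
                f"Relevant design knowledge:\n{context}\n"
                "Suggest refactorings that let the design absorb the requirement.")

    prompt = build_review_prompt(
        architecture="A Movie class computes price with a switch on movie type.",
        new_requirement="Add new pricing algorithms without editing the Movie class.",
    )
    print(prompt)
    ```

    In the thesis's setting, the assembled prompt would be sent to one of the three LLMs under test; here the final call is omitted so the sketch stays self-contained.
    
    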

    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Background and Related Techniques
       2-1 Traditional Software Quality Metrics and Static Analysis Techniques
       2-2 Applications of LLMs in Code Repair and Architecture Analysis
       2-3 Retrieval-Augmented Generation
    3. Methodology
       3-1 Research Framework and Experimental Workflow Design
       3-2 Hybrid RAG Experimental Architecture
           3-2-1 Vector Database Construction and Content Design Workflow
           3-2-2 Hybrid RAG Processing Flow and Task Decomposition Strategy
       3-3 Input Data Formats and Case Preparation
           3-3-3 Movie case (textbook source)
           3-3-4 E-book case (keyword-rewritten variant)
           3-3-5 GildedRose case (public GitHub refactoring case)
           3-3-6 Wood case (keyword-rewritten variant)
           3-3-7 Order case (extension of an original database design concept)
           3-3-8 Scene case (Unity project source code)
       3-4 Evaluation Mechanism and Analysis Methods for Generated Results
    4. Results and Analysis
       4-1 Model reasoning capability is key to stable design suggestions
       4-2 RAG improves the refactoring-suggestion quality of low-reasoning-capability models
       4-3 RAG has a double-edged effect on the output quality of medium- and high-reasoning models
       4-4 RAG's benefit is limited in certain tasks
    5. Conclusions and Future Outlook
    6. Research Limitations and Future Research Directions
    7. References

    [1] P. Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” Apr. 12, 2021, arXiv: arXiv:2005.11401. doi: 10.48550/arXiv.2005.11401.
    [2] Y. H. Lai, yihui2001/LLM-RefactorTestSet. (May 15, 2025). Accessed: May 15, 2025. [Online]. Available: https://github.com/yihui2001/LLM-RefactorTestSet
    [3] E. J. Weyuker, “Evaluating software complexity measures,” IEEE Trans. Software Eng., vol. 14, no. 9, pp. 1357–1365, Sep. 1988, doi: 10.1109/32.6178.
    [4] S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented design,” IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476–493, Jun. 1994, doi: 10.1109/32.295895.
    [5] S. H. Kan, Metrics and Models in Software Quality Engineering. Addison-Wesley Professional, 2003.
    [6] D. Singh, V. R. Sekar, K. T. Stolee, and B. Johnson, “Evaluating how static analysis tools can reduce code review effort,” in 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Oct. 2017, pp. 101–105. doi: 10.1109/VLHCC.2017.8103456.
    [7] “PMD.” Accessed: May 08, 2025. [Online]. Available: https://pmd.github.io/
    [8] I. R. da S. Simões and E. Venson, “Evaluating Source Code Quality with Large Language Models: a comparative study,” Sep. 22, 2024, arXiv: arXiv:2408.07082. doi: 10.48550/arXiv.2408.07082.
    [9] B. Liu, Y. Jiang, Y. Zhang, N. Niu, G. Li, and H. Liu, “An Empirical Study on the Potential of LLMs in Automated Software Refactoring,” Nov. 07, 2024, arXiv: arXiv:2411.04444. doi: 10.48550/arXiv.2411.04444.
    [10] M. Rinard, “Software Engineering Research in a World with Generative Artificial Intelligence,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon Portugal: ACM, May 2024, pp. 1–5. doi: 10.1145/3597503.3649399.
    [11] J. Liu, J. Lin, and Y. Liu, “How Much Can RAG Help the Reasoning of LLM?,” Oct. 04, 2024, arXiv: arXiv:2410.02338. doi: 10.48550/arXiv.2410.02338.
    [12] P. Sarthi, S. Abdullah, A. Tuli, S. Khanna, A. Goldie, and C. D. Manning, “RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval,” Jan. 31, 2024, arXiv: arXiv:2401.18059. doi: 10.48550/arXiv.2401.18059.
    [13] L. Gao, X. Ma, J. Lin, and J. Callan, “Precise Zero-Shot Dense Retrieval without Relevance Labels,” Dec. 20, 2022, arXiv: arXiv:2212.10496. doi: 10.48550/arXiv.2212.10496.
    [14] D. Zhou et al., “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models,” Apr. 16, 2023, arXiv: arXiv:2205.10625. doi: 10.48550/arXiv.2205.10625.
    [15] H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal, “Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions,” Jun. 23, 2023, arXiv: arXiv:2212.10509. doi: 10.48550/arXiv.2212.10509.
    [16] M. Fowler, Refactoring: Improving the Design of Existing Code. USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
    [17] R. Wirfs-Brock, A. McKean, I. Jacobson, and J. Vlissides, Object Design: Roles, Responsibilities, and Collaborations. Pearson Education, 2002.
    [18] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design patterns: elements of reusable object-oriented software. USA: Addison-Wesley Longman Publishing Co., Inc., 1995.
    [19] E. Freeman, E. Freeman, B. Bates, and K. Sierra, Head First Design Patterns. O’Reilly & Associates, Inc., 2004.
    [20] E. Bache, emilybache/GildedRose-Refactoring-Kata. (May 01, 2025). XSLT. Accessed: May 01, 2025. [Online]. Available: https://github.com/emilybache/GildedRose-Refactoring-Kata
    [21] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a Method for Automatic Evaluation of Machine Translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, P. Isabelle, E. Charniak, and D. Lin, Eds., Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, Jul. 2002, pp. 311–318. doi: 10.3115/1073083.1073135.
    [22] C.-Y. Lin, “ROUGE: A Package for Automatic Evaluation of Summaries,” in Text Summarization Branches Out, Barcelona, Spain: Association for Computational Linguistics, Jul. 2004, pp. 74–81. Accessed: Jul. 28, 2025. [Online]. Available: https://aclanthology.org/W04-1013/
    [23] A. Lavie and A. Agarwal, “METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments,” in Proceedings of the Second Workshop on Statistical Machine Translation, in StatMT ’07. USA: Association for Computational Linguistics, Jun. 2007, pp. 228–231.
