| Graduate Student: | 楊斌翔 Pin-Hsiang Yang |
|---|---|
| Thesis Title: | Geekynotes 影音說明文件之 AI 自動化資訊萃取、匯入與標註 (AI-based Automated Information Extraction, Import, and Annotation for Geekynotes Video Documentation) |
| Advisor: | 鄭永斌 Yung-Pin Cheng |
| Committee Members: | |
| Degree: | Master |
| Department: | 軟體工程研究所 Graduate Institute of Software Engineering, College of Information and Electrical Engineering |
| Year of Publication: | 2025 |
| Academic Year: | 113 (ROC calendar) |
| Language: | Chinese |
| Pages: | 61 |
| Chinese Keywords: | 技術文件管理、知識傳承、語音轉文字 (STT)、大型語言模型 (LLM)、光學字元辨識 (OCR) |
| Keywords: | Software Documentation, Knowledge Transfer, Speech-to-Text (STT), Large Language Model (LLM), Optical Character Recognition (OCR) |
Geekynotes is a tool built to address the problem of knowledge transfer. Its goal is to gather the documents produced during development, such as images, videos, and slides, in one place, and to link these documents to the source code through the tool's distinctive Label feature, so that later users can easily search and manage them. The Label that creates a bidirectional link between a video and the code is the Video Label.
Explaining code by recording a video, rather than by the traditional written description, lets viewers understand the original development intent more clearly and quickly. However, recording videos also raises three problems: long videos make it hard for viewers to grasp the key points and cannot be searched; the video's language is limited to the one the recorder speaks; and the link between the video explanation and the code is not precise enough.
This thesis therefore proposes the AI Video Enhancement system, which integrates Speech-to-Text (STT), Large Language Model (LLM), and Optical Character Recognition (OCR) technologies to convert videos into structured, multilingual text and to automatically analyze and build precise correspondences between video segments and source code files.
This work resolves fundamental limitations of video documentation in knowledge transfer. By converting unstructured video media into searchable, interactive learning resources, it markedly improves the efficiency with which developers acquire and understand technical information, and it offers an innovative technical approach to knowledge management and documentation maintenance in software development.
Geekynotes is a tool designed to address the problem of knowledge transfer in software development. Its goal is to consolidate various types of documentation—such as images, videos, and presentations—generated during the development process into a single platform. Through its unique feature called Label, these materials can be linked to specific parts of the source code, making it easier for future developers to retrieve and manage relevant information. Among these, the Video Label feature enables the creation of bidirectional links between videos and source code.
Replacing traditional text-based code explanations with recorded video walkthroughs often conveys development intent more clearly and efficiently. However, relying on video introduces three key challenges: long videos make it difficult for viewers to locate the key points and cannot be searched; the explanation is limited to the language spoken by the recorder; and the linkage between the video explanation and the actual code is often imprecise.
To address these issues, this thesis proposes an AI Video Enhancement system that innovatively integrates Speech-to-Text (STT), Large Language Model (LLM), and Optical Character Recognition (OCR) technologies to transform videos into structured multilingual textual data. The system automatically analyzes content to establish precise correspondences between video segments and source code files.
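The abstract does not specify the shape of the "structured multilingual textual data" the pipeline produces. One plausible minimal representation, assuming per-segment timestamps from the STT step and per-language translated text from the LLM step (all names and fields below are illustrative, not the thesis's actual data model), is:

```python
from dataclasses import dataclass, field

@dataclass
class TranscriptSegment:
    """One STT segment with optional LLM-produced translations."""
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    text: str     # transcript in the recording language
    translations: dict[str, str] = field(default_factory=dict)  # e.g. {"en": "..."}

def search_segments(segments: list[TranscriptSegment],
                    query: str) -> list[TranscriptSegment]:
    """Return segments whose original or translated text contains the query."""
    q = query.lower()
    return [s for s in segments
            if q in s.text.lower()
            or any(q in t.lower() for t in s.translations.values())]
```

With a structure like this, the three problems named above map directly onto operations: segment timestamps enable jumping to the relevant point of a long video, the search function provides text searchability, and the translations mapping removes the single-language limitation.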
This research effectively addresses fundamental limitations of video documentation in knowledge transfer. By transforming unstructured video media into searchable and interactive learning resources, it significantly improves developers' efficiency in acquiring and understanding technical information, providing an innovative technological solution for knowledge management and documentation maintenance in modern software development.
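The abstract also leaves open how the system decides which source file a given video segment corresponds to. The thesis cites the Sørensen–Dice coefficient, which suggests an overlap measure between OCR-extracted frame text and file contents; the sketch below is a minimal illustrative matcher along those lines (the function names and the choice of character bigrams are assumptions, not the thesis's actual implementation):

```python
def bigrams(text: str) -> set[str]:
    # Character bigrams of the text with whitespace collapsed,
    # so OCR spacing and line-break noise matters less.
    s = "".join(text.split())
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice_similarity(a: str, b: str) -> float:
    """Sørensen–Dice coefficient over character bigrams: 2|A∩B| / (|A| + |B|)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

def best_matching_file(ocr_text: str, files: dict[str, str]) -> tuple[str, float]:
    """Return (filename, score) for the file whose content best matches the OCR'd text."""
    return max(((name, dice_similarity(ocr_text, body)) for name, body in files.items()),
               key=lambda pair: pair[1])
```

Bigram-level Dice similarity is a common choice for fuzzy text matching precisely because it tolerates the local character and spacing errors that OCR typically introduces, which would defeat exact string comparison.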