| Graduate Student: | 楊斌翔 Pin-Hsiang Yang |
|---|---|
| Thesis Title: | Geekynotes 影音說明文件之 AI 自動化資訊萃取、匯入與標註 (AI-based Automated Information Extraction, Import, and Annotation for Geekynotes Video Documentation) |
| Advisor: | 鄭永斌 Yung-Pin Cheng |
| Committee Members: | |
| Degree: | Master |
| Department: | 軟體工程研究所 Graduate Institute of Software Engineering, College of Information and Electrical Engineering |
| Year of Publication: | 2025 |
| Academic Year: | 113 (ROC calendar) |
| Language: | Chinese |
| Pages: | 61 |
| Chinese Keywords: | 技術文件管理、知識傳承、語音轉文字 (STT)、大型語言模型 (LLM)、光學字元辨識 (OCR) |
| Keywords: | Software Documentation, Knowledge Transfer, Speech-to-Text (STT), Large Language Model (LLM), Optical Character Recognition (OCR) |
Geekynotes is a tool built to address the problem of knowledge transfer. Its goal is to gather the documents produced during development, such as images, videos, and slides, in one place, and to link these documents to the source code through the tool's distinctive Label feature, so that later users can easily search and manage them. The Label that creates a bidirectional link between a video and the code is the Video Label.
Explaining code by recording a video, rather than by the traditional written description, lets viewers understand the original development intent more clearly and quickly. However, recording videos also raises three problems: long videos make it hard for viewers to grasp the key points and cannot be searched; the video's language is limited to the one the recorder speaks; and the link between the video explanation and the code is not precise enough.
This thesis therefore proposes the AI Video Enhancement system, which integrates Speech-to-Text (STT), Large Language Model (LLM), and Optical Character Recognition (OCR) technologies to convert videos into structured, multilingual text and to automatically analyze and build precise correspondences between video segments and source code files.
This work resolves fundamental limitations of video documentation in knowledge transfer. By converting unstructured video media into searchable, interactive learning resources, it markedly improves the efficiency with which developers acquire and understand technical information, and it offers an innovative technical approach to knowledge management and documentation maintenance in software development.
Geekynotes is a tool designed to address the problem of knowledge transfer in software development. Its goal is to consolidate various types of documentation—such as images, videos, and presentations—generated during the development process into a single platform. Through its unique feature called Label, these materials can be linked to specific parts of the source code, making it easier for future developers to retrieve and manage relevant information. Among these, the Video Label feature enables the creation of bidirectional links between videos and source code.
Replacing traditional text-based code explanations with recorded video walkthroughs often conveys development intent more clearly and efficiently. However, relying on video introduces three key challenges: long videos make it difficult for viewers to locate the key points and cannot be searched; the explanation is limited to the language spoken by the recorder; and the linkage between the video explanation and the actual code is often imprecise.
To address these issues, this thesis proposes an AI Video Enhancement system that innovatively integrates Speech-to-Text (STT), Large Language Model (LLM), and Optical Character Recognition (OCR) technologies to transform videos into structured multilingual textual data. The system automatically analyzes content to establish precise correspondences between video segments and source code files.
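The abstract does not specify the shape of the "structured multilingual textual data" the pipeline produces. One plausible minimal representation, assuming per-segment timestamps from the STT step and per-language translated text from the LLM step (all names and fields below are illustrative, not the thesis's actual data model), is:

```python
from dataclasses import dataclass, field

@dataclass
class TranscriptSegment:
    """One STT segment with optional LLM-produced translations."""
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    text: str     # transcript in the recording language
    translations: dict[str, str] = field(default_factory=dict)  # e.g. {"en": "..."}

def search_segments(segments: list[TranscriptSegment],
                    query: str) -> list[TranscriptSegment]:
    """Return segments whose original or translated text contains the query."""
    q = query.lower()
    return [s for s in segments
            if q in s.text.lower()
            or any(q in t.lower() for t in s.translations.values())]
```

With a structure like this, the three problems named above map directly onto operations: segment timestamps enable jumping to the relevant point of a long video, the search function provides text searchability, and the translations mapping removes the single-language limitation.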
This research effectively addresses fundamental limitations of video documentation in knowledge transfer. By transforming unstructured video media into searchable and interactive learning resources, it significantly improves developers' efficiency in acquiring and understanding technical information, providing an innovative technological solution for knowledge management and documentation maintenance in modern software development.
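The abstract also leaves open how the system decides which source file a given video segment corresponds to. The thesis cites the Sørensen–Dice coefficient, which suggests an overlap measure between OCR-extracted frame text and file contents; the sketch below is a minimal illustrative matcher along those lines (the function names and the choice of character bigrams are assumptions, not the thesis's actual implementation):

```python
def bigrams(text: str) -> set[str]:
    # Character bigrams of the text with whitespace collapsed,
    # so OCR spacing and line-break noise matters less.
    s = "".join(text.split())
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice_similarity(a: str, b: str) -> float:
    """Sørensen–Dice coefficient over character bigrams: 2|A∩B| / (|A| + |B|)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

def best_matching_file(ocr_text: str, files: dict[str, str]) -> tuple[str, float]:
    """Return (filename, score) for the file whose content best matches the OCR'd text."""
    return max(((name, dice_similarity(ocr_text, body)) for name, body in files.items()),
               key=lambda pair: pair[1])
```

Bigram-level Dice similarity is a common choice for fuzzy text matching precisely because it tolerates the local character and spacing errors that OCR typically introduces, which would defeat exact string comparison.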