Enabling Traceability in Cloud-Native Machine Learning Workflows with API-Level Dependencies and Controller-Based Execution

Author:	Peng-Sheng Chen (陳鵬升)
Thesis title:	Enabling Traceability in Cloud-Native Machine Learning Workflows with API-Level Dependencies and Controller-Based Execution
Advisor:	Wei-Jen Wang (王尉任)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication:	2025
Academic year of graduation:	113
Language:	English
Number of pages:	117
Keywords (translated from Chinese):	Workflow Traceability, Workflow Pipeline, Microservice Architecture, Kubernetes, Airflow, Kafka
Keywords (English):	Workflow Traceability, MLOps, Workflow Pipeline, Kubernetes, Microservice, Airflow

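As machine learning (ML) workflows grow increasingly complex, spanning heterogeneous
environments, toolchains, and execution contexts, the lack of reliable associations between
inputs, configurations, and outputs has become a fundamental challenge. Without systematic
traceability, workflows are prone to inconsistencies, which lead to difficult debugging,
inefficient collaboration, and rising maintenance costs.
This thesis focuses on the architectural design and implementation of a modular orchestration
framework that aims to achieve traceability systematically across the entire ML workflow
lifecycle.
The system strengthens traceability through the following five integrated dimensions:
1. Version-controlled resource management, covering datasets, code, execution environments, and models;
2. Automated workflow integration through API-based task abstraction;
3. Workflow result recording, input mapping, and metadata tracking;
4. Centralized task-level logging;
5. Unified runtime component logging.
Together, these mechanisms support the systematic recording, correlation, and inspection of
workflow execution contexts, and thereby provide consistent and verifiable traceability at
every stage of ML workflows in heterogeneous environments.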

As machine learning (ML) workflows grow in complexity, spanning heterogeneous
environments, toolchains, and execution contexts, the lack of reliable associations between
inputs, configurations, and outputs has emerged as a fundamental challenge. Without
systematic traceability, workflows become prone to inconsistencies, which make debugging
harder, reduce collaboration efficiency, and increase maintenance overhead. This thesis
focuses on the architectural design and implementation of a modular orchestration framework
that systematically enables traceability throughout the ML workflow lifecycle. The system
enhances traceability through five integrated dimensions: (1) version-controlled resource
management for datasets, code, environments, and models; (2) automated workflow integration
via API-based task abstraction; (3) structured input-output association and metadata
tracking; (4) centralized task-level logging; and (5) unified runtime component logging.
These mechanisms collectively support the systematic recording, correlation, and inspection
of workflow execution contexts, thereby enabling consistent and verifiable traceability
across all stages of ML workflows in heterogeneous environments.
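
The third dimension, structured input-output association, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the class names ResourceRef and TaskRecord, the field names, the example resource names, and the SHA-256 fingerprint are all assumptions chosen for illustration. The sketch only shows the general idea that each task execution stores version-pinned references to its inputs together with its configuration, so that an output can later be traced back to exactly what produced it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ResourceRef:
    """Hypothetical version-pinned pointer to a dataset, code revision, image, or model."""
    kind: str     # e.g. "dataset", "code", "environment", "model"
    name: str
    version: str  # e.g. an object version, commit hash, or image digest


@dataclass
class TaskRecord:
    """Hypothetical record tying one task execution to its inputs, config, and outputs."""
    task_id: str
    inputs: List[ResourceRef]
    config: Dict[str, object]
    outputs: List[ResourceRef] = field(default_factory=list)

    def fingerprint(self) -> str:
        """Return a deterministic hash of the inputs and config.

        Inputs are sorted first, so the order in which they were listed
        does not change the result.
        """
        ordered_inputs = sorted(
            (asdict(ref) for ref in self.inputs),
            key=lambda d: (d["kind"], d["name"], d["version"]),
        )
        payload = json.dumps(
            {"inputs": ordered_inputs, "config": self.config}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative values only; none of these names come from the thesis.
dataset_v3 = ResourceRef("dataset", "example-dataset", "v3")
dataset_v4 = ResourceRef("dataset", "example-dataset", "v4")
code_rev = ResourceRef("code", "example-repo", "commit-abc123")

record_a = TaskRecord("train-1", [dataset_v3, code_rev], {"epochs": 10})
record_b = TaskRecord("train-2", [code_rev, dataset_v3], {"epochs": 10})
record_c = TaskRecord("train-3", [dataset_v4, code_rev], {"epochs": 10})

# Same inputs and config give the same fingerprint; a new dataset version does not.
print(record_a.fingerprint() == record_b.fingerprint())
print(record_a.fingerprint() == record_c.fingerprint())
```

Under these assumptions, two runs with identical fingerprints but different outputs would point to an untracked source of variation, which is the kind of inconsistency the abstract says systematic traceability is meant to expose.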

Chinese Abstract
English Abstract
List of Figures
List of Tables
Chapter 1. Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-2-1 Research Motivation
1-2-2 Research Objectives
1-3 Contribution
1-4 Thesis Structure
Chapter 2. Background Knowledge
2-1 MLOps
2-2 Kubernetes
2-3 Airflow
2-4 Kafka
2-5 ELK Stack
2-6 MLflow
2-7 MinIO
2-8 Harbor
2-9 GitLab/GitHub
2-10 Role of Technologies in the Proposed System Architecture
Chapter 3. Related Work
3-1 Dynamic Tracking, MLOps, and Workflow Integration: Enabling Transparent Reproducibility in Machine Learning
3-2 A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows
3-3 Discussion
Chapter 4. System Architecture Design
4-1 System Architecture
4-2 Pipeline Definition
4-3 Pipeline Development Process
4-4 Pipeline Deployment Process
4-5 Traceability in System Architecture
Chapter 5. Case Study
5-1 NCU-RSS Model Training Workflow
5-1-1 Pipeline Goal / Context
5-1-2 Pipeline Stage Design
5-1-3 Component Execution Forms
5-2 MoCo Clustering Training Workflow
5-2-1 Pipeline Goal / Context
5-2-2 Pipeline Stage Design
5-2-3 Component Execution Forms
5-3 MoCo Clustering Testing Workflow
5-3-1 Pipeline Goal / Context
5-3-2 Pipeline Stage Design
5-3-3 Component Execution Forms
Chapter 6. Conclusion and Future Work
6-1 Conclusion
6-2 Future Work
References

                                

