Graduate student: 陳鵬升 (Peng-Sheng Chen)
Thesis title: 以 API 層級相依與控制器導向執行機制實現雲原生機器學習流程之可追溯性
Enabling Traceability in Cloud-Native Machine Learning Workflows with API-Level Dependencies and Controller-Based Execution
Advisor: 王尉任 (Wei-Jen Wang)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication: 2025
Academic year of graduation: 113
Language: English
Number of pages: 117
Keywords (translated from Chinese): Workflow Traceability, Workflow Pipeline, Microservice Architecture, Kubernetes, Airflow, Kafka
Keywords (English): Workflow Traceability, MLOps, Workflow Pipeline, Kubernetes, Microservice, Airflow
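  • As machine learning (ML) workflows grow more complex and span heterogeneous
    environments, toolchains, and execution contexts, the lack of reliable associations among
    inputs, configurations, and outputs has become a fundamental challenge. Without systematic
    traceability, workflows are prone to inconsistencies, which make debugging difficult,
    lower collaboration efficiency, and raise maintenance costs.
    This thesis focuses on the architectural design and implementation of a modular
    orchestration framework that systematically enables traceability across the entire ML
    workflow lifecycle.
    The system strengthens traceability through the following five integrated dimensions:
    1. Version-controlled resource management covering datasets, code, execution environments, and models;
    2. Automated workflow integration through API-based task abstraction;
    3. Workflow result recording, input mapping, and metadata tracking;
    4. Centralized task-level logging;
    5. Unified runtime component logging.
    Together, these mechanisms support the systematic recording, correlation, and inspection
    of workflow execution contexts, giving consistent and verifiable traceability at every
    stage of ML workflows in heterogeneous environments.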


    As machine learning (ML) workflows grow in complexity, spanning heterogeneous
    environments, toolchains, and execution contexts, the lack of reliable associations between
    inputs, configurations, and outputs has emerged as a fundamental challenge. Without
    systematic traceability, workflows become prone to inconsistencies that hinder debugging,
    reduce collaboration efficiency, and increase maintenance overhead. This thesis focuses on
    the architectural design and implementation of a modular orchestration framework that
    systematically enables traceability throughout the ML workflow lifecycle. The system enhances
    traceability through five integrated dimensions: (1) version-controlled resource management
    for datasets, code, environments, and models; (2) automated workflow integration via
    API-based task abstraction; (3) structured input-output association and metadata tracking;
    (4) centralized task-level logging; and (5) unified runtime component logging. These
    mechanisms collectively support the systematic recording, correlation, and inspection of
    workflow execution contexts, thereby enabling consistent and verifiable traceability across
    all stages of ML workflows in heterogeneous environments.
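
    The input-output association described in dimension (3) can be illustrated with a
    minimal sketch. It is not the thesis's implementation: the `TaskRun` schema, the
    `TraceStore` class, and all field names and artifact identifiers are hypothetical,
    chosen only to show how a run record can tie versioned resources to the artifacts a
    task consumes and produces, and how lineage can then be walked backwards.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TaskRun:
    """One task execution and the versioned resources it used (hypothetical schema)."""
    task_name: str
    dataset_version: str   # assumed: a version tag from an object store
    code_commit: str       # assumed: a Git commit hash
    image_tag: str         # assumed: a container image tag
    inputs: tuple = ()     # identifiers of artifacts consumed
    outputs: tuple = ()    # identifiers of artifacts produced

    def fingerprint(self) -> str:
        """Deterministic ID derived from every field that defines the run."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class TraceStore:
    """In-memory index from artifacts to the runs that produced them."""

    def __init__(self):
        self._runs = {}      # run fingerprint -> TaskRun
        self._producer = {}  # artifact id -> run fingerprint

    def record(self, run: TaskRun) -> str:
        run_id = run.fingerprint()
        self._runs[run_id] = run
        for artifact in run.outputs:
            self._producer[artifact] = run_id
        return run_id

    def lineage(self, artifact: str) -> list:
        """Walk back from an artifact to every recorded run that contributed to it."""
        chain, pending, seen = [], [artifact], set()
        while pending:
            run_id = self._producer.get(pending.pop())
            if run_id is None or run_id in seen:
                continue  # external input, or a run already visited
            seen.add(run_id)
            run = self._runs[run_id]
            chain.append(run)
            pending.extend(run.inputs)
        return chain


# Usage: two chained tasks, then trace the model back to its upstream runs.
store = TraceStore()
prep = TaskRun("preprocess", "ds-v1", "abc123", "prep:1.0",
               inputs=("raw.csv",), outputs=("clean.csv",))
train = TaskRun("train", "ds-v1", "abc123", "train:1.0",
                inputs=("clean.csv",), outputs=("model.pkl",))
store.record(prep)
store.record(train)
lineage_names = [run.task_name for run in store.lineage("model.pkl")]
```

    Deriving the run ID from the record's content means that two executions with identical
    resources and artifacts share a fingerprint, while any change to a dataset version,
    commit, or image yields a different one. A real system would likely also need
    timestamps and persistent storage, which this sketch omits.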

    Chinese Abstract
    English Abstract
    List of Figures
    List of Tables
    Chapter 1. Introduction
      1-1 Research Background
      1-2 Research Motivation and Objectives
        1-2-1 Research Motivation
        1-2-2 Research Objectives
      1-3 Contribution
      1-4 Thesis Structure
    Chapter 2. Background Knowledge
      2-1 MLOps
      2-2 Kubernetes
      2-3 Airflow
      2-4 Kafka
      2-5 ELK-Stack
      2-6 MLflow
      2-7 MinIO
      2-8 Harbor
      2-9 GitLab/GitHub
      2-10 Role of Technologies in the Proposed System Architecture
    Chapter 3. Related Work
      3-1 Dynamic Tracking, MLOps, and Workflow Integration: Enabling Transparent Reproducibility in Machine Learning
      3-2 A Low Overhead Approach for Automatically Tracking Provenance in Machine Learning Workflows
      3-3 Discussion
    Chapter 4. System Architecture Design
      4-1 System Architecture
      4-2 Pipeline Definition
      4-3 Pipeline Development Process
      4-4 Pipeline Deployment Process
      4-5 Traceability in System Architecture
    Chapter 5. Case Study
      5-1 NCU-RSS Model Training Workflow
        5-1-1 Pipeline Goal / Context
        5-1-2 Pipeline Stage Design
        5-1-3 Component Execution Forms
      5-2 MoCo Clustering Training Workflow
        5-2-1 Pipeline Goal / Context
        5-2-2 Pipeline Stage Design
        5-2-3 Component Execution Forms
      5-3 MoCo Clustering Testing Workflow
        5-3-1 Pipeline Goal / Context
        5-3-2 Pipeline Stage Design
        5-3-3 Component Execution Forms
    Chapter 6. Conclusion and Future Work
      6-1 Conclusion
      6-2 Future Work
    Reference

