
Author: Yu-Min Zhang (張育珉)
Thesis Title: CLUE-NAS: A CLIP-Inspired Contrastive Learnable Unifying Encoder for Neural Architecture Search
Advisor: Kuo-Chin Fan (范國清)
Oral Defense Committee:
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of Publication: 2025
Academic Year of Graduation: 113 (2024-2025)
Language: Chinese
Number of Pages: 58
Chinese Keywords: Encoder-based NAS, Universal NAS, Language Model based NAS
Foreign Keywords: Encoder-based NAS, Universal NAS, Language Model based NAS
  • In recent years, many encoder-based neural architecture search (NAS) methods have been proposed. These methods typically encode a candidate architecture as a graph that captures every feature-extraction operation from input to output, such as convolution and pooling layers. Such graph-based methods focus on the topological features of candidate architectures: each node represents an operation, and the edges between nodes represent the flow of features. This representation is intuitive and reasonable, but it lacks the high-level semantic features of candidate architectures, which limits the robustness and generalization ability of encoder-based methods. This issue can be observed in several phenomena: for example, such NAS methods cannot effectively interpret previously unseen operations, such as the more recent self-attention, and they struggle to benefit from joint training across multiple search spaces. To overcome these limitations, we propose CLUE-NAS (Contrastive Learnable Unifying Encoder for NAS), a novel framework that incorporates the text encoder of Contrastive Language-Image Pre-training (CLIP) to generate semantically rich context embeddings and integrates them with graph embeddings through contrastive learning. CLUE-NAS further emulates the behavior of human experts by adopting a coarse-to-fine strategy to strengthen prediction performance. Experimental results on three important NAS benchmarks, NASBench-101, NASBench-201, and NASBench-301, show that CLUE-NAS not only generalizes strongly to unseen operations but also benefits significantly from joint training across multiple search spaces, achieving performance on par with or better than many state-of-the-art NAS methods.
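To make the graph format described above concrete, a cell can be serialized as a flattened adjacency matrix concatenated with one-hot operation vectors. The sketch below is illustrative only: `encode_cell` and the vocabulary `OPS` are hypothetical names, with `OPS` mirroring the five-operation set of NAS-Bench-201-style search spaces.

```python
import numpy as np

# Hypothetical operation vocabulary (mirrors NAS-Bench-201's five operations).
OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]

def encode_cell(adjacency, op_names):
    """Serialize a cell as [flattened adjacency | one-hot operations]:
    the adjacency matrix gives the feature-flow edges, and each node's
    operation is encoded as a one-hot vector over the vocabulary."""
    adj = np.asarray(adjacency, dtype=np.float32).flatten()
    one_hots = np.zeros((len(op_names), len(OPS)), dtype=np.float32)
    for i, name in enumerate(op_names):
        one_hots[i, OPS.index(name)] = 1.0
    return np.concatenate([adj, one_hots.flatten()])

# A 4-node cell: upper-triangular adjacency (edges only flow input -> output).
adj = [[0, 1, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
vec = encode_cell(adj, ["nor_conv_3x3", "skip_connect", "avg_pool_3x3", "none"])
print(vec.shape)  # (36,) = 16 adjacency entries + 4 nodes x 5 ops
```

Note that this vector records only topology and operation identity; an unseen operation has no slot in `OPS`, which is exactly the limitation the abstract attributes to purely graph-based encoders.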


    Conventional encoder-based neural architecture search (NAS) methods typically encode candidate architectures as graphs based on their information flow and operations. Such graph-based embeddings primarily capture topological features, such as nodes and edges, while lacking high-level semantic representations, which limits the robustness and generalization of encoder-based NAS. This issue is evident in several phenomena, such as the inability of typical NAS methods to interpret previously unseen operations or their limited capacity to benefit from joint training across multiple search spaces. To mitigate these limitations, we propose Contrastive Learnable Unifying Encoder for NAS (CLUE-NAS), a novel framework that leverages the text encoder of Contrastive Language-Image Pre-training (CLIP) to generate context embeddings enriched with high-level semantics and integrates them with graph-based embeddings through contrastive learning. CLUE-NAS further emulates human expert behaviors by employing a coarse-to-fine strategy to enhance performance. Experiments on NASBench-101, NASBench-201, and NASBench-301 show that CLUE-NAS not only demonstrates strong generalization to unseen operations but also benefits substantially from joint training, achieving competitive results against state-of-the-art NAS baselines.
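The contrastive integration of graph-based and context embeddings can be sketched as a CLIP-style symmetric InfoNCE objective, where matching (graph, context) pairs sit on the diagonal of a similarity matrix. This is a minimal NumPy sketch under assumptions, not the thesis implementation: the name `clip_style_loss`, the batch layout, and the temperature value are all illustrative.

```python
import numpy as np

def clip_style_loss(graph_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss (as in CLIP) pulling each architecture's
    graph embedding toward its matching context embedding.
    Both inputs are (N, D) batches; row i of each is a matching pair."""
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = g @ t.T / temperature  # (N, N) scaled cosine similarities
    n = len(logits)

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        # targets are the diagonal: row i should match column i
        return -logp[np.arange(n), np.arange(n)].mean()

    # average the graph-to-text and text-to-graph directions, as CLIP does
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss drives aligned pairs together and mismatched pairs apart, which is how the text encoder's high-level semantics can be transferred into the architecture encoder.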

    Chinese Abstract
    English Abstract
    Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Related Work
      2.1 Various Types of NAS
      2.2 Low-Cost NAS
      2.3 Encoder-based NAS
      2.4 Contrastive Language-Image Pre-training (CLIP)
    3. Preliminary
      3.1 Cell-based Search Space
      3.2 Two Common Graph-based Encoders in NAS
      3.3 Limitations of Graph Format
      3.4 Discussion of Graph Format for NAS
    4. Combination of CLIP with MLP
      4.1 MLP as the Architecture Encoder
        4.1.1 CLIP to Add Context Features
      4.2 Discussion of MLP w/ CLIP
    5. Contrastive Learnable Unifying Encoder
      5.1 Graph-based Embedding
        5.1.1 Flattened Adjacency Matrix
        5.1.2 Operation Metrics
      5.2 Align the Context and Graph-based Embeddings
        5.2.1 Positional Embedding for Confidence Score Division
        5.2.2 Context Alignment
        5.2.3 Confidence Prediction
      5.3 CLUE-NAS Training and Evaluation
        5.3.1 Architecture-Accuracy Pairs
        5.3.2 Training Stage
        5.3.3 Evaluation Stage
    6. Implementation Detail of Metric Embedding
      6.1 FLOPs, Parameters, and Latency
      6.2 Passing Rate
      6.3 Numerical Sensitivity
    7. Experimental Results
      7.1 Comparison of CLUE-NAS and Encoders
        7.1.1 Independent Training Experiments
        7.1.2 Joint Training Experiments
        7.1.3 Unseen Architecture Experiments
      7.2 Ablation Study of CLUE-NAS
        7.2.1 Impact of Key Components
        7.2.2 Impact of Context Embedding Length
        7.2.3 CLUE-NAS with Sampling Strategies
        7.2.4 Impact of Different Context Prompts
        7.2.5 Different Lower Bounds for Confidence Prediction
      7.3 Comparison of Finetune-free CLUE-NAS and Other NAS Methods
        7.3.1 Encoder-based NAS
        7.3.2 LLM-based NAS
        7.3.3 Low-cost NAS
      7.4 Visualizations of Encoders
        7.4.1 Numerical Perturbation
    8. Conclusions
    References
