
Author: Ting-Hsuan Lee (李庭瑄)
Title: A Self-Supervised Learning Model for Adversarial Robustness (基於對抗式攻擊之強健自監督式學習模型)
Advisor: Jai-Ching Wang (王家慶)
Committee:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110
Language: Chinese
Pages: 57
Keywords (Chinese): deep learning, adversarial attack, image recognition, self-supervised learning, model robustness
Keywords (English): deep learning, adversarial attack, AI security, image recognition, self-supervised learning
  • With the development of deep learning, more and more deep-learning systems are being integrated into our daily lives, such as self-driving cars and face-recognition systems. However, we often overlook the fact that a wrong decision by a deep-learning system can cause serious personal injury and property damage. In practice, many deep models can be maliciously attacked into making wrong decisions: for example, inserting adversarial perturbations into the input data degrades the system's judgment and leads the model to a wrong decision. This confirms the insecurity of deep neural networks and brings corresponding risks to downstream tasks. For example, the speed-limit detection module of an autonomous-driving system may suffer an adversarial attack that causes the car to misread a sign and suddenly stop or slow down on the highway, or exhibit other unexpected behavior, correspondingly increasing traffic risk.
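Such adversarial perturbations are typically crafted from the model's own gradients. The sketch below illustrates the idea with a one-step, FGSM-style attack (Goodfellow et al.) on a toy logistic model; the weights, input, and epsilon are illustrative assumptions, not values from this thesis.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    # Probability of the positive class under a fixed linear model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def fgsm(w, x, y, eps):
    # For binary cross-entropy, the gradient of the loss w.r.t. the
    # input is (p - y) * w, so FGSM shifts each feature by eps in the
    # sign direction of that gradient to *increase* the loss.
    p = predict(w, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: 1.0 if g > 0 else (-1.0 if g < 0 else 0.0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w = [2.0, -1.0, 0.5]           # assumed "trained" weights
x = [0.3, -0.2, 0.1]           # clean input
y = 1.0                        # true label
x_adv = fgsm(w, x, y, eps=0.5)

print(predict(w, x) > 0.5)     # prints True: clean input classified positive
print(predict(w, x_adv) > 0.5) # prints False: the perturbation flips the prediction
```

A small, human-imperceptible budget per feature (here eps=0.5 on a toy scale) is enough to cross the decision boundary, which is exactly the failure mode described above.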
    To defend against adversarial attacks, the prevailing approach is adversarial training, in which the adversarial examples produced by an attack are added to the training data. Although a model trained this way can defend against adversarial examples effectively, its ability to classify clean samples suffers, which reduces its generalization. We therefore propose a self-supervised learning approach in which, without being given the correct labels, the model learns by itself the difference between adversarial examples and the original data. This training scheme strengthens the model's robustness: while training with only a small amount of labeled data, it improves the model's defense against attack samples.
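The self-supervised component builds on BYOL, which pulls the representations of two views of the same instance together without any label; in this thesis the adversarial example plays the role of the second view. Below is a minimal conceptual sketch (not the thesis code) of the BYOL loss and the target network's EMA update, with made-up vectors standing in for encoder outputs.

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def byol_loss(online_repr, target_repr):
    # BYOL minimizes 2 - 2 * cos(online prediction, target projection);
    # the loss is 0 for identical directions and 4 for opposite ones.
    p = l2_normalize(online_repr)
    z = l2_normalize(target_repr)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

def ema_update(target_w, online_w, tau=0.99):
    # The target network is not trained directly; it tracks the online
    # network by an exponential moving average of its weights.
    return [tau * t + (1 - tau) * o for t, o in zip(target_w, online_w)]

clean_view = [1.0, 0.0, 0.5]   # representation of the clean sample (assumed)
adv_view = [0.9, 0.1, 0.6]     # representation of its adversarial view (assumed)

loss_pos = byol_loss(clean_view, adv_view)
loss_neg = byol_loss(clean_view, [-1.0, 0.0, -0.5])
print(loss_pos < loss_neg)     # prints True: agreeing views give a smaller loss
```

Minimizing this loss drags the adversarial view's representation toward the clean one, which is the mechanism by which the model learns the difference between attacked and original data without labels.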


    With the rapid development of deep learning, a growing number of deep-learning systems are woven into our daily lives, such as autonomous-driving and face-recognition systems. However, we often overlook that an attacker may drive a deep-learning system to a wrong prediction, leading to serious personal injury and property damage. For example, an attacker may feed an adversarial example into the system and cause the model to make a wrong decision. This fact demonstrates the unreliability of deep-learning models and increases the potential risk of downstream tasks: the speed-limit detection sub-system may be subject to adversarial attacks, causing the autonomous-driving system to behave unexpectedly and raising the corresponding risk.
    To defend against adversarial attacks, the common method is adversarial training, which trains the model on the adversarial examples generated by the attacks. Although such a model can resist adversarial attacks to some degree, its performance on the original task and its generalization both decrease. We therefore propose a framework that trains the model with self-supervised learning, learning to distinguish adversarial examples from the original data without being given the correct labels. The proposed framework enhances both the robustness and the generalization of the trained model against adversarial attacks.
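For contrast with the proposed method, the standard adversarial training baseline described above can be sketched in a few lines: each update first crafts a worst-case version of the input (here a single FGSM-style step), then performs gradient descent on that perturbed example. The 1-D logistic model, toy data, and hyperparameters below are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adversarial_train(data, eps=0.25, lr=0.5, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            # Inner step: perturb x within the eps-ball in the direction
            # that increases the loss (the sign of the input gradient).
            p = sigmoid(w * x + b)
            x_adv = x + eps * (1.0 if (p - y) * w > 0 else -1.0)
            # Outer step: gradient descent on the adversarial example.
            p_adv = sigmoid(w * x_adv + b)
            w -= lr * (p_adv - y) * x_adv
            b -= lr * (p_adv - y)
    return w, b

# Toy 1-D task: negative inputs are class 0, positive inputs are class 1.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = adversarial_train(data)

# The trained model should classify even eps-perturbed inputs correctly.
shift = lambda x, y: x - 0.25 if y == 1.0 else x + 0.25
print(all((sigmoid(w * shift(x, y) + b) > 0.5) == (y == 1.0) for x, y in data))
```

Because every gradient step is taken on a worst-case input, the model trades some clean accuracy for a robust margin, which is exactly the generalization cost the proposed self-supervised framework aims to avoid.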

    Chapter 1  Introduction
        1-1  Research Background
        1-2  Research Objectives
        1-3  Thesis Organization
    Chapter 2  Literature Review
        2-1  Introduction to Adversarial Attacks
            2-1-1  White-box Attacks
            2-1-2  Black-box Attacks
        2-2  Adversarial Defenses
            2-2-1  Input Denoising
            2-2-2  Model Robustness
            2-2-3  Adversarial Example Detection
        2-3  Introduction to Self-Supervised Networks
            2-3-1  The BYOL Architecture
    Chapter 3  Research Content and Methods
        3-1  Overview of the Research Method
        3-2  Method Architecture
            3-2-1  Adversarial BYOL Pre-training
            3-2-2  Linear Evaluation of the Model
    Chapter 4  Experimental Results and Comparison
        4-1  Datasets
            4-1-1  Dataset Specifications
            4-1-2  Adversarial Example Generation
        4-2  Model Training
            4-2-1  Baseline Models
            4-2-2  Experimental Details
            4-2-3  Evaluation Metrics
        4-3  Experimental Results
            4-3-1  Adversarial BYOL with Weight Retraining
            4-3-2  Multi-view Adversarial BYOL Trained from Scratch
    Chapter 5  Conclusion and Future Work
        5-1  Contributions and Summary
        5-2  Future Work
    Chapter 6  References

