Deep Learning Foundation Model with Self-Supervised Learning | National Central University Master's and Doctoral Thesis System


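Graduate student:	陳文研 Tran Van Nhiem
Thesis title:	Deep Learning Foundation Model with Self-Supervised Learning
Advisors:	王家慶 Jia-Ching Wang; 栗永徽 Yung-Hui Li
Committee members:
Degree:	Doctor
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Year of publication:	2024
Academic year of graduation:	112 (ROC calendar)
Language:	English
Number of pages:	131
Chinese keywords (translated):	Self-Supervised Learning, Computer Vision, Visual Representation Learning, Deep Neural Network, Image Analysis, Feature Learning
English keywords:	Self-Supervised Learning, Deep Learning Foundation Model, Computer Vision Foundation Model, Visual Representation Learning, Deep Neural Network, Image Processing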

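Recent developments in self-supervised learning suggest that it could replace conventional supervised learning, in particular because it addresses two weaknesses of supervised learning: the need for large amounts of labeled data and poor generalization across tasks. Self-supervised learning pre-trains a deep neural network on easily obtained unlabeled data and then fine-tunes it on downstream tasks, requiring less labeled data than supervised learning. Notably, self-supervised learning has been successful in many domains, including text, vision, and speech.
In this thesis, we propose several novel self-supervised learning methods for visual representation learning that improve results on multiple downstream computer vision tasks. These methods aim to generate learning targets from the input data itself. Our first method, HAPiCLR, uses pixel-level information in an image's contextual representation together with a contrastive learning objective, enabling it to learn more effective image representations for downstream tasks. The second method, HARL, introduces a heuristic attention-based approach that maximizes the abstract object-level embedding in vector space, producing higher-quality semantic representations. Finally, the MVMA framework combines inputs from multiple data augmentations and uses both global and local information from each training sample, which lets it explore a wide range of image appearances. The resulting representations are highly robust to images at different scales, which improves both generalization to downstream tasks and training efficiency.
These methods significantly improve performance on tasks such as image classification, object detection, and semantic segmentation. They demonstrate the ability of self-supervised learning to extract image features, thereby improving the effectiveness and efficiency of deep neural networks across computer vision tasks. This thesis not only introduces new learning algorithms but also provides a comprehensive analysis of self-supervised representations, revealing the factors that distinguish different models. Overall, it presents a set of innovative, efficient, and highly generalizable self-supervised learning methods that help self-supervised models generalize better to downstream tasks.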

Recent advances in self-supervised learning have shown promise as an alternative to supervised learning, particularly in addressing two of its critical shortcomings: the need for abundant labeled data and the inability to leverage prior knowledge and skills. Self-supervised learning pre-trains deep neural networks on pretext tasks using easily acquired, unlabeled data and then fine-tunes them on downstream tasks of interest, requiring less labeled data than supervised learning. Notably, self-supervised learning has demonstrated success in diverse domains, including text, vision, and speech.
In this thesis, we present several novel self-supervised learning methods for visual representation learning that improve performance on multiple downstream computer vision tasks. These methods are designed to generate learning targets from the input data itself. Our first method, HAPiCLR, combines pixel-level information from an object's contextual representation with a contrastive learning objective, allowing it to learn more robust and efficient image representations for downstream tasks. The second method, HARL, introduces a heuristic attention-based approach that maximizes the abstract object-level embedding in vector space, resulting in higher-quality semantic representations. Finally, the MVMA framework combines multiple augmentation pipelines and leverages both global and local information from each training sample, which lets it explore a vast range of image appearances. This approach yields representations that are invariant not only to scale but also to nuisance factors, making them more robust and efficient for downstream tasks.
These methods notably improve performance on tasks such as image classification, object detection, and semantic segmentation. They demonstrate the ability of self-supervised algorithms to extract high-level image properties, thereby enhancing deep neural network efficiency in various computer vision tasks. This thesis not only introduces new learning algorithms but also provides a comprehensive analysis of self-supervised representations and the factors that differentiate various models. Overall, it presents a suite of innovative, adaptable, and efficient approaches to self-supervised learning of image representations that significantly boost the robustness and effectiveness of the learned features.

List of Contents 
List of Figures
List of Tables
List of Abbreviations
Chapter I. Introduction
1-1.    Introduction
1-2.    Thesis Contributions
1-3.    Chapter Guide
Chapter II. Self-Supervised Learning History Development and Current State
2-1.    Representation Learning
2-1-1.    Foundation Model Representation Learning via Supervised Learning
2-1-2.    Foundation Model Representation Learning via Self-supervised
2-2.    History and evolution of self-supervised learning
2-3.    Main Categories of Self-supervised Learning
2-3-1.    Contrastive learning methods
2-3-2.    Predictive learning Distillation-based methods
2-3-3.    Redundancy reduction methods
2-3-4.    Reconstruction Self-supervised methods
2-3-5.    Generative SSL methods
2-4.    Research Gaps and Limitations
Chapter III. Self-supervised Contrastive Learning on Pixel-Level
3-1.    Introduction
3-2.    Related Work
3-3.    Methodology
3-4.    Implementation Detail
3-4-1.    Dataset and image augmentation
3-4-2.    Neural Network Architecture
3-4-3.    Optimization Objective
3-5.    Evaluation Protocol
3-5-1.    Performance with Linear Evaluation and Semi-supervised Learning on ImageNet Dataset
3-5-2.    Transfer Learning to Other Downstream Tasks
3-6.    Ablation and Analysis
3-6-1.    Mask Cropping Strategies
3-6-2.    Objective Loss Functions
3-6-3.    Batch Size
3-6-4.    Projection Head
3-7.    Chapter Summary
3-8.    Supplement Section
3-8-A.    Implementation Details
3-8-A-1.    Heuristic Mask Proposal Generator
3-8-A-2.    Implementation: Data Augmentation
3-8-B.    Evaluation on ImageNet and Transfer Learning
3-8-B-1.    Linear evaluation semi-supervised protocol on ImageNet
3-8-B-2.    Transfer Learning
Chapter IV. Heuristic Attention Representation Learning for Predictive Learning Self-Supervised Pretraining
4-1.    Introduction
4-2.    Related Work
4-3.    Methods
4-3-1.    HARL Framework
4-3-2.    Heuristic Binary Mask
4-4.    Experiments
4-5.    Evaluation Protocol
4-5-1.    Linear Evaluation and Semi-Supervised Learning on ImageNet Dataset
4-5-2.    Transfer Learning to Other Downstream Tasks
4-6.    Ablation and Analysis
4-6-1.    The Output of Spatial Feature Map (Size and Dimension)
4-6-2.    Objective Loss Functions
4-6-2-1.    Mask loss
4-6-2-2.    Hybrid loss
4-6-2-3.    Mask loss versus hybrid loss
4-6-3.    The Impact of Heuristic Mask Quality
4-7.    Conclusion
4-8.    Supplement Implementation Detail
4-8-1.    Implementation Data Augmentation
4-8-2.    Implementation Masking Feature
4-8-3.    Evaluation on the ImageNet and Transfer Learning
4-8-3-1.    Linear evaluation semi-supervised protocol on ImageNet
4-8-3-2.    Transfer via linear classification and fine-tuning
4-8-3-3.    Transfer learning to other vision tasks
4-8-4.    Heuristic Mask Proposal Methods
4-8-4-1.    Heuristic binary mask generates using DRFI
4-8-4-2.    Heuristic binary mask generates using unsupervised deep learning
Chapter V. Multi-View and Multi-Augmentation for Self-Supervised Visual Representation Learning
5-1.    Introduction
5-2.    Related Work
5-2-1.    Self-Supervised Learning
5-2-2.    Cropping Strategy
5-2-3.    Multi-Cropping
5-2-4.    Data Augmentation Searching
5-3.    Methodology
5-3-1.    Multi-Cropping
5-3-2.    Multi-Data Augmentation
5-3-3.    Loss Function
5-4.    Experiments
5-4-1.    SSL Pre-training Setup
5-4-2.    Evaluation Protocol and Main Results
5-4-2-1.    Evaluation on ImageNet
5-4-2-2.    Evaluation on multiple natural image classification tasks
5-4-2-3.    Evaluation on downstream task transfer
5-4-2-4.    Discovering semantic scene layouts by observing the self-attention map
5-5.    Ablation Study
5-5-1.    Global and Local View Crop Ratio and Resolution
5-5-1.    Number of Cropped Views
5-5-2.    Number of Augmentation Strategies
5-5-3.    Global- and Local-View Loss
5-6.    Supplement Implementation Detail
5-6-1.    Implement of MVMA multi-data augmentation
5-7.    Conclusion
Chapter VI. Conclusion
6-1.    Summary
6-2.    Discussion
6-2-1.    Implications and Applications of Self-supervised Learning
6-2-2.    Limitations
6-3.    Future Direction
6-3-1.    Improving the Quality of Representation
6-3-2.    Building Self-Supervised Multi-Modal Models
6-3-3.    Exploring New Self-Supervised Application Domain
Bibliography


                                

1 Tan, M., and Le, Q.V.: ‘EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks’, ArXiv, 2019, abs/1905.11946
2 He, K., Zhang, X., Ren, S., and Sun, J.: ‘Deep Residual Learning for Image Recognition’, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778
3 Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I.: ‘Attention is All you Need’, ArXiv, 2017, abs/1706.03762
4 Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I.: ‘Robust Speech Recognition via Large-Scale Weak Supervision’, ArXiv, 2022, abs/2212.04356
5 Abdel-Hamid, O., Mohamed, A.-r., Jiang, H., Deng, L., Penn, G., and Yu, D.: ‘Convolutional Neural Networks for Speech Recognition’, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22, pp. 1533-1545
6 Sun, C., Shrivastava, A., Singh, S., and Gupta, A.K.: ‘Revisiting Unreasonable Effectiveness of Data in Deep Learning Era’, 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 843-852
7 Kolesnikov, A., Beyer, L., Zhai, X., Puigcerver, J., Yung, J., Gelly, S., and Houlsby, N.: ‘Big Transfer (BiT): General Visual Representation Learning’, in Editor (Ed.)^(Eds.): ‘Book Big Transfer (BiT): General Visual Representation Learning’ (2019, edn.), pp.
8 LeCun, Y., Bengio, Y., and Hinton, G.: ‘Deep Learning’, Nature, 2015, 521, pp. 436-444
9 Eslami, S.M.A., Jimenez Rezende, D., Besse, F., Viola, F., Morcos, A.S., Garnelo, M., Ruderman, A., Rusu, A.A., Danihelka, I., Gregor, K., Reichert, D.P., Buesing, L., Weber, T., Vinyals, O., Rosenbaum, D., Rabinowitz, N.C., King, H., Hillier, C., Botvinick, M.M., Wierstra, D., Kavukcuoglu, K., and Hassabis, D.: ‘Neural scene representation and rendering’, Science, 2018, 360, pp. 1204 - 1210
10 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., and Fei-Fei, L.: ‘ImageNet Large Scale Visual Recognition Challenge’, International Journal of Computer Vision, 2015, 115, pp. 211-252
11 Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I.: ‘Learning Transferable Visual Models From Natural Language Supervision’, in Editor (Ed.)^(Eds.): ‘Book Learning Transferable Visual Models From Natural Language Supervision’ (2021, edn.), pp.
12 Misra, Y.L.a.I.: ‘ Self-supervised learning: The dark matter of intelligence.’, in Editor (Ed.)^(Eds.): ‘Book Self-supervised learning: The dark matter of intelligence.’ (2022, edn.), pp.
13 Chen, T., Kornblith, S., Norouzi, M., and Hinton, G.: ‘A simple framework for contrastive learning of visual representations’, in Editor (Ed.)^(Eds.): ‘Book A simple framework for contrastive learning of visual representations’ (PMLR, 2020, edn.), pp. 1597-1607
14 Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., and Gheshlaghi Azar, M.: ‘Bootstrap your own latent-a new approach to self-supervised learning’, Advances in neural information processing systems, 2020, 33, pp. 21271-21284
15 Goyal, P., Caron, M., Lefaudeux, B., Xu, M., Wang, P., Pai, V., Singh, M., Liptchinsky, V., Misra, I., Joulin, A., and Bojanowski, P.: ‘Self-supervised Pretraining of Visual Features in the Wild’, ArXiv, 2021, abs/2103.01988
16 Caron, M., Touvron, H., Misra, I., J'egou, H.e., Mairal, J., Bojanowski, P., and Joulin, A.: ‘Emerging Properties in Self-Supervised Vision Transformers’, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9630-9640
17 Xu, D., Xiao, J., Zhao, Z., Shao, J., Xie, D., and Zhuang, Y.: ‘Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 10326-10335
18 Alwassel, H., Mahajan, D.K., Torresani, L., Ghanem, B., and Tran, D.: ‘Self-Supervised Learning by Cross-Modal Audio-Video Clustering’, ArXiv, 2019, abs/1911.12667
19 Baevski, A., Zhou, H., Mohamed, A.-r., and Auli, M.: ‘wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations’, ArXiv, 2020, abs/2006.11477
20 Gong, Y., Lai, C.-I., Chung, Y.-A., and Glass, J.R.: ‘SSAST: Self-Supervised Audio Spectrogram Transformer’, in Editor (Ed.)^(Eds.): ‘Book SSAST: Self-Supervised Audio Spectrogram Transformer’ (2021, edn.), pp.
21 Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.: ‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’, ArXiv, 2019, abs/1810.04805
22 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V.: ‘RoBERTa: A Robustly Optimized BERT Pretraining Approach’, ArXiv, 2019, abs/1907.11692
23 Xie, Y., Xu, Z., Wang, Z., and Ji, S.: ‘Self-Supervised Learning of Graph Neural Networks: A Unified Review’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 45, pp. 2412-2429
24 Goyal, P., Mahajan, D.K., Gupta, A.K., and Misra, I.: ‘Scaling and Benchmarking Self-Supervised Visual Representation Learning’, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6390-6399
25 Goyal, P., Duval, Q., Seessel, I., Caron, M., Misra, I., Sagun, L., Joulin, A., and Bojanowski, P.: ‘Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision’, ArXiv, 2022, abs/2202.08360
26 Bengio, Y., Courville, A.C., and Vincent, P.: ‘Representation Learning: A Review and New Perspectives’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35, pp. 1798-1828
27 Bottou, L.: ‘Large-Scale Machine Learning with Stochastic Gradient Descent’, in Editor (Ed.)^(Eds.): ‘Book Large-Scale Machine Learning with Stochastic Gradient Descent’ (2010, edn.), pp.
28 Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio, Y.: ‘Contractive Auto-Encoders: Explicit Invariance During Feature Extraction’, in Editor (Ed.)^(Eds.): ‘Book Contractive Auto-Encoders: Explicit Invariance During Feature Extraction’ (2011, edn.), pp.
29 Goldberg, Y., and Levy, O.: ‘word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method’, ArXiv, 2014, abs/1402.3722
30 Xie, J., Girshick, R.B., and Farhadi, A.: ‘Unsupervised Deep Embedding for Clustering Analysis’, ArXiv, 2015, abs/1511.06335
31 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., and Bengio, Y.: ‘Generative Adversarial Nets’, in Editor (Ed.)^(Eds.): ‘Book Generative Adversarial Nets’ (2014, edn.), pp.
32 Larsson, G., Maire, M., and Shakhnarovich, G.: ‘Colorization as a Proxy Task for Visual Understanding’, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 840-849
33 Noroozi, M., and Favaro, P.: ‘Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles’, in Editor (Ed.)^(Eds.): ‘Book Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles’ (2016, edn.), pp.
34 Gidaris, S., Singh, P., and Komodakis, N.: ‘Unsupervised Representation Learning by Predicting Image Rotations’, ArXiv, 2018, abs/1803.07728
35 Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., and Efros, A.A.: ‘Context Encoders: Feature Learning by Inpainting’, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2536-2544
36 Oord, A.v.d., Li, Y., and Vinyals, O.: ‘Representation Learning with Contrastive Predictive Coding’, ArXiv, 2018, abs/1807.03748
37 He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R.B.: ‘Momentum Contrast for Unsupervised Visual Representation Learning’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9726-9735
38 Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A.: ‘Unsupervised Learning of Visual Features by Contrasting Cluster Assignments’, ArXiv, 2020, abs/2006.09882
39 He, K., Chen, X., Xie, S., Li, Y., Doll'ar, P., and Girshick, R.B.: ‘Masked Autoencoders Are Scalable Vision Learners’, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 15979-15988
40 Bardes, A., Ponce, J., and LeCun, Y.: ‘VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning’, ArXiv, 2021, abs/2105.04906
41 Baevski, A., Hsu, W.-N., Xu, Q., Babu, A., Gu, J., and Auli, M.: ‘data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language’, in Editor (Ed.)^(Eds.): ‘Book data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language’ (2022, edn.), pp.
42 Baevski, A., Babu, A., Hsu, W.-N., and Auli, M.: ‘Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language’, ArXiv, 2022, abs/2212.07525
43 Misra, I., and Maaten, L.v.d.: ‘Self-Supervised Learning of Pretext-Invariant Representations’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 6706-6716
44 Caron, M., Bojanowski, P., Joulin, A., and Douze, M.: ‘Deep Clustering for Unsupervised Learning of Visual Features’, in Editor (Ed.)^(Eds.): ‘Book Deep Clustering for Unsupervised Learning of Visual Features’ (2018, edn.), pp.
45 Cuturi, M.: ‘Sinkhorn Distances: Lightspeed Computation of Optimal Transport’, in Editor (Ed.)^(Eds.): ‘Book Sinkhorn Distances: Lightspeed Computation of Optimal Transport’ (2013, edn.), pp.
46 Hinton, G.E., Vinyals, O., and Dean, J.: ‘Distilling the Knowledge in a Neural Network’, ArXiv, 2015, abs/1503.02531
47 Chen, X., and He, K.: ‘Exploring Simple Siamese Representation Learning’, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 15745-15753
48 Gidaris, S., Bursuc, A., Puy, G., Komodakis, N., Cord, M., and Pérez, P.: ‘OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning’, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 6826-6836
49 Ermolov, A., Siarohin, A., Sangineto, E., and Sebe, N.: ‘Whitening for Self-Supervised Representation Learning’, in Editor (Ed.)^(Eds.): ‘Book Whitening for Self-Supervised Representation Learning’ (2020, edn.), pp.
50 Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S.: ‘Barlow Twins: Self-Supervised Learning via Redundancy Reduction’, in Editor (Ed.)^(Eds.): ‘Book Barlow Twins: Self-Supervised Learning via Redundancy Reduction’ (2021, edn.), pp.
51 Radford, A., and Narasimhan, K.: ‘Improving Language Understanding by Generative Pre-Training’, in Editor (Ed.)^(Eds.): ‘Book Improving Language Understanding by Generative Pre-Training’ (2018, edn.), pp.
52 Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I.: ‘Language Models are Unsupervised Multitask Learners’, in Editor (Ed.)^(Eds.): ‘Book Language Models are Unsupervised Multitask Learners’ (2019, edn.), pp.
53 Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T.J., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D.: ‘Language Models are Few-Shot Learners’, ArXiv, 2020, abs/2005.14165
54 Chen, M., Radford, A., Wu, J., Jun, H., Dhariwal, P., Luan, D., and Sutskever, I.: ‘Generative Pretraining From Pixels’, in Editor (Ed.)^(Eds.): ‘Book Generative Pretraining From Pixels’ (2020, edn.), pp.
55 Bao, H., Dong, L., and Wei, F.: ‘BEiT: BERT Pre-Training of Image Transformers’, ArXiv, 2021, abs/2106.08254
56 Kingma, D.P., and Welling, M.: ‘Auto-Encoding Variational Bayes’, CoRR, 2013, abs/1312.6114
57 Sohl-Dickstein, J.N., Weiss, E.A., Maheswaranathan, N., and Ganguli, S.: ‘Deep Unsupervised Learning using Nonequilibrium Thermodynamics’, ArXiv, 2015, abs/1503.03585
58 Ho, J., Jain, A., and Abbeel, P.: ‘Denoising Diffusion Probabilistic Models’, ArXiv, 2020, abs/2006.11239
59 Chen, T., Kornblith, S., Swersky, K., Norouzi, M., and Hinton, G.E.: ‘Big self-supervised models are strong semi-supervised learners’, Advances in neural information processing systems, 2020, 33, pp. 22243-22255
60 Bardes, A., Ponce, J., and LeCun, Y.: ‘Vicreg: Variance-invariance-covariance regularization for self-supervised learning’, arXiv preprint arXiv:2105.04906, 2021
61 Bachman, P., Hjelm, R.D., and Buchwalter, W.: ‘Learning representations by maximizing mutual information across views’, Advances in neural information processing systems, 2019, 32
62 Misra, I., and Maaten, L.v.d.: ‘Self-supervised learning of pretext-invariant representations’, in Editor (Ed.)^(Eds.): ‘Book Self-supervised learning of pretext-invariant representations’ (2020, edn.), pp. 6707-6717
63 He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R.: ‘Momentum contrast for unsupervised visual representation learning’, in Editor (Ed.)^(Eds.): ‘Book Momentum contrast for unsupervised visual representation learning’ (2020, edn.), pp. 9729-9738
64 Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., and Isola, P.: ‘What makes for good views for contrastive learning?’, Advances in Neural Information Processing Systems, 2020, 33, pp. 6827-6839
65 Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A.: ‘Unsupervised learning of visual features by contrasting cluster assignments’, Advances in Neural Information Processing Systems, 2020, 33, pp. 9912-9924
66 Chen, X., and He, K.: ‘Exploring simple siamese representation learning’, in Editor (Ed.)^(Eds.): ‘Book Exploring simple siamese representation learning’ (2021, edn.), pp. 15750-15758
67 Gidaris, S., Bursuc, A., Puy, G., Komodakis, N., Cord, M., and Pérez, P.: ‘Online bag-of-visual-words generation for unsupervised representation learning’, arXiv preprint arXiv:2012.11552, 2020
68 Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S.: ‘Barlow twins: Self-supervised learning via redundancy reduction’, in Editor (Ed.)^(Eds.): ‘Book Barlow twins: Self-supervised learning via redundancy reduction’ (PMLR, 2021, edn.), pp. 12310-12320
69 Putri, W.R., Liu, S.-H., Aslam, M.S., Li, Y.-H., Chang, C.-C., and Wang, J.-C.: ‘Self-Supervised Learning Framework toward State-of-the-Art Iris Image Segmentation’, Sensors, 2022, 22, (6), pp. 2133
70 Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., and Shah, R.: ‘Signature verification using a" siamese" time delay neural network’, Advances in neural information processing systems, 1993, 6
71 Chopra, S., Hadsell, R., and LeCun, Y.: ‘Learning a similarity metric discriminatively, with application to face verification’, in Editor (Ed.)^(Eds.): ‘Book Learning a similarity metric discriminatively, with application to face verification’ (IEEE, 2005, edn.), pp. 539-546
72 Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., and Bengio, Y.: ‘Learning deep representations by mutual information estimation and maximization’, arXiv preprint arXiv:1808.06670, 2018
73 Xie, Z., Lin, Y., Zhang, Z., Cao, Y., Lin, S., and Hu, H.: ‘Propagate yourself: Exploring pixel-level consistency for unsupervised visual representation learning’, in Editor (Ed.)^(Eds.): ‘Book Propagate yourself: Exploring pixel-level consistency for unsupervised visual representation learning’ (2021, edn.), pp. 16684-16693
74 Van Gansbeke, W., Vandenhende, S., Georgoulis, S., and Van Gool, L.: ‘Unsupervised semantic segmentation by contrasting object mask proposals’, in Editor (Ed.)^(Eds.): ‘Book Unsupervised semantic segmentation by contrasting object mask proposals’ (2021, edn.), pp. 10052-10062
75 Wang, X., Zhang, R., Shen, C., Kong, T., and Li, L.: ‘Dense contrastive learning for self-supervised visual pre-training’, in Editor (Ed.)^(Eds.): ‘Book Dense contrastive learning for self-supervised visual pre-training’ (2021, edn.), pp. 3024-3033
76 Iizuka, S., Simo-Serra, E., and Ishikawa, H.: ‘Let there be color! Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification’, ACM Transactions on Graphics (ToG), 2016, 35, (4), pp. 1-11
77 Larsson, G., Maire, M., and Shakhnarovich, G.: ‘Colorization as a proxy task for visual understanding’, in Editor (Ed.)^(Eds.): ‘Book Colorization as a proxy task for visual understanding’ (2017, edn.), pp. 6874-6883
78 Zhang, R., Isola, P., and Efros, A.A.: ‘Colorful image colorization’, in Editor (Ed.)^(Eds.): ‘Book Colorful image colorization’ (Springer, 2016, edn.), pp. 649-666
79 Doersch, C., Gupta, A., and Efros, A.A.: ‘Unsupervised visual representation learning by context prediction’, in Editor (Ed.)^(Eds.): ‘Book Unsupervised visual representation learning by context prediction’ (2015, edn.), pp. 1422-1430
80 Mundhenk, T.N., Ho, D., and Chen, B.Y.: ‘Improvements to context based self-supervised learning’, in Editor (Ed.)^(Eds.): ‘Book Improvements to context based self-supervised learning’ (2018, edn.), pp. 9339-9348
81 Noroozi, M., and Favaro, P.: ‘Unsupervised learning of visual representations by solving jigsaw puzzles’, in Editor (Ed.)^(Eds.): ‘Book Unsupervised learning of visual representations by solving jigsaw puzzles’ (Springer, 2016, edn.), pp. 69-84
82 Noroozi, M., Vinjimoor, A., Favaro, P., and Pirsiavash, H.: ‘Boosting self-supervised learning via knowledge transfer’, in Editor (Ed.)^(Eds.): ‘Book Boosting self-supervised learning via knowledge transfer’ (2018, edn.), pp. 9359-9367
83 Ren, Z., and Lee, Y.J.: ‘Cross-domain self-supervised multi-task feature learning using synthetic imagery’, in Editor (Ed.)^(Eds.): ‘Book Cross-domain self-supervised multi-task feature learning using synthetic imagery’ (2018, edn.), pp. 762-771
84 Asano, Y., Patrick, M., Rupprecht, C., and Vedaldi, A.: ‘Labelling unlabelled videos from scratch with multi-modal self-supervision’, Advances in Neural Information Processing Systems, 2020, 33, pp. 4660-4671
85 Caron, M., Bojanowski, P., Joulin, A., and Douze, M.: ‘Deep clustering for unsupervised learning of visual features’, in Editor (Ed.)^(Eds.): ‘Book Deep clustering for unsupervised learning of visual features’ (2018, edn.), pp. 132-149
86 Yan, X., Misra, I., Gupta, A., Ghadiyaram, D., and Mahajan, D.: ‘Clusterfit: Improving generalization of visual representations’, in Editor (Ed.)^(Eds.): ‘Book Clusterfit: Improving generalization of visual representations’ (2020, edn.), pp. 6509-6518
87 Bojanowski, P., and Joulin, A.: ‘Unsupervised learning by predicting noise’, in Editor (Ed.)^(Eds.): ‘Book Unsupervised learning by predicting noise’ (PMLR, 2017, edn.), pp. 517-526
88 Jenni, S., and Favaro, P.: ‘Self-supervised feature learning by learning to spot artifacts’, in Editor (Ed.)^(Eds.): ‘Book Self-supervised feature learning by learning to spot artifacts’ (2018, edn.), pp. 2733-2742
89 Donahue, J., Krähenbühl, P., and Darrell, T.: ‘Adversarial feature learning’, arXiv preprint arXiv:1605.09782, 2016
90 Donahue, J., and Simonyan, K.: ‘Large scale adversarial representation learning’, Advances in neural information processing systems, 2019, 32
91 Mahendran, A., Thewlis, J., and Vedaldi, A.: ‘Cross pixel optical-flow similarity for self-supervised learning’, in Editor (Ed.)^(Eds.): ‘Book Cross pixel optical-flow similarity for self-supervised learning’ (Springer, 2018, edn.), pp. 99-116
92 Zhan, X., Pan, X., Liu, Z., Lin, D., and Loy, C.C.: ‘Self-supervised learning via conditional motion propagation’, in Editor (Ed.)^(Eds.): ‘Book Self-supervised learning via conditional motion propagation’ (2019, edn.), pp. 1881-1889
93 Noroozi, M., Pirsiavash, H., and Favaro, P.: ‘Representation learning by learning to count’, in Editor (Ed.)^(Eds.): ‘Book Representation learning by learning to count’ (2017, edn.), pp. 5898-5906
94 Gidaris, S., Singh, P., and Komodakis, N.: ‘Unsupervised representation learning by predicting image rotations’, arXiv preprint arXiv:1803.07728, 2018
95 Zhang, L., Qi, G.-J., Wang, L., and Luo, J.: ‘Aet vs. aed: Unsupervised representation learning by auto-encoding transformations rather than data’, in Editor (Ed.)^(Eds.): ‘Book Aet vs. aed: Unsupervised representation learning by auto-encoding transformations rather than data’ (2019, edn.), pp. 2547-2555
96 Chaitanya, K., Erdil, E., Karani, N., and Konukoglu, E.: ‘Contrastive learning of global and local features for medical image segmentation with limited annotations’, Advances in Neural Information Processing Systems, 2020, 33, pp. 12546-12558
97 Hadsell, R., Chopra, S., and LeCun, Y.: ‘Dimensionality reduction by learning an invariant mapping’, in Editor (Ed.)^(Eds.): ‘Book Dimensionality reduction by learning an invariant mapping’ (IEEE, 2006, edn.), pp. 1735-1742
98 Li, J., Zhou, P., Xiong, C., and Hoi, S.C.: ‘Prototypical contrastive learning of unsupervised representations’, arXiv preprint arXiv:2005.04966, 2020
99 Tian, Y., Krishnan, D., and Isola, P.: ‘Contrastive multiview coding’, in Editor (Ed.)^(Eds.): ‘Book Contrastive multiview coding’ (Springer, 2020, edn.), pp. 776-794
100 Wu, Z., Xiong, Y., Yu, S.X., and Lin, D.: ‘Unsupervised feature learning via non-parametric instance discrimination’, in Editor (Ed.)^(Eds.): ‘Book Unsupervised feature learning via non-parametric instance discrimination’ (2018, edn.), pp. 3733-3742
101 Ye, M., Zhang, X., Yuen, P.C., and Chang, S.-F.: ‘Unsupervised embedding learning via invariant and spreading instance feature’, in Editor (Ed.)^(Eds.): ‘Book Unsupervised embedding learning via invariant and spreading instance feature’ (2019, edn.), pp. 6210-6219
102 Zhan, X., Liu, Z., Luo, P., Tang, X., and Loy, C.: ‘Mix-and-match tuning for self-supervised semantic segmentation’, in Editor (Ed.)^(Eds.): ‘Book Mix-and-match tuning for self-supervised semantic segmentation’ (2018, edn.), pp.
103 Oord, A.v.d., Li, Y., and Vinyals, O.: ‘Representation learning with contrastive predictive coding’, arXiv preprint arXiv:1807.03748, 2018
104 Chen, X., Fan, H., Girshick, R., and He, K.: ‘Improved baselines with momentum contrastive learning’, arXiv preprint arXiv:2003.04297, 2020
105 Henaff, O.: ‘Data-efficient image recognition with contrastive predictive coding’, in Editor (Ed.)^(Eds.): ‘Book Data-efficient image recognition with contrastive predictive coding’ (PMLR, 2020, edn.), pp. 4182-4192
106 Zhuang, C., Zhai, A.L., and Yamins, D.: ‘Local aggregation for unsupervised learning of visual embeddings’, in Editor (Ed.)^(Eds.): ‘Book Local aggregation for unsupervised learning of visual embeddings’ (2019, edn.), pp. 6002-6012
107 Cao, Y., Xie, Z., Liu, B., Lin, Y., Zhang, Z., and Hu, H.: ‘Parametric instance classification for unsupervised visual feature learning’, Advances in neural information processing systems, 2020, 33, pp. 15614-15624
108 Ioffe, S., and Szegedy, C.: ‘Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift’, ArXiv, 2015, abs/1502.03167
109 Nair, V., and Hinton, G.E.: ‘Rectified Linear Units Improve Restricted Boltzmann Machines’, in Editor (Ed.)^(Eds.): ‘Book Rectified Linear Units Improve Restricted Boltzmann Machines’ (2010, edn.), pp.
110 Nguyen, D.T., Dax, M., Mummadi, C.K., Ngo, T.-P.-N., Nguyen, T.H.P., Lou, Z., and Brox, T.: ‘DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision’, in Editor (Ed.)^(Eds.): ‘Book DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision’ (2019, edn.), pp.
111 Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A.: ‘Going deeper with convolutions’, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1-9
112 Zhang, S., Liew, J.H., Wei, Y., Wei, S., and Zhao, Y.: ‘Interactive Object Segmentation With Inside-Outside Guidance’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 12231-12241
113 You, Y., Gitman, I., and Ginsburg, B.: ‘Scaling SGD Batch Size to 32K for ImageNet Training’, ArXiv, 2017, abs/1708.03888
114 Loshchilov, I., and Hutter, F.: ‘SGDR: Stochastic Gradient Descent with Warm Restarts’, arXiv: Learning, 2017
115 Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K.: ‘Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour’, ArXiv, 2017, abs/1706.02677
116 Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J.M., and Zisserman, A.: ‘The Pascal Visual Object Classes (VOC) Challenge’, International Journal of Computer Vision, 2009, 88, pp. 303-338
117 Ren, S., He, K., Girshick, R.B., and Sun, J.: ‘Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39, pp. 1137-1149
118 Lin, T.-Y., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C.L.: ‘Microsoft COCO: Common Objects in Context’, 2014
119 He, K., Gkioxari, G., Dollár, P., and Girshick, R.B.: ‘Mask R-CNN’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42, pp. 386-397
120 Bossard, L., Guillaumin, M., and Gool, L.V.: ‘Food-101 - Mining Discriminative Components with Random Forests’, 2014
121 Krizhevsky, A.: ‘Learning Multiple Layers of Features from Tiny Images’, 2009
122 Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., and Torralba, A.: ‘SUN database: Large-scale scene recognition from abbey to zoo’, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 3485-3492
123 Krause, J., Stark, M., Deng, J., and Fei-Fei, L.: ‘3D Object Representations for Fine-Grained Categorization’, 2013 IEEE International Conference on Computer Vision Workshops, 2013, pp. 554-561
124 Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., and Vedaldi, A.: ‘Describing Textures in the Wild’, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3606-3613
125 Shu, Y., Kou, Z., Cao, Z., Wang, J., and Long, M.: ‘Zoo-Tuning: Adaptive Transfer from a Zoo of Models’, ArXiv, 2021, abs/2106.15434
126 Yang, Q., Zhang, Y., Dai, W., and Pan, S.J.: ‘Transfer learning’ (Cambridge University Press, 2020)
127 You, K., Kou, Z., Long, M., and Wang, J.: ‘Co-Tuning for Transfer Learning’, 2020
128 Misra, I., Shrivastava, A., Gupta, A., and Hebert, M.: ‘Cross-Stitch Networks for Multi-task Learning’, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3994-4003
129 Li, X., Xiong, H., Xu, C., and Dou, D.: ‘SMILE: Self-Distilled MIxup for Efficient Transfer LEarning’, ArXiv, 2021, abs/2103.13941
130 Tishby, N., and Zaslavsky, N.: ‘Deep learning and the information bottleneck principle’, 2015 IEEE Information Theory Workshop (ITW), 2015, pp. 1-5
131 Shwartz-Ziv, R., and Tishby, N.: ‘Opening the Black Box of Deep Neural Networks via Information’, ArXiv, 2017, abs/1703.00810
132 Amjad, R.A., and Geiger, B.C.: ‘Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42, pp. 2225-2239
133 Chen, T., Kornblith, S., Norouzi, M., and Hinton, G.E.: ‘A Simple Framework for Contrastive Learning of Visual Representations’, ArXiv, 2020, abs/2002.05709
134 Misra, I., and Maaten, L.v.d.: ‘Self-Supervised Learning of Pretext-Invariant Representations’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 6706-6716
135 Ermolov, A., Siarohin, A., Sangineto, E., and Sebe, N.: ‘Whitening for Self-Supervised Representation Learning’, 2021
136 Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., and Joulin, A.: ‘Emerging Properties in Self-Supervised Vision Transformers’, ArXiv, 2021, abs/2104.14294
137 Hayhoe, M.M., and Ballard, D.H.: ‘Eye movements in natural behavior’, Trends in Cognitive Sciences, 2005, 9, pp. 188-194
138 Borji, A., Sihite, D.N., and Itti, L.: ‘Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling’, IEEE Transactions on Image Processing, 2013
139 Benois-Pineau, J., and Callet, P.L.: ‘Visual Content Indexing and Retrieval with Psycho-Visual Models’, 2017
140 Awh, E., Armstrong, K.M., and Moore, T.: ‘Visual and oculomotor selection: links, causes and implications for spatial attention’, Trends in Cognitive Sciences, 2006, 10, pp. 124-130
141 Tian, Y., Chen, X., and Ganguli, S.: ‘Understanding self-supervised Learning Dynamics without Contrastive Pairs’, ArXiv, 2021, abs/2102.06810
142 Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A.: ‘Extracting and composing robust features with denoising autoencoders’, 2008
143 Bojanowski, P., and Joulin, A.: ‘Unsupervised Learning by Predicting Noise’, ArXiv, 2017, abs/1704.05310
144 Noroozi, M., and Favaro, P.: ‘Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles’, 2016
145 Zhang, R., Isola, P., and Efros, A.A.: ‘Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction’, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 645-654
146 Mundhenk, T.N., Ho, D., and Chen, B.Y.: ‘Improvements to Context Based Self-Supervised Learning’, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 9339-9348
147 Donahue, J., and Simonyan, K.: ‘Large Scale Adversarial Representation Learning’, 2019
148 Bansal, V., Buckchash, H., and Raman, B.: ‘Discriminative Auto-Encoding for Classification and Representation Learning Problems’, IEEE Signal Processing Letters, 2021, 28, pp. 987-991
149 Chen, T., Kornblith, S., Swersky, K., Norouzi, M., and Hinton, G.E.: ‘Big Self-Supervised Models are Strong Semi-Supervised Learners’, ArXiv, 2020, abs/2006.10029
150 Jaiswal, A., Babu, A.R., Zadeh, M.Z., Banerjee, D., and Makedon, F.: ‘A Survey on Contrastive Self-supervised Learning’, ArXiv, 2020, abs/2011.00362
151 He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R.B.: ‘Momentum Contrast for Unsupervised Visual Representation Learning’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9726-9735
152 Zhang, X., and Maire, M.: ‘Self-Supervised Visual Representation Learning from Hierarchical Grouping’, ArXiv, 2020, abs/2012.03044
153 Jiang, H., Yuan, Z., Cheng, M.-M., Gong, Y., Zheng, N., and Wang, J.: ‘Salient Object Detection: A Discriminative Regional Feature Integration Approach’, International Journal of Computer Vision, 2013, 123, pp. 251-268
154 Kolesnikov, A., Zhai, X., and Beyer, L.: ‘Revisiting Self-Supervised Visual Representation Learning’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1920-1929
155 Ye, M., Zhang, X., Yuen, P., and Chang, S.-F.: ‘Unsupervised Embedding Learning via Invariant and Spreading Instance Feature’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 6203-6212
156 Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Trischler, A., and Bengio, Y.: ‘Learning deep representations by mutual information estimation and maximization’, ArXiv, 2019, abs/1808.06670
157 Kornblith, S., Shlens, J., and Le, Q.V.: ‘Do Better ImageNet Models Transfer Better?’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2656-2666
158 Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, R., and Valko, M.: ‘Bootstrap your own latent: a new approach to self-supervised learning’. Proc. 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 2020
159 Chen, X., and He, K.: ‘Exploring Simple Siamese Representation Learning’, 2021
160 Xie, Z., Lin, Y., Zhang, Z., Cao, Y., Lin, S., and Hu, H.: ‘Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning’, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 16679-16688
161 Chen, X., Fan, H., Girshick, R.B., and He, K.: ‘Improved Baselines with Momentum Contrastive Learning’, ArXiv, 2020, abs/2003.04297
162 Hénaff, O.J., Srinivas, A., Fauw, J.D., Razavi, A., Doersch, C., Eslami, S.M.A., and Oord, A.v.d.: ‘Data-Efficient Image Recognition with Contrastive Predictive Coding’, ArXiv, 2020, abs/1905.09272
163 Borji, A., Cheng, M.-M., Jiang, H., and Li, J.: ‘Salient Object Detection: A Benchmark’, IEEE Transactions on Image Processing, 2015, 24, pp. 5706-5722
164 Wang, W., Lai, Q., Fu, H., Shen, J., and Ling, H.: ‘Salient Object Detection in the Deep Learning Era: An In-Depth Survey’, IEEE transactions on pattern analysis and machine intelligence, 2021, PP
165 Zou, W., and Komodakis, N.: ‘HARF: Hierarchy-Associated Rich Features for Salient Object Detection’, 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 406-414
166 Zhang, J., Zhang, T., Dai, Y., Harandi, M., and Hartley, R.I.: ‘Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective’, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 9029-9038
167 Van Gansbeke, W., Vandenhende, S., Georgoulis, S., and Gool, L.V.: ‘Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals’, ArXiv, 2021, abs/2102.06191
168 Chen, T., Kornblith, S., Norouzi, M., and Hinton, G.: ‘A Simple Framework for Contrastive Learning of Visual Representations’. Proc. 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, 2020
169 Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A.: ‘Unsupervised learning of visual features by contrasting cluster assignments’. Proc. 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 2020
170 Zhao, Z., Zhang, Z., Chen, T., Singh, S., and Zhang, H.: ‘Image Augmentations for GAN Training’, ArXiv, 2020, abs/2006.02595
171 Howard, A.G.: ‘Some Improvements on Deep Convolutional Neural Network Based Image Classification’, CoRR, 2014, abs/1312.5402
172 Cubuk, E.D., Zoph, B., Mané, D., Vasudevan, V., and Le, Q.V.: ‘AutoAugment: Learning Augmentation Strategies From Data’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 113-123
173 Cubuk, E.D., Zoph, B., Shlens, J., and Le, Q.V.: ‘Randaugment: Practical automated data augmentation with a reduced search space’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020, pp. 3008-3017
174 Lim, S., Kim, I., Kim, T., Kim, C., and Kim, S.: ‘Fast AutoAugment’, 2019
175 Caron, M., Bojanowski, P., Joulin, A., and Douze, M.: ‘Deep Clustering for Unsupervised Learning of Visual Features’, 2018
176 Richemond, P.H., Grill, J.-B., Altché, F., Tallec, C., Strub, F., Brock, A., Smith, S., De, S., Pascanu, R., and Piot, B.: ‘BYOL works even without batch statistics’, arXiv preprint arXiv:2010.10241, 2020
177 Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., and Hu, H.: ‘SimMIM: a Simple Framework for Masked Image Modeling’, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 9643-9653
178 Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A.L., and Kong, T.: ‘iBOT: Image BERT Pre-Training with Online Tokenizer’, ArXiv, 2021, abs/2111.07832
179 Oquab, M., Darcet, T., Moutakanni, T., Vo, H.Q., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.-Y., Li, S.-W., Misra, I., Rabbat, M.G., Sharma, V., Synnaeve, G., Xu, H., Jégou, H., Mairal, J., Labatut, P., Joulin, A., and Bojanowski, P.: ‘DINOv2: Learning Robust Visual Features without Supervision’, ArXiv, 2023, abs/2304.07193
180 Tran, V.-N., Huang, C.-E., Liu, S., Yang, K.-L., Ko, T., and Li, Y.-h.: ‘Multi-Augmentation for Efficient Self-Supervised Visual Representation Learning’, 2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2022, pp. 1-4
181 Krizhevsky, A., Sutskever, I., and Hinton, G.E.: ‘ImageNet classification with deep convolutional neural networks’, Communications of the ACM, 2012, 60, pp. 84-90
182 Touvron, H., Vedaldi, A., Douze, M., and Jégou, H.: ‘Fixing the train-test resolution discrepancy’, Advances in neural information processing systems, 2019, 32
183 Jones, D.R.: ‘A Taxonomy of Global Optimization Methods Based on Response Surfaces’, Journal of Global Optimization, 2001, 21, pp. 345-383
184 Reed, C., Metzger, S., Srinivas, A., Darrell, T., and Keutzer, K.: ‘SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning’, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 2673-2682
185 Radosavovic, I., Kosaraju, R.P., Girshick, R.B., He, K., and Dollár, P.: ‘Designing Network Design Spaces’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10425-10433
186 Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N.: ‘An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale’, ArXiv, 2021, abs/2010.11929
187 Salimans, T., and Kingma, D.P.: ‘Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks’, 2016
188 Loshchilov, I., and Hutter, F.: ‘Fixing Weight Decay Regularization in Adam’, ArXiv, 2017, abs/1711.05101
189 Chen, X., Xie, S., and He, K.: ‘An Empirical Study of Training Self-Supervised Vision Transformers’, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9620-9629
190 Lin, T.-Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., and Belongie, S.J.: ‘Feature Pyramid Networks for Object Detection’, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936-944
191 https://github.com/facebookresearch/detectron2, accessed 2023/11/24
192 Lin, T.-Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., and Belongie, S.J.: ‘Feature Pyramid Networks for Object Detection’, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936-944
193 https://github.com/facebookresearch/detectron, accessed 2023/11/25
194 Li, Y., Mao, H., Girshick, R.B., and He, K.: ‘Exploring Plain Vision Transformer Backbones for Object Detection’, ArXiv, 2022, abs/2203.16527
195 Pont-Tuset, J., Perazzi, F., Caelles, S., Arbeláez, P., Sorkine-Hornung, A., and Gool, L.V.: ‘The 2017 DAVIS Challenge on Video Object Segmentation’, ArXiv, 2017, abs/1704.00675
196 Jabri, A., Owens, A., and Efros, A.A.: ‘Space-Time Correspondence as a Contrastive Random Walk’, ArXiv, 2020, abs/2006.14613
197 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., and Batra, D.: ‘Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization’, International Journal of Computer Vision, 2017, 128, pp. 336-359
