
Graduate student: 陳祈銘 (CHEN, CHI-MING)
Thesis title: Generating DNA Sequences of COVID-19 and Trending Fitting in Latent Space
(COVID-19的DNA病毒序列在潛空間下趨勢擬合和生成突變新DNA病毒序列)
Advisors: 洪盟凱, 周世偉
Oral examination committee:
Degree: Master
Department: College of Science - Department of Mathematics
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Number of pages: 55
Chinese keywords: 新冠病毒, 變分自編碼器, 高斯過程迴歸, DNA 病毒序列突變
English keywords: COVID-19, Variational AutoEncoder, Gaussian Process Regression, DNA virus sequence mutation
  • Since the COVID-19 outbreak began in December 2019, epidemic prevention measures have become increasingly important. Because the novel coronavirus is highly transmissible and mutates readily, new strains keep emerging, which reduces vaccine protection and may even allow breakthrough infections. This study focuses on predicting the next stage of viral mutation. The collected data are classified by Nextstrain clade, and the DNA virus sequences from before January 2021, prior to the emergence of the Delta variant, are extracted. A Variational AutoEncoder extracts the biological information of these sequences, and Gaussian processes with different kernels are then compared for fitting the mutation trend in the latent space and used to generate new DNA virus sequences 1 to 6 months ahead. New virus sequences were successfully generated, including mutations into the Delta strain. This indicates that the Variational AutoEncoder successfully extracts the biological information of DNA sequences and that their evolution over time can be modeled, which can help in developing new vaccines and predicting new symptoms ahead of time.
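    The trend-fitting step described in the abstract can be illustrated with a small sketch. This is not the thesis code: it assumes the Variational AutoEncoder has already produced one mean latent code per month, fits an independent Gaussian process to each latent coordinate over time, and extrapolates 1 to 6 months ahead. The toy latent values, the RBF kernel with its `length_scale`, the `noise` jitter, and the helper names `solve`, `gp_predict` and `forecast_latent` are all assumptions made for illustration.

```python
import math

def rbf_kernel(t1, t2, length_scale=3.0, variance=1.0):
    """Squared-exponential (RBF) kernel on scalar time inputs, e.g. months."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def gp_predict(train_t, train_z, test_t, kernel=rbf_kernel, noise=1e-4):
    """GP posterior mean for one latent coordinate.

    The sample mean of train_z is used as a constant prior mean (an assumption
    of this sketch), so far-away predictions fall back to that mean.
    """
    mean = sum(train_z) / len(train_z)
    centered = [z - mean for z in train_z]
    gram = [[kernel(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(train_t)]
            for i, a in enumerate(train_t)]
    alpha = solve(gram, centered)  # (K + noise * I)^{-1} (z - mean)
    return [mean + sum(kernel(t, a) * w for a, w in zip(train_t, alpha))
            for t in test_t]

def forecast_latent(train_t, train_latents, test_t, kernel=rbf_kernel):
    """Fit an independent GP per latent dimension; return one latent point per test time."""
    dims = len(train_latents[0])
    per_dim = [gp_predict(train_t, [z[d] for z in train_latents], test_t, kernel)
               for d in range(dims)]
    return [[per_dim[d][i] for d in range(dims)] for i in range(len(test_t))]

# Toy data (hypothetical): 2-D mean latent codes for months 0..11.
months = list(range(12))
latents = [[0.2 * t, math.sin(0.3 * t)] for t in months]

# Extrapolate 1 to 6 months past the last observed month.
future = [months[-1] + h for h in range(1, 7)]
forecast = forecast_latent(months, latents, future)
for t, z in zip(future, forecast):
    print(t, [round(v, 3) for v in z])
```

    In a full pipeline, each forecast latent point would be passed through the trained decoder to obtain a candidate sequence. Passing a different function as `kernel` is one simple way to compare kernels, as the abstract describes.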

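    Table of contents:
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    1. Introduction
    2. DNA Sequence Data Sources
        2.1 Data download
        2.2 Data description
    3. Data Preprocessing (Multiple Sequence Alignment)
        3.1 Edit distance between sequences (Levenshtein distance)
        3.2 Alignment method used in Clustal Omega
            3.2.1 Scoring function
            3.2.2 Gap penalty
            3.2.3 Joint gap penalty
    4. Models
        4.1 Variational AutoEncoder (VAE)
            4.1.1 Overview
            4.1.2 Mathematical perspective
            4.1.3 Model construction
        4.2 Gaussian process
            4.2.1 Kernel method
            4.2.2 Weight-space view
            4.2.3 Function-space view
            4.2.4 Common kernels
    5. Results
        5.1 Differences in DNA sequence alignment
        5.2 Variational AutoEncoder training
        5.3 Comparison of Gaussian process kernels
            5.3.1 Selection of L and comparison between kernels
    6. Conclusion
    References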

