
Author: 張覺意 (Chueh-I Chang)
Title: 應用自然語言處理技術提供學生電子書閱讀理解能力之智慧化評量 (Applying Natural Language Processing Techniques for Intelligent Assessment of Students' E-Book Reading Comprehension)
Advisor: 楊鎮華 (Steve Yang)
Committee Members:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (ROC calendar)
Language: Chinese
Pages: 46
Keywords (Chinese): 自然語言處理 (natural language processing), 文檔摘要 (document summarization), 問題生成 (question generation), 機器評分 (machine grading)
Keywords (English): NLP, Document summarization, Question generation, Machine grading
Views: 13 / Downloads: 0
  • In recent years, educational resources have been progressively digitized and e-learning platforms have become widespread, so students' learning activities can now be recorded digitally. In a traditional classroom, teachers gauge students' reading comprehension through quizzes or in-class interaction; on today's digital platforms, how to measure students' reading comprehension is an important issue in learning analytics.
    With the rapid advance of artificial intelligence, the field of natural language processing (NLP) has made notable breakthroughs in recent years. This thesis applies state-of-the-art NLP techniques to find the best way to measure students' reading comprehension. Teachers usually assess comprehension by grading students' quizzes, but writing and marking questions costs substantial time and labor; this thesis automates both steps with NLP so that teachers can understand students' reading comprehension more quickly.
    In this thesis, we judge students' reading comprehension by the consistency between the highlights students draw in the e-book and those drawn by the teacher, and we compare the proxy-measure performance of three methods: TextRank, RAKE, and BERT. We generate short-answer quiz questions with the language generation model GPT-2, grade students' answers automatically with the language representation model BERT, and finally give students suggestions based on the grading results and feed the results back to the teacher, completing a highly automated, intelligent assessment of reading comprehension.
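    The highlight-consistency idea above can be sketched as a simple overlap score. This is only an illustrative stand-in: the thesis's actual proxy measures are built on TextRank, RAKE, and BERT, and `marker_quality` and the sample sentences here are hypothetical.

    ```python
    # Illustrative "marker quality" score: the fraction of teacher-highlighted
    # sentences that the student also marked. A simple stand-in for the proxy
    # measure described above, not the thesis's exact formula.

    def marker_quality(student_highlights, teacher_highlights):
        """Return the share of teacher highlights the student also marked."""
        teacher = set(teacher_highlights)
        if not teacher:
            return 0.0
        return len(teacher & set(student_highlights)) / len(teacher)

    teacher = {"a linked list stores nodes", "each node points to the next node"}
    student = {"a linked list stores nodes", "arrays use contiguous memory"}
    print(marker_quality(student, teacher))  # one of two teacher highlights matched -> 0.5
    ```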


    In recent years, various educational resources have been gradually digitized, e-learning platforms have become popular, and students' learning activities can now be recorded digitally. In traditional classrooms, teachers typically gauge students' reading comprehension through quizzes or in-class interaction; on today's e-learning platforms, how to measure students' reading comprehension is an important topic in the field of learning analytics.
    With the rapid development of artificial intelligence, the field of natural language processing has made significant breakthroughs in recent years. This paper applies state-of-the-art natural language processing techniques to find the best way to measure students' reading comprehension. Teachers usually assess reading comprehension by marking students' quizzes, yet setting and marking exam questions costs a great deal of time and labor. This paper automates both steps with natural language processing to help teachers understand students' reading comprehension more quickly.
    In this paper, we measure students' reading comprehension by the consistency between the highlights students draw in e-books and those drawn by teachers, and we compare the proxy-measure performance of three methods: TextRank, RAKE, and BERT. In the quiz-generation phase, we use GPT-2, a state-of-the-art language generation model, to generate quizzes by parsing the course materials. In the grading phase, we use BERT, a pre-trained language understanding model, to grade students' answers automatically and give them guidance according to the grading results, completing a highly automated reading comprehension assessment framework.
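    To make the comparison concrete, here is a minimal, self-contained sketch of TextRank-style sentence ranking, one of the three highlight-extraction methods compared. It follows the spirit of Mihalcea & Tarau (2004) but simplifies tokenization and uses plain word overlap; the function names and example sentences are illustrative, not from the thesis.

    ```python
    import math

    def overlap(s1, s2):
        # Mihalcea & Tarau-style similarity: shared words, normalized by the
        # log of each sentence's length (no stemming or stopword removal here).
        w1, w2 = set(s1.lower().split()), set(s2.lower().split())
        denom = math.log(len(w1) + 1) + math.log(len(w2) + 1)
        return len(w1 & w2) / denom

    def textrank(sentences, damping=0.85, iterations=50):
        """Rank sentences by running PageRank on their pairwise-similarity graph."""
        n = len(sentences)
        sim = [[overlap(a, b) if i != j else 0.0
                for j, b in enumerate(sentences)]
               for i, a in enumerate(sentences)]
        out_weight = [sum(row) for row in sim]
        scores = [1.0] * n
        for _ in range(iterations):
            scores = [(1 - damping) + damping * sum(
                          sim[j][i] / out_weight[j] * scores[j]
                          for j in range(n) if out_weight[j] > 0)
                      for i in range(n)]
        # Indices of sentences, most central first; top-ranked sentences
        # would be proposed as highlights.
        return sorted(range(n), key=lambda i: scores[i], reverse=True)

    docs = [
        "lists store ordered elements in python",
        "python lists support indexing and slicing",
        "the weather was sunny today",
    ]
    print(textrank(docs))  # the off-topic weather sentence ranks last
    ```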

    Table of Contents
    Abstract (Chinese) i
    ABSTRACT ii
    Table of Contents iii
    List of Figures iv
    List of Tables iv
    1. Introduction 1
      1.1 Ways of understanding students' reading comprehension 1
      1.2 Benefits of automated assessment 1
      1.3 Automatic question generation and automatic grading 1
    2. Literature Review 2
      2.1 Document summarization methods 2
      2.2 Question generation methods 2
      2.3 Automatic grading methods 3
    3. Methodology 4
      3.1 Participants and dataset 4
      3.2 BookRoll 5
      3.3 Course activities 5
      3.4 Methods and models 6
        3.4.1 TextRank 6
        3.4.2 RAKE 7
        3.4.3 BERT 8
        3.4.4 Syntactic parsing 10
        3.4.5 GPT-2 11
      3.5 System workflow 13
        3.5.1 Preprocessing 13
        3.5.2 Automatic highlighting of key points in course materials 13
        3.5.3 Automatic question generation from course materials 14
        3.5.4 Automatic short-answer grading 15
        3.5.5 Memo quality and the recommendation mechanism 16
        3.5.6 System data flow 18
    4. Results and Discussion 19
      4.1 Evaluation criteria 20
      4.2 Results 22
        4.2.1 Why use marker quality to measure students' reading comprehension? 22
        4.2.2 How accurate is the automated assessment? 23
      4.3 Discussion 31
    5. Conclusion and Future Work 33
    6. References 34

    List of Figures
    Figure 1. BookRoll, the online reading platform used to support the course 5
    Figure 2. Encoder and decoder in the Transformer 9
    Figure 3. BERT's two-stage transfer learning 10
    Figure 4. BERT's self-attention and GPT-2's masked self-attention 12
    Figure 5. Parse trees of a complete and an incomplete sentence 15
    Figure 6. Pages 6 and 9 of the "week4_List: Lists (1)" material 18
    Figure 7. Automatic highlighting workflow 18
    Figure 8. Training workflow of the automatic short-answer grading model 19
    Figure 9. Data flow of the complete system 19
    Figure 10. Relationship between the TA-based measure and the machine-based measure 26
    Figure 11. A student answer containing code 33
    Figure 12. A student answer containing pseudocode 33
    Figure 13. A student answer written with special formatting 33

    List of Tables
    Table 1. Common symbols in parse trees 11
    Table 2. Cosine distance between each page's content and the answer 18
    Table 3. Four outcomes of the grading model's predictions 21
    Table 4. Marker quality of high-scoring students versus the whole class 23
    Table 5. Agreement between highlights produced by the three methods and the TA's highlights 23
    Table 6. Correlation coefficients and p-values between the TA-based and machine-based measures 27
    Table 7. Question generation statistics 27
    Table 8. Automatic question generation results 30
    Table 9. Automatic short-answer grading results 30

    Robertson, S. (2004). Understanding inverse document frequency: on theoretical arguments for IDF. Journal of documentation.
    Mihalcea, R., & Tarau, P. (2004, July). Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing (pp. 404-411).
    Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic keyword extraction from individual documents. Text mining: applications and theory, 1, 1-20.
    Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
    Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
    Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
    Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., & Le, Q. V. (2019). Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems (pp. 5754-5764).
    Liu, Y. (2019). Fine-tune BERT for extractive summarization. arXiv preprint arXiv:1903.10318.
    Miller, D. (2019). Leveraging BERT for extractive text summarization on lectures. arXiv preprint arXiv:1906.04165.
    Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
    Zhou, Q., Yang, N., Wei, F., Tan, C., Bao, H., & Zhou, M. (2017, November). Neural question generation from text: A preliminary study. In National CCF Conference on Natural Language Processing and Chinese Computing (pp. 662-671). Springer, Cham.
    Heilman, M. (2011). Automatic factual question generation from text. Language Technologies Institute School of Computer Science Carnegie Mellon University, 195.
    Le, N. T., Kojiri, T., & Pinkwart, N. (2014). Automatic question generation for educational applications–the state of art. In Advanced Computational Methods for Knowledge Engineering(pp. 325-338). Springer, Cham.
    Du, X., Shao, J., & Cardie, C. (2017). Learning to ask: Neural question generation for reading comprehension. arXiv preprint arXiv:1705.00106.
    Zhao, Y., Ni, X., Ding, Y., & Ke, Q. (2018). Paragraph-level neural question generation with maxout pointer and gated self-attention networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 3901-3910).
    Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104-3112).
    Kriangchaivech, K., & Wangperawong, A. (2019). Question Generation by Transformers. arXiv preprint arXiv:1909.05017.
    Kim, Y., Lee, H., Shin, J., & Jung, K. (2019, July). Improving neural question generation using answer separation. In Proceedings of the AAAI Conference on Artificial Intelligence(Vol. 33, pp. 6602-6609).
    Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
    Chan, Y. H., & Fan, Y. C. (2019, November). A Recurrent BERT-based Model for Question Generation. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering (pp. 154-162).
    Klein, T., & Nabi, M. (2019). Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds. arXiv preprint arXiv:1911.02365.
    Krishna, K., & Iyyer, M. (2019). Generating Question-Answer Hierarchies. arXiv preprint arXiv:1906.02622.
    Page, E. B. (1967). Grading essays by computer: Progress report. In Proceedings of the Invitational Conference on Testing Problems (pp. 87-100).
    Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.
    Foltz, P. W., Laham, D., & Landauer, T. K. (1999). The intelligent essay assessor: Applications to educational technology. Interactive Multimedia Electronic Journal of Computer-Enhanced Learning, 1(2), 939-944.
    Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automated scoring and annotation of essays with the Intelligent Essay Assessor. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 87-112).
    Zhang, L., Huang, Y., Yang, X., Yu, S., & Zhuang, F. (2019). An automatic short-answer grading model for semi-open-ended questions. Interactive Learning Environments, 1-14.
    Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
    Hasanah, U., Permanasari, A. E., Kusumawardani, S. S., & Pribadi, F. S. (2019). A scoring rubric for automatic short answer grading system. Telkomnika, 17(2), 763-770.
    Wang, Z., Lan, A. S., Waters, A. E., Grimaldi, P., & Baraniuk, R. G. A Meta-Learning Augmented Bidirectional Transformer Model for Automatic Short Answer Grading.
    Liu, T., Ding, W., Wang, Z., Tang, J., Huang, G. Y., & Liu, Z. (2019, June). Automatic Short Answer Grading via Multiway Attention Networks. In International Conference on Artificial Intelligence in Education (pp. 169-173). Springer, Cham.
    Sung, C., Dhamecha, T., Saha, S., Ma, T., Reddy, V., & Arora, R. (2019, November). Pre-Training BERT on Domain Resources for Short Answer Grading. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 6073-6077).
    Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 311-318). Association for Computational Linguistics.
    Torrey, L., & Shavlik, J. (2010). Transfer learning. In Handbook of research on machine learning applications and trends: algorithms, methods, and techniques (pp. 242-264). IGI Global.
