
Author: Jo-Chi Kung (龔若齊)
Thesis Title: CCG: A Conversational Agent for Traffic Accidents Information Collection Based on Large Language Models
(CCG:交通事故對話式筆錄蒐集代理人)
Advisor: Chia-Hui Chang (張嘉惠)
Committee:
Degree: Master
Department: Department of Computer Science & Information Engineering, College of Electrical Engineering & Computer Science
Year of Publication: 2025
Graduation Academic Year: 113
Language: Chinese
Pages: 71
Chinese Keywords: traffic accidents, information collection, large language models, conversational agents, information extraction
English Keywords: LLM, Agent, Traffic Accidents, Information Extraction
    Traffic accidents are frequent in Taiwan, and the current manual statement-taking process is cumbersome, time-consuming, and prone to omitting key information, imposing high labor and time costs on both law enforcement officers and the parties involved. To address this problem, this study proposes a Large Language Model-based traffic accident information collection agent, Collision Care Guide (CCG), which systematically gathers accident-related information through guided dialogue and achieves accurate bidirectional conversion between structured accident records and natural-language narratives.

    The CCG system consists of three modules: the Question Generation Module dynamically adjusts its questions according to missing fields and user responses; the Information Extraction Module converts colloquial narratives into a structured JSON accident record format; and the Accident Reconstruction Module reorganizes the completed structured data into a highly readable accident narrative for user confirmation.

    To verify the system's usability and robustness, this study establishes a three-tier evaluation framework: AI agent testing, human user testing, and accident reconstruction evaluation. Results show that in AI agent testing the system scores above 4.0 (out of 5) on every dimension of dialogue quality and information extraction, and the LLM-based automatic scores are significantly positively correlated with human ratings (Spearman r = 0.474, p < 0.001), confirming the consistency of the evaluation. In human user testing, information extraction reaches F1 = 0.909, highly consistent with the AI agent results. Accident reconstruction evaluation shows that the reconstructed narratives score above 4.7 (out of 5) on information completeness, with semantic similarity as high as 0.9, confirming the information fidelity and semantic consistency between structured accident records and natural-language accident narratives.

    In addition, to reduce cost and improve the feasibility of private deployment and data privacy, this study further compares a fine-tuned open-source Llama model against a commercial baseline (GPT-4o-mini). On the test set, the Information Extraction Module achieves field-level exact accuracy above 0.89 and overall semantic similarity of about 0.995; the Question Generation Module achieves an average semantic similarity of about 0.85, and the jointly trained model also maintains highly stable performance across both tasks. LLM-based automatic evaluation likewise shows the fine-tuned model scoring ≥4 (out of 5) on both dialogue quality and information extraction, validating the effectiveness of task-specific fine-tuning.

    In summary, CCG achieves accurate and structured information extraction and demonstrates that a fine-tuned open-source model is practically deployable, providing standardized support for traffic accident handling, insurance claims, and pre-litigation evidence collection.


    Frequent road traffic accidents in Taiwan impose substantial procedural and cognitive burdens on law enforcement, insurers, and involved parties. Conventional manually driven reporting workflows are time-consuming, error-prone, and susceptible to omission of salient facts, thereby delaying downstream responsibility assessment and claims processing. To address these limitations, we propose Collision Care Guide (CCG), a Large Language Model (LLM)-based conversational agent that systematizes the bi-directional transformation between unstructured natural-language accident narratives and a structured Traffic Accident Record Format (TARF) representation.

    CCG comprises three coordinated modules: (1) a Question Generation Module that adaptively formulates targeted inquiries based on missing fields and prior user responses; (2) an Information Extraction Module that converts colloquial, potentially partial or disfluent utterances into a structured JSON record; and (3) an Accident Reconstruction Module that regenerates a coherent, human-readable narrative from the completed structured record for verification and downstream use.

    We design a three-tier evaluation framework integrating large-scale AI agent simulation, human user dialogues, and reconstruction fidelity assessment. In AI agent experiments, CCG attains dialogue quality and information extraction scores ≥4.5/5 across fluency, relevance, and coherence dimensions; LLM-based automated scores exhibit significant positive correlation with human ratings (Spearman r = 0.474, p < 0.001). Human evaluation yields an information extraction F1 score of 0.909, closely matching AI agent performance (0.908), evidencing robustness across user types. Reconstruction achieves completeness scores ≥4.7/5 with semantic similarity of 0.90, confirming high-fidelity bidirectional conversion.
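A field-level extraction F1 like the 0.909 reported above can be computed by comparing predicted and gold (field, value) pairs as sets. This is a generic sketch under that assumption, not the thesis's actual scoring script, and the example fields are hypothetical:

```python
def extraction_f1(pred: dict, gold: dict) -> tuple[float, float, float]:
    """Precision/recall/F1 over (field, value) pairs, with the gold
    record as the reference set; empty values are ignored."""
    pred_pairs = {(k, v) for k, v in pred.items() if v}
    gold_pairs = {(k, v) for k, v in gold.items() if v}
    tp = len(pred_pairs & gold_pairs)  # exact field-value matches
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {"time": "08:30", "location": "Zhongli", "weather": "rain"}
pred = {"time": "08:30", "location": "Zhongli", "weather": "sunny"}
print(extraction_f1(pred, gold))  # two of three fields match exactly
```

In the reported experiments this score would be aggregated over all dialogues; micro- versus macro-averaging is a design choice not specified here.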

    To enhance deployability under cost and privacy constraints, we further fine-tune an open-source Llama model. Relative to the GPT-4o-mini baseline, the fine-tuned model achieves field-level exact accuracy >0.94 and overall JSON semantic similarity ≈0.99 in extraction, and a 0.85 average semantic similarity in question generation, while maintaining ≥4/5 LLM-based evaluation scores. Results collectively demonstrate CCG's effectiveness, stability, and extensibility, offering a reusable methodological template for structured information collection in safety-critical, legal-adjacent domains.
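The field-level exact accuracy used to compare the fine-tuned model with the baseline can be computed per field over paired predicted/gold records, as sketched below. The field names and the micro aggregation over record pairs are assumptions; the thesis's exact scoring code is not shown here:

```python
def field_exact_accuracy(preds: list[dict], golds: list[dict]) -> dict[str, float]:
    """Per-field exact-match accuracy over paired predicted/gold records."""
    fields = sorted({f for g in golds for f in g})  # every field seen in gold
    return {
        f: sum(p.get(f) == g.get(f) for p, g in zip(preds, golds)) / len(golds)
        for f in fields
    }

# Tiny illustrative test set: two record pairs with hypothetical fields.
preds = [{"time": "08:30", "weather": "rain"}, {"time": "09:00", "weather": "fog"}]
golds = [{"time": "08:30", "weather": "sunny"}, {"time": "09:00", "weather": "fog"}]
print(field_exact_accuracy(preds, golds))  # time matches twice; weather once
```

Reporting one accuracy per field, as here, makes it easy to spot which TARF fields the fine-tuned model handles worse than the baseline.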

    Table of Contents
    List of Figures
    List of Tables
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    1. Introduction
    2. Related Work
      2-1 AI Analysis and Applications for Traffic Accidents
        2-1-1 Accident Risk and Spatio-temporal Prediction
        2-1-2 Multimodal Accident Reconstruction and Liability Determination
        2-1-3 Large-scale Text Processing and Data Sparsity Mitigation
      2-2 Legal Natural Language Processing and Legal Tech
        2-2-1 Legal Question Answering (LQA)
        2-2-2 Legal Judgment Prediction (LJP) and Legal Case Retrieval (LCR)
        2-2-3 Legal Text Understanding and Benchmarks
        2-2-4 Legal-domain Models and Text Transformation
      2-3 LLM Evaluation Methods for Dialogue Systems
        2-3-1 Traditional Lexical Matching
        2-3-2 Self-supervised, Unsupervised, and Adaptive Methods
        2-3-3 Multi-dimensional Automatic LLM Evaluation
      2-4 Summary
    3. Method
      3-1 System Architecture and Data Format
      3-2 Overview of the Multi-turn Interaction Flow
      3-3 Question Generation Module
      3-4 Information Extraction Module
      3-5 Accident Reconstruction Module
    4. Experiments
      4-1 Dialog with AI Agent Users
        4-1-1 Dialog Quality Evaluation by LLM
        4-1-2 Information Extraction Evaluation by LLM
        4-1-3 Human Cross-Validation
      4-2 Dialog with Human Users
        4-2-1 Dialog Quality Evaluation by LLM
        4-2-2 Extraction Performance
        4-2-3 Evaluation Results
      4-3 Accident Reconstruction Evaluation
        4-3-1 Information Completeness Evaluation
        4-3-2 Narrative Quality Evaluation
        4-3-3 Semantic Similarity
    5. Model Training
      5-1 Training Objectives and Motivation
      5-2 Training Data Preparation
        5-2-1 Data Format Design
        5-2-2 Label Distribution and Balancing Strategy
        5-2-3 Data Diversity Generation and Quality Control
        5-2-4 Dataset Statistics
      5-3 Model Selection and Training Configuration
      5-4 Model Performance Evaluation Results
        5-4-1 Test Set Validation
        5-4-2 Dialog Quality Evaluation Results
        5-4-3 Information Extraction Evaluation Results
      5-5 Result Analysis and Conclusions
    6. Limitations
    7. Conclusion
    References
    Appendix 1: Full CCG System Prompts
      1-1 Question Generation Module Prompt
      1-2 Information Extraction Module Prompt
      1-3 Accident Reconstruction Module Prompt
      1-4 Parameter/Variable Reference Table
    Appendix 2: System Evaluation Module Prompts
      2-1 Information Extraction Quality Evaluation Module
      2-2 Dialog Quality Evaluation Module
      2-3 Simulated Party Module Prompt
      2-4 Evaluation Module Parameter Descriptions
    Appendix 3: Sample Chat Logs

