| Author: | 龔若齊 Jo-Chi Kung |
|---|---|
| Thesis title: | CCG: A Conversational Agent for Traffic Accidents Information Collection Based on Large Language Models |
| Advisor: | 張嘉惠 Chia-Hui Chang |
| Committee: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science & Information Engineering |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | Chinese |
| Pages: | 71 |
| Chinese keywords (translated): | traffic accident information collection, large language models, conversational agent, information extraction |
| English keywords: | LLM, Agent, Traffic Accidents, Information Extraction |
Traffic accidents occur frequently in Taiwan, and the current manual statement-taking process is cumbersome, time-consuming, and prone to omitting key information, imposing high labor and time costs on law-enforcement officers and the parties involved. To address this problem, this study proposes an LLM-based traffic accident information collection agent, the Collision Care Guide (CCG), which systematically collects accident-related information through guided dialogue and achieves accurate bidirectional conversion between structured accident records and natural-language narratives.
The CCG system consists of three modules: the Question Generation Module dynamically adjusts its questions according to missing fields and user replies; the Information Extraction Module converts colloquial narratives into a structured JSON accident record format; and the Accident Reconstruction Module reorganizes the completed structured data into a highly readable accident narrative for user confirmation.
To verify usability and robustness, this study establishes a three-tier evaluation framework: AI agent testing, human user testing, and accident reconstruction assessment. In AI agent testing, the system scores above 4.0 (out of 5) on every dialogue-quality and information-extraction dimension, and the distributions of LLM automated scores and human scores show a significant positive correlation (Spearman r = 0.474, p < 0.001), confirming evaluation consistency. In human user testing, information extraction reaches F1 = 0.909, highly consistent with the AI agent results. Reconstruction assessment shows the reconstructed narratives score above 4.7 (out of 5) on information completeness with semantic similarity as high as 0.9, confirming information fidelity and semantic consistency between structured accident records and natural-language accident narratives.
In addition, to reduce cost and improve the feasibility of private deployment and data privacy, this study further compares a fine-tuned open-source Llama model against a commercial baseline (GPT-4o-mini). On the test set, the Information Extraction Module achieves field-level exact accuracy above 0.89 with overall semantic similarity of about 0.995; the Question Generation Module reaches an average semantic similarity of about 0.85, and the jointly trained model also remains highly stable when integrating both tasks. LLM automated evaluation likewise shows the fine-tuned model scoring ≥4 (out of 5) on both dialogue quality and information extraction, validating the effectiveness of task-specific fine-tuning.
In summary, CCG achieves accurate and structured information extraction and demonstrates that the fine-tuned open-source approach has practical deployment value, providing standardized support for traffic accident handling, insurance claims, and pre-litigation evidence collection.
Frequent road traffic accidents in Taiwan impose substantial procedural and cognitive burdens on law enforcement, insurers, and involved parties. Conventional manual reporting workflows are time-consuming, error-prone, and susceptible to omission of salient facts, thereby delaying downstream responsibility assessment and claims processing. To address these limitations, we propose Collision Care Guide (CCG), a Large Language Model (LLM)-based conversational agent that systematizes the bidirectional transformation between unstructured natural-language accident narratives and a structured Traffic Accident Record Format (TARF) representation.
CCG comprises three coordinated modules: (1) a Question Generation Module that adaptively formulates targeted inquiries based on missing fields and prior user responses; (2) an Information Extraction Module that converts colloquial, potentially partial or disfluent utterances into a structured JSON record; and (3) an Accident Reconstruction Module that regenerates a coherent, human-readable narrative from the completed structured record for verification and downstream use.
We design a three-tier evaluation framework integrating large-scale AI agent simulation, human user dialogues, and reconstruction fidelity assessment. In AI agent experiments, CCG attains dialogue quality and information extraction scores ≥4.5/5 across fluency, relevance, and coherence dimensions; LLM-based automated scores exhibit significant positive correlation with human ratings (Spearman r = 0.474, p < 0.001). Human evaluation yields an information extraction F1 score of 0.909, closely matching AI agent performance (0.908), evidencing robustness across user types. Reconstruction achieves completeness scores ≥4.7/5 with semantic similarity of 0.90, confirming high-fidelity bidirectional conversion.
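The reported agreement between automated LLM scores and human ratings is Spearman rank correlation, i.e. the Pearson correlation of the two rank vectors. A self-contained pure-Python illustration on toy scores (not the thesis data) follows:

```python
def ranks(xs):
    """1-based ranks with ties resolved to the mid-rank average."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the tie run
        avg = (i + j) / 2 + 1  # mid-rank of the run, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's r: Pearson correlation computed on the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy example: automated vs. human scores on five dialogues (made up).
llm_scores = [4.0, 4.5, 3.5, 5.0, 4.0]
human_scores = [4, 5, 3, 5, 4]
print(round(spearman(llm_scores, human_scores), 3))  # → 0.973
```

In practice `scipy.stats.spearmanr` would also return the p-value reported in the evaluation (p < 0.001); the hand-rolled version above only shows what the coefficient measures.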
To enhance deployability under cost and privacy constraints, we further fine-tune an open-source Llama model. Relative to the GPT‑4o‑mini baseline, the fine-tuned model achieves field-level exact accuracy >0.94 and overall JSON semantic similarity ≈0.99 in extraction, and a 0.85 average semantic similarity in question generation, while maintaining ≥4/5 LLM-based evaluation scores. Results collectively demonstrate CCG’s effectiveness, stability, and extensibility, offering a reusable methodological template for structured information collection in safety-critical legal-adjacent domains.
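Field-level exact accuracy, one of the extraction metrics reported above, can be read as the fraction of gold-standard fields whose predicted value matches exactly. The toy sketch below uses hypothetical TARF fields and is not the thesis evaluation code:

```python
def field_exact_accuracy(pred: dict, gold: dict) -> float:
    """Fraction of gold fields whose predicted value matches exactly."""
    if not gold:
        return 0.0
    hits = sum(1 for key, value in gold.items() if pred.get(key) == value)
    return hits / len(gold)

# Hypothetical gold vs. predicted extraction for one accident record.
gold = {"time": "08:30", "location": "Zhongli", "vehicle_type": "scooter"}
pred = {"time": "08:30", "location": "Zhongli", "vehicle_type": "car"}
print(field_exact_accuracy(pred, gold))  # 2 of 3 fields match
```

The complementary semantic-similarity score reported in the thesis is softer, crediting paraphrases of the same value; exact accuracy is the strict lower bound of the two.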