
Graduate Student: 陳泓嘉 (Evan Chen)
Thesis Title: Assessing Implicit Gender and Racial Biases Towards Professions in Large Language Models: An Empirical Investigation
Advisor: 陳弘軒 (Hung-Hsuan Chen)
Oral Defense Committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science & Information Engineering
Publication Year: 2025
Graduation Academic Year: 113
Language: English
Pages: 53
Chinese Keywords: Large Language Models, Gender Bias, AI Fairness, Gender Representation, AI Alignment
Foreign Keywords: LLM, Gender Bias, AI fairness, Gender Representation, Human-AI alignment
  • This study proposes a novel evaluation framework designed to uncover the implicit gender and racial biases that large language models (LLMs) exhibit toward different professions. Departing from prior approaches built on structured scenarios or specific prompts, our method leverages free-form story writing: we prompt LLMs to generate characters for a variety of occupations, then analyze gender cues in the text (e.g., names, pronouns) and infer racial associations from those names.
    A systematic analysis of ten mainstream LLMs reveals a widespread phenomenon: female characters are overrepresented in many occupations, very likely an effect of alignment efforts such as reinforcement learning from human feedback (RLHF). At the same time, we find that the models' outputs predominantly associate professions with names identified as white.
    Moreover, the occupational gender distributions generated by the LLMs track human stereotypes more closely than real-world labor statistics. These findings highlight the challenge and importance of implementing balanced bias-mitigation measures to promote fairness and to prevent existing or emerging gender and racial societal biases from being reinforced or newly established.


    This study introduces a novel evaluation framework, distinct from prior methods that mostly use prompts in structured scenarios, to uncover implicit gender and racial biases in large language models (LLMs) regarding professions. Our approach leverages free-form storytelling: we prompt LLMs to generate characters for various occupations, then analyze gender cues (e.g., names, pronouns) and infer racial associations from these names. A systematic analysis of ten prominent LLMs reveals a consistent overrepresentation of female characters in most occupations, likely influenced by alignment efforts such as RLHF. Despite this, when ranking the occupations from most female-associated to most male-associated based on the models' outputs, the resulting order aligns more closely with human stereotypes than with real-world labor statistics. In addition, we find a predominant association of professions with white-identifying names in the LLM outputs. These findings highlight the challenge and importance of implementing balanced mitigation measures to promote fairness and prevent the reinforcement of existing or the establishment of new societal biases related to gender and race.
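The abstract's workflow — label each generated story's protagonist by gender cues, then compute a female share per occupation — can be approximated as below. The cue lists, tie-breaking, and function names are illustrative assumptions for this sketch, not the thesis's actual implementation, which also uses character names and name-based racial inference.

```python
from collections import Counter
import re

# Illustrative-only cue lists; the thesis's actual prompts and lexicons
# are not reproduced here.
FEMALE_CUES = {"she", "her", "hers", "herself"}
MALE_CUES = {"he", "him", "his", "himself"}

def infer_gender(story: str) -> str:
    """Label a generated story's protagonist by majority pronoun cues."""
    tokens = re.findall(r"[a-z']+", story.lower())
    counts = Counter(t for t in tokens if t in FEMALE_CUES | MALE_CUES)
    female = sum(counts[t] for t in FEMALE_CUES)
    male = sum(counts[t] for t in MALE_CUES)
    if female > male:
        return "female"
    if male > female:
        return "male"
    return "unknown"

def female_share(stories: list[str]) -> float:
    """Fraction of gender-resolvable stories whose protagonist reads as female."""
    labels = [infer_gender(s) for s in stories]
    resolved = [lab for lab in labels if lab != "unknown"]
    return sum(lab == "female" for lab in resolved) / len(resolved) if resolved else float("nan")

stories = [
    "The nurse checked her charts before she left.",
    "He reviewed the blueprints; the engineer trusted his instincts.",
]
print(infer_gender(stories[0]))  # female
print(female_share(stories))     # 0.5
```

Repeating this over many sampled stories per occupation yields the per-occupation gender distribution that the thesis compares against labor statistics and human stereotype ratings.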

    摘要 (Chinese Abstract) v
    Abstract vi
    Acknowledgement vii
    Contents viii
    Glossary of Symbols xiv
    1 Introduction 1
    2 Related Work 3
      2.1 Evaluating Bias via LLM Decisions in Designed Scenarios 3
        2.1.1 Bias in Employment and Hiring Scenarios 3
        2.1.2 Bias Revealed Through Persona and Role-Playing 4
        2.1.3 Bias in LLM-Generated Advice and Recommendations 4
      2.2 Evaluating Bias via Linguistic Analysis of LLM Outputs to Specific Prompts 5
      2.3 Our Approach: Analyzing Demographic Representation in Open-Ended Story Generation 5
    3 Evaluation Method 7
      3.1 Overall Workflow 7
      3.2 Evaluating Gender Stereotypes 9
      3.3 Evaluating Racial Stereotypes 10
        3.3.1 Methodological Note on the Hispanic Category 11
    4 Discovery 12
      4.1 A Multi-Faceted Evaluation Framework 12
        4.1.1 Average Deviations (Avg Dev): Quantifying the Direction and Average Magnitude of Bias 13
        4.1.2 Average Manhattan Distance (Manh.): Quantifying Total Absolute Error 13
        4.1.3 Kendall's Tau Rank Correlation: Assessing Ordinal Consistency 14
        4.1.4 Average KL Divergence: Measuring the Severity of Implausible Outcomes 14
        4.1.5 Average Cosine Distance (Cos Dis): Evaluating Structural Pattern Alignment 15
      4.2 A Paradox of Overcorrection: Female Overrepresentation Coexists with Stereotypical Ranking 16
        4.2.1 Pervasive Overrepresentation of Female Characters in Modern LLMs 16
        4.2.2 LLM Rankings of Occupational Gender Ratios Align Closer to Human Rating 17
        4.2.3 Female Overrepresentation Stems from a Superficial Alignment Correction 18
      4.3 A Multi-Faceted Failure: Pervasive "White Default" and a Lack of Real-World Context 23
        4.3.1 Analysis Across Uniform, Societal, and Professional Benchmarks 23
        4.3.2 Divergent Failures: A Comparative Analysis of Racial Bias Profiles 24
        4.3.3 Synthesizing the Results: The Distance from Equity 25
    5 Conclusion 32
      5.1 The Perils of Superficial Alignment: Gender's Double-Edged Sword 32
      5.2 The "White Default" and the Failure of Contextual Knowledge 32
      5.3 The Asymmetric Failure of Bias: An Inverse Relationship in Representation and Association 33
      5.4 Implications for Model Development and Evaluation 34
      5.5 Limitations and Future Work 34
        5.5.1 Limitations 34
        5.5.2 Future Work 34
    6 Bibliography 36
    A Github 39
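The five metrics named in the contents (average deviation, Manhattan distance, Kendall's tau, KL divergence, cosine distance) are standard distribution-comparison measures. A minimal sketch of how each might compare a model's per-occupation scores against a reference benchmark follows; the toy inputs and function names are illustrative assumptions, not the thesis's data or code.

```python
import math

def avg_deviation(p, q):
    """Signed mean of (model - reference): direction and average magnitude of bias."""
    return sum(pi - qi for pi, qi in zip(p, q)) / len(p)

def manhattan(p, q):
    """Sum of absolute differences: total absolute error."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with smoothing: penalizes mass where the reference has little."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cosine_distance(p, q):
    """1 - cosine similarity: compares the shape of the two vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
    return 1.0 - dot / norm

def kendall_tau(a, b):
    """Kendall's tau over paired scores: ordinal agreement between two rankings."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy example: a model's female share per occupation vs. a reference benchmark.
model = [0.9, 0.7, 0.6, 0.4]
reference = [0.8, 0.5, 0.6, 0.3]
print(round(avg_deviation(model, reference), 3))  # 0.1
print(round(manhattan(model, reference), 3))      # 0.4
print(round(kendall_tau(model, reference), 3))    # 0.667
```

A positive average deviation with a high Kendall's tau would mirror the thesis's headline finding: uniformly shifted representation that nonetheless preserves a stereotypical ordering.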

