How Aunt-Like Are You? Exploring Gender Bias in the Genderless Estonian Language: A Case Study

Elisabeth Kaukonen1, Ahmed Sabir2, Rajesh Sharma2
1 University of Tartu, Institute of Estonian and General Linguistics, Estonia
2 University of Tartu, Institute of Computer Science, Estonia

Abstract

This paper examines gender bias in Estonian, a grammatically genderless Finno-Ugric language that has neither a gendered noun system nor gendered pronouns, but expresses gender through vocabulary. We focus on the male-female compound words ending in -tädi ‘aunt’ and -onu ‘uncle’, aiming to pinpoint the occupations these words signify for women and men and to examine whether they reveal occupational differentiation and gender stereotypes. The findings indicate that these compounds go beyond occupational titles and highlight prevalent gender bias.

Statistical Evaluation

The results show that, among occupational titles, female compound words ending in -tädi primarily mark professions related to customer service (44% of all occupational titles), healthcare (13%), and social work (11%), while compounds ending in -onu predominantly represent law enforcement (20%), followed by healthcare (12%) and customer service (7%). Thus, women are more often associated with occupations related to children, teaching, and (elder) care, while men are often found in the role of guards and police officers.
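The percentages above are shares of occupational titles per category. A minimal sketch of that arithmetic follows, assuming each compound has already been hand-labelled with one occupational category. The function name `category_shares` and the toy labels are hypothetical and invented for illustration; they are not the paper's data or code.

```python
from collections import Counter
from typing import Dict, Iterable


def category_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return each category's share of all labels, in percent (one decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one labelled occupational title")
    # most_common() orders categories from the largest count to the smallest.
    return {cat: round(100 * n / total, 1) for cat, n in counts.most_common()}


# Toy annotations, invented for illustration only (NOT the paper's data).
toy_labels = ["customer service"] * 5 + ["healthcare"] * 3 + ["social work"] * 2
shares = category_shares(toy_labels)
print(shares)
```

Running the same function separately on the -tädi and -onu annotations would give two distributions that can be compared category by category.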



LLMs Evaluation

Inspired by the human-written CrowS-Pairs dataset (Nangia et al., 2020), which uses sentence pairs to highlight stereotypes across social categories, we manually created sentence pairs using the Estonian National Corpus. The analysis of LLMs revealed that these models propagate occupational biases related to the compound words ending in -tädi and -onu. Specifically, the fine-tuned Estonian LLAMA-2-7B (LLAMMAS) model reflects biases from Estonian labor force statistics more accurately than the similar-sized LLAMA-3-8B and the larger LLAMA-3-70B models. This indicates that the process of fine-tuning has amplified the inherent biases within the model.
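One way to read a sentence-pair evaluation of this kind is as a preference rate: for each pair that differs only in the -tädi / -onu compound, check which variant the model scores higher. The sketch below assumes a generic `score` callable that returns a sentence-level score such as a summed token log-probability. The names `preference_rate` and `toy_score`, the English stand-in sentences, and the scores are all hypothetical; a lookup table replaces a real language model so the example runs on its own. It is not the authors' evaluation code.

```python
from typing import Callable, Dict, Sequence, Tuple

# Each pair holds (sentence with the -tädi compound, sentence with the -onu compound).
Pair = Tuple[str, str]


def preference_rate(pairs: Sequence[Pair], score: Callable[[str], float]) -> float:
    """Fraction of pairs in which the -tädi variant receives the higher score."""
    if not pairs:
        raise ValueError("need at least one sentence pair")
    tadi_wins = sum(1 for tadi, onu in pairs if score(tadi) > score(onu))
    return tadi_wins / len(pairs)


# Made-up scores standing in for model log-probabilities (NOT real model output).
TOY_SCORES: Dict[str, float] = {
    "The cleaning aunt mopped the floor.": -10.0,
    "The cleaning uncle mopped the floor.": -12.0,
    "The security aunt checked the door.": -15.0,
    "The security uncle checked the door.": -11.0,
}


def toy_score(sentence: str) -> float:
    """Placeholder scorer: looks up a fixed score instead of querying an LLM."""
    return TOY_SCORES[sentence]


TOY_PAIRS = [
    ("The cleaning aunt mopped the floor.", "The cleaning uncle mopped the floor."),
    ("The security aunt checked the door.", "The security uncle checked the door."),
]

rate = preference_rate(TOY_PAIRS, toy_score)
print(rate)
```

Keeping the scorer as a plain callable means the same loop could be reused across models by swapping in a function that queries each one.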


Examples of occupational title bias using the fine-tuned Estonian LLAMA (LLAMMAS) and the off-the-shelf LLAMA-3-70B models. (Top) An example of how gender bias is measured: the models associate bakery tasks with women. (Bottom) In the example with the cleaning [aunt/uncle] occupational title, the standard LLAMA-3-70B incorrectly reflects the female-biased occupation.


Acknowledgment

This work has received funding from the EU H2020 program under the SoBigData++ project (grant agreement No. 871042) and from the CHIST-ERA grant No. CHIST-ERA-19-XAI-010 (ETAg grant No. SLTAT21096), and was partially funded by the HAMISON project.


Citation

@article{kaukonen2025aunt,
  title={How Aunt-Like Are You? Exploring Gender Bias
    in the Genderless Estonian Language: A Case Study},
  author={Kaukonen, Elisabeth and Sabir, Ahmed and Sharma, Rajesh},
  journal={arXiv preprint arXiv:2500.00000},
  year={2025}
}

Contact: Elisabeth Kaukonen