Women Wearing Lipstick:
Measuring the Bias Between Object and Its Related Gender

Ahmed Sabir and Lluís Padró
Universitat Politècnica de Catalunya, TALP Research Center


Abstract

In this paper, we investigate the impact of objects on gender bias in image captioning systems. Our results show that only gender-specific objects carry a strong gender bias (e.g. woman-lipstick). In addition, we propose a visual semantic-based gender score that measures the degree of bias and can be used as a plug-in for any image captioning system. Our experiments demonstrate the utility of the gender score: it measures the bias relation between a caption and its related gender, and can therefore serve as an additional metric to the existing Object Gender Co-Occ approach.


Overview

In this work, we propose two object-to-gender bias scores: (1) a direct Gender Score, and (2) a [MASK]-based Gender Score Estimation. For the direct score, the model uses the visual context to predict the degree of related gender-object bias. Additionally, inspired by the Masked Language Model objective, the model can estimate the masked gender using the relation between the caption and object information from the image.

For a quick start, please have a look at the demo.

Proposed Approach

The direct Gender Score and the Gender Score Estimation rely on visual information to predict the related gender or the degree of bias, as follows:

Gender Score:

Visual Classifier (CLIP):

    visual_context_label = 'motor scooter'
    visual_context_prob = 0.2183

Gender Score (man/woman):

    a ____ riding a motorcycle on a road -> object bias ratio to-man: 53.17%
    a ____ riding a motorcycle on a road -> object bias ratio to-woman: 46.82%
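The visual-classifier step above can be reproduced roughly as follows. This is a minimal sketch assuming the Hugging Face CLIP checkpoint openai/clip-vit-base-patch32 and an illustrative label set and image path; the repo's actual classifier and label space may differ.

```python
# Minimal sketch of the CLIP visual-classifier step: score an image against
# candidate object labels and keep the best one as the visual context.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative label set; in the paper the labels come from an object
# classifier's label space.
labels = ["motor scooter", "joystick", "tennis racket", "surfboard"]

image = Image.open("example.jpg")  # illustrative path
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, len(labels))
probs = logits.softmax(dim=-1).squeeze(0)

best = probs.argmax().item()
visual_context_label = labels[best]
visual_context_prob = probs[best].item()
print(visual_context_label, round(visual_context_prob, 4))
```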

Gender Score Estimation:

Visual Classifier (CLIP):

    visual_context_label = 'joystick'
    visual_context_prob = 0.2732

Gender Score Estimation ([MASK]):

    a [MASK] playing a video game in a living room (object-bias-based prediction: man)
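The [MASK]-based estimation can be sketched with a standard BERT fill-mask head, as below; the normalized man/woman probabilities give the object bias ratio shown above. Note that the paper's full score additionally revises these probabilities with the visual context, which is omitted here.

```python
# Minimal sketch of [MASK]-based gender estimation with a BERT fill-mask head.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

caption = "a [MASK] playing a video game in a living room"
inputs = tokenizer(caption, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]
probs = logits.softmax(dim=-1)

p_man = probs[tokenizer.convert_tokens_to_ids("man")].item()
p_woman = probs[tokenizer.convert_tokens_to_ids("woman")].item()

# Normalized object-to-gender bias ratio for this caption.
ratio_man = p_man / (p_man + p_woman)
print(f"to-man: {ratio_man:.2%}, to-woman: {1 - ratio_man:.2%}")
```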
    

Experiments & Results

Comparison results (i.e. baseline gender output) between the Object Gender Co-Occ approach and our Gender Score Estimation on the Karpathy split. The proposed score measures gender bias more accurately, particularly when there is a strong gender-to-object relation.
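For reference, the Object Gender Co-Occ baseline simply counts how often gendered words co-occur with an object word across captions and reports the ratio. A rough sketch, with simplified word lists/matching and hypothetical captions:

```python
# Rough sketch of the Object Gender Co-Occ baseline: per-object co-occurrence
# counts with gendered words, normalized into a bias ratio.
from collections import Counter

MAN_WORDS = {"man", "men", "boy", "male"}
WOMAN_WORDS = {"woman", "women", "girl", "female"}

def cooccurrence_ratio(captions, obj):
    counts = Counter()
    for cap in captions:
        tokens = set(cap.lower().split())
        if obj in tokens:
            if tokens & MAN_WORDS:
                counts["man"] += 1
            if tokens & WOMAN_WORDS:
                counts["woman"] += 1
    total = counts["man"] + counts["woman"]
    return {g: c / total for g, c in counts.items()} if total else {}

captions = [
    "a man riding a motorcycle on a road",
    "a woman riding a motorcycle down the street",
    "a man on a motorcycle in the desert",
]
print(cooccurrence_ratio(captions, "motorcycle"))  # -> {'man': 0.666..., 'woman': 0.333...}
```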

Qualitative results

The proposed score uses the correlation between the visual context and its related gender. As shown in the left figure, there is an equal distribution of object-gender pairs (man and woman), which indicates that not all objects have a strong bias toward a specific gender. The right figure shows examples of Gender Score Estimation and gender-object distance via cosine distance. The results show that (top) the score balances the bias (men and women have a similar bias for the sport tennis), and (bottom) for objects with a strong bias toward men (paddle, surfboard), the model adjusts the woman bias while preserving the object-gender bias.
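The gender-object cosine distance in the right figure can be sketched as a distance in a shared embedding space. Here we assume CLIP text embeddings; the paper's exact embedding choice may differ.

```python
# Sketch of gender-object distance: cosine distance between gender words and
# an object word in CLIP's text-embedding space.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(texts):
    inputs = processor(text=texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize

genders = embed(["man", "woman"])
obj = embed(["surfboard"])

# Cosine distance = 1 - cosine similarity; smaller means more associated.
dist = 1 - (genders @ obj.T).squeeze(-1)
print({"man": dist[0].item(), "woman": dist[1].item()})
```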

Examples

The table below shows that our score yields similar results (bias ratio) to the existing Object Gender Co-Occ approach on the objects most biased toward men. Note that TraCLIPS-Reward (CLIPS+CIDEr) inherits biases from RL-CLIPS, resulting in distinct gender predictions, and generates captions without a specific gender (i.e. person, baseball player, etc.).

Comparison against GPT-2 and Cosine Distance Score

Comparison results on the test set for Gender Score bias (toward women or men) between two pre-trained models with different training-dataset sizes: BLIP (129M images, unsupervised) and ViLBERT (3.5M images). Our proposed visual-bias likelihood revision, aka Belief Revision (BR) based Gender Score, balances the amplified bias: the bigger the model, the more amplified the bias against men or women.
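The BR-based score builds on belief revision (Blok et al., 2003). As a hedged sketch of the core rule (the exact way the visual-context confidence enters the similarity term follows the paper):

$$
P(w \mid c) = P(w)^{\lambda}, \qquad \lambda = \frac{1 - \mathrm{sim}(w, c)}{1 + \mathrm{sim}(w, c)}
$$

where P(w) is the language-model probability of the caption with a candidate gender w, c is the visual context, and sim(w, c) is their similarity. A highly similar visual context drives the exponent toward 0 and revises the gender probability upward, while an unrelated context leaves the prior nearly unchanged.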

Case study

We also apply our proposed gender score to general tasks such as short-text Twitter data, using a subset of the Twitter user gender classification dataset. We use a BERT-based keyword extractor to extract the biased context from each sentence (e.g. travel-man, woman-family), and then employ the cloze probability to extract the probability of the context. We observe that some keywords have a strong bias: women are associated with keywords such as novel, beauty, and hometown, while men are more frequently related to words such as gaming, coffee, and inspiration. The table below shows (highlighted in red) cases where the Gender Score disagrees with human estimation confidence due to gender-context bias.
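The case-study pipeline can be sketched as follows, assuming KeyBERT as the BERT-based keyword extractor; the tweet text and the cloze template are illustrative, not the paper's exact prompts.

```python
# Sketch of the Twitter case study: extract keywords with KeyBERT, then
# compute the cloze probability of each gender given the keyword context.
import torch
from keybert import KeyBERT
from transformers import AutoModelForMaskedLM, AutoTokenizer

tweet = "spent the weekend gaming and drinking way too much coffee"
keywords = [kw for kw, _ in KeyBERT().extract_keywords(tweet, top_n=2)]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def cloze_gender_probs(keyword):
    # Cloze probability of each gender word given the biased context keyword.
    text = f"the [MASK] is interested in {keyword}"  # illustrative template
    inputs = tokenizer(text, return_tensors="pt")
    pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        probs = model(**inputs).logits[0, pos].softmax(dim=-1)
    return {g: probs[tokenizer.convert_tokens_to_ids(g)].item()
            for g in ("man", "woman")}

for kw in keywords:
    print(kw, cloze_gender_probs(kw))
```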


Citation

@article{sabir2023women,
  title={Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender},
  author={Sabir, Ahmed and Padr{\'o}, Llu{\'\i}s},
  journal={arXiv preprint arXiv:2310.19130},
  year={2023}
}

Contact: Ahmed Sabir