Belief Revision based Caption Re-ranker with Visual Semantic Information
Ahmed Sabir, Francesc Moreno-Noguer, Pranava Madhyastha, Lluís Padró
In this work, we focus on improving the captions generated by image-caption generation systems. We propose a novel re-ranking approach that leverages visual-semantic measures to identify the caption that best captures the visual information in the image. Our re-ranker uses the Belief Revision framework (Blok et al., 2003) to calibrate the original likelihood of the top-n captions by explicitly exploiting the semantic relatedness between each candidate caption and the visual context. Our experiments demonstrate the utility of this approach: the re-ranker can improve the performance of a typical image-captioning system without any additional training or fine-tuning.
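This page does not specify how the semantic relatedness between a caption and the visual context is computed. A common choice, assumed here, is the cosine similarity between an embedding of the caption and an embedding of the visual context label, produced by some sentence or word embedding model that is not shown. The minimal sketch below clips negative similarities to 0 so the score lies in [0, 1]. That clipping is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for zero vectors; treated as unrelated
    return dot / (norm_u * norm_v)


def relatedness(caption_vec: Sequence[float], context_vec: Sequence[float]) -> float:
    """Relatedness clipped to [0, 1] (assumption: negative cosine means unrelated)."""
    return max(0.0, min(1.0, cosine_similarity(caption_vec, context_vec)))
```

The upper clip only guards against floating-point values slightly above 1. Any embedding model could supply the vectors.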
Belief Revision is a conditional probability model. It assumes that an initial probability for a hypothesis is revised to the extent warranted by the available evidence. Here, the hypothesis is a candidate caption and the evidence is the visual context.
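The sketch below assumes the revision rule of Blok et al. (2003), in which the revised probability is P(w|c) = P(w)^alpha, with alpha = ((1 - sim(w, c)) / (1 + sim(w, c)))^(1 - P(c)). Here w is the hypothesis (a caption), c is the visual context, sim is their semantic relatedness in [0, 1], and P(c) is the probability of the context. The function and argument names are illustrative and are not taken from the authors' code.

```python
def belief_revision(p_w: float, sim: float, p_c: float) -> float:
    """Revised hypothesis probability P(w | c), assuming the Blok et al. (2003) rule.

    p_w: initial probability of the hypothesis (a caption), in (0, 1]
    sim: semantic relatedness between hypothesis and context, in [0, 1]
    p_c: probability of the visual context, in [0, 1]
    """
    if not 0.0 < p_w <= 1.0:
        raise ValueError("p_w must be in (0, 1]")
    if not (0.0 <= sim <= 1.0 and 0.0 <= p_c <= 1.0):
        raise ValueError("sim and p_c must be in [0, 1]")
    # As sim -> 1 the base goes to 0, so alpha -> 0 and P(w | c) -> 1.
    # When sim == 0 or p_c == 1, alpha == 1 and P(w) is left unchanged.
    alpha = ((1.0 - sim) / (1.0 + sim)) ** (1.0 - p_c)
    # With 0 < p_w <= 1 and 0 <= alpha <= 1, p_w ** alpha >= p_w:
    # the revision can raise the hypothesis probability but never lower it.
    return p_w ** alpha
```

For example, with P(w) = 0.25, sim = 1/3 and P(c) = 0, alpha = 0.5 and the revised probability is 0.5.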
Example
In this example, we extract the top-20 beam search candidates from a state-of-the-art (SOTA) caption transformer and re-rank them with Visual Belief Revision.

Beam search candidates (top-20)
a longhorn cow with horns standing in a field
two bulls standing next to each other
two bulls with horns standing next to each other
two bulls with horns standing next to each other
two bulls with horns standing next to each other
two bulls with horns standing next to each other
two bulls with horns standing next to each other
two bulls with horns standing next to each other
two bulls with horns standing next to each other
two bulls with horns standing next to each other
a couple of bulls standing next to each other
a couple of bulls standing next to each other
two long horn bulls standing next to each other
two long horn bulls standing next to each other
two long horn bulls standing next to each other
two long horn bulls standing next to each other
two long horn bulls standing next to each other
two long horn bulls standing next to each other
two long horn bulls standing next to each other
two long horn bulls standing next to each other

Visual Context (ResNet/CLIP)
COCO_val2014_000000235692.jpg [('ox', 0.49095494)]
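The visual context is shown as the top-1 label from an image classifier together with its probability. The sketch below assumes that this probability is a softmax over the classifier's class scores. Those scores would be ResNet logits or, for CLIP, its image-text similarity scores over a set of candidate labels. It shows one way to extract the top-k context; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def top_k_context(
    logits: Sequence[float], labels: Sequence[str], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs from raw class scores."""
    if len(logits) != len(labels):
        raise ValueError("one score per label is required")
    # Numerically stable softmax: subtract the largest score before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort by probability, highest first; ties keep their input order.
    ranked = sorted(zip(labels, probs), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

With k = 1 the result has the same shape as the context shown above: a list holding a single (label, probability) pair.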

Visual Belief Revision re-ranking
two bulls standing next to each other: 0.31941289259462063
a couple of bulls standing next to each other: 0.2858426977047663
two bulls with horns standing next to each other: 0.26350009525262974
two long horn bulls standing next to each other: 0.24074783064577798
a longhorn cow with horns standing in a field: 0.03975113398536263
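The re-ranking step in this example can be sketched as follows. The sketch makes three assumptions. Each candidate's original likelihood is used as P(w). `sim_fn` is a hypothetical relatedness function, such as cosine similarity of embeddings. P(c) is the classifier's confidence in the context label, which is 0.49095494 for 'ox' here. Duplicate beams are collapsed to one entry, matching the re-ranked list above, which shows each distinct caption once. Keeping the highest likelihood among duplicates is also an assumption of this sketch.

```python
from typing import Callable, Dict, List, Tuple


def belief_revision(p_w: float, sim: float, p_c: float) -> float:
    """P(w | c) = P(w) ** alpha, assuming the Blok et al. (2003) revision rule."""
    alpha = ((1.0 - sim) / (1.0 + sim)) ** (1.0 - p_c)
    return p_w ** alpha


def rerank(
    candidates: List[Tuple[str, float]],
    context: str,
    p_context: float,
    sim_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Re-rank (caption, original likelihood) pairs by their revised probability."""
    best: Dict[str, float] = {}
    for caption, p_w in candidates:
        # Beams often decode to identical strings; keep one entry per caption,
        # using the highest original likelihood among duplicates (assumption).
        if caption not in best or p_w > best[caption]:
            best[caption] = p_w
    scored = [
        (caption, belief_revision(p_w, sim_fn(caption, context), p_context))
        for caption, p_w in best.items()
    ]
    # Highest revised probability first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The revision never lowers P(w), so the ranking changes only when captions that are more related to the visual context receive a larger boost. A caption unrelated to the context keeps its original likelihood.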
@article{sabir2022belief, title={Belief Revision based Caption Re-ranker with Visual Semantic Information}, author={Sabir, Ahmed and Moreno-Noguer, Francesc and Madhyastha, Pranava and Padr{\'o}, Llu{\'\i}s}, journal={arXiv preprint arXiv:2209.08163}, year={2022} }
Contact: Ahmed Sabir