Over-CAM : Gradient-Based Localization and Spatial Attention for Confidence Measure in Fine-Grained Recognition using Deep Neural Networks

  • Kantor, Charles
  • Rauby, Brice
  • Boussioux, Léonard
  • Jehanno, Emmanuel
  • Talbot, Hugues
Publication Date
Oct 20, 2020
OAI: oai:HAL:hal-02974521v1
While basic image-classification methods are benchmarked on large databases of widely varying objects, many real-world AI applications require advanced, fine-grained classification to distinguish between items with similar global patterns, such as insects or birds, that differ only in small details. In this paper, we therefore propose to use attention and segmentation methods to distinguish foreground from background, and to derive a confidence measure from the overlap between the segmentation and attention masks. We show that confidence in the classification grows as this overlap increases. These confidence and identification tools are of practical interest in biology for automated wildlife recognition; we focus on butterfly classification. Our tool is currently deployed in the real world on widely used crowdsourcing platforms and in museums to annotate large-scale data efficiently and engage citizen scientists.

Highly Imbalanced Distribution

Our dataset of butterfly photos is organized hierarchically. Each image has three labels: a family, a genus, and a species. A species belongs to one and only one genus, which in turn belongs to one and only one family. Classes are highly imbalanced and butterfly species can be very similar, which makes this a fine-grained classification task.

Gradient-Weighted Class Activation Mapping

Since insects are generally photographed in a detail-rich environment, we initially implemented Grad-CAM to show, using gradient backpropagation, that our CNNs are likely to pay attention to the background instead of focusing on the butterfly. Since the butterflies are our point of interest, this misleading information biases the model. Therefore, segmenting the butterfly in the image and feeding the masked picture of the insect as a prior to the network could improve performance. Thus, we built an automatic segmentation algorithm to remove or simplify the background.
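The core of the Grad-CAM computation mentioned above can be sketched as follows. This is a minimal illustration, assuming the activations of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from the network by the deep-learning framework:

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Grad-CAM heatmap: weight each feature map by the global-average-pooled
    gradient of the class score, sum over channels, then apply ReLU.

    activations, gradients: arrays of shape (C, H, W) from the last
    convolutional layer (assumed precomputed by the framework).
    """
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                          # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam
```

Upsampled to the input resolution, such a map makes visible whether the network's evidence lies on the butterfly or on the background.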
Spatial Attention

We used a network that explicitly implements an attention mechanism: a model based on the generic implementation of CBAM [Woo et al. 2018]. This architecture was chosen for its strong classification results on several benchmarks, but also because it separates the spatial attention mask from the channel attention.

Automated Segmentation

We thus propose to use segmentation tools to improve our fine-grained classification task. To generate the segmentation masks, we used a Mask R-CNN [He et al. 2017] network pre-trained on the COCO instance segmentation dataset [Lin et al. 2014] and fine-tuned it on a small subset of our dataset. This approach is possible because segmenting butterflies is very similar to segmenting other objects present in a common dataset, so the pre-training was very effective. We annotated the small fine-tuning subset ourselves and qualitatively assessed the segmentation performance. The segmentation results obtained were good enough to be used in the guided attention context and for uncertainty prediction.
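The confidence measure described in the abstract is the overlap between the attention mask and the segmentation mask. A minimal sketch of one such measure follows; the intersection-over-union formulation and the binarization threshold are illustrative assumptions, since the abstract does not specify the exact overlap metric:

```python
import numpy as np

def overlap_confidence(attention, segmentation, thresh=0.5):
    """Confidence proxy: intersection-over-union between a binarized
    spatial attention map and the predicted foreground mask.

    attention: (H, W) float map in [0, 1]; segmentation: (H, W) bool mask.
    The IoU choice and `thresh` are assumptions made for this sketch.
    """
    att = attention >= thresh                     # binarize the attention map
    inter = np.logical_and(att, segmentation).sum()
    union = np.logical_or(att, segmentation).sum()
    return float(inter) / union if union else 0.0
```

A score near 1 means the network attends almost entirely to the segmented butterfly; a score near 0 means its attention falls mostly on the background, signaling an unreliable prediction.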
