Introduce Model Explainability to Computer Vision Models With Grad-CAM

Rajat Roy
4 min read · May 20, 2022
Photo by Andrew S on Unsplash

Introduction

This is a code walkthrough on how to add Grad-CAM visualizations to computer vision models. For this walkthrough, I have considered the problem of classifying images of dogs and cats. The dataset has been taken from Kaggle and contains separate training and validation folders with images of dogs and cats.

We’ll start by importing the dataset, then apply preprocessing, train a classifier, and make a prediction.

Finally, we will go through the concept of Grad-CAM and see it in action. Training a model for better accuracy is one thing; it is even better to know whether the model is actually basing its predictions on the correct portion of an image.

Training a classifier

First, we perform all the necessary imports and load the image dataset using TensorFlow’s ImageDataGenerator, setting a rescaling factor of 1/255 for both the training and validation sets. We also apply horizontal flips to the training set. The batch size is set to 16 and every image is resized to 224x224.
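A minimal sketch of this setup (the exact directory paths depend on where you unpack the Kaggle dataset, so treat `train_dir` and `val_dir` as placeholders):

```python
import tensorflow as tf

IMG_SIZE = (224, 224)
BATCH_SIZE = 16

def make_generators(train_dir, val_dir):
    # Rescale both sets to [0, 1]; augment only the training set with
    # horizontal flips, as described above.
    train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255, horizontal_flip=True)
    val_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255)

    train_gen = train_datagen.flow_from_directory(
        train_dir, target_size=IMG_SIZE, batch_size=BATCH_SIZE,
        class_mode="binary")
    val_gen = val_datagen.flow_from_directory(
        val_dir, target_size=IMG_SIZE, batch_size=BATCH_SIZE,
        class_mode="binary")
    return train_gen, val_gen
```

`class_mode="binary"` yields a single 0/1 label per image, which pairs with a one-unit sigmoid output head; a two-unit softmax head with `class_mode="categorical"` would work equally well.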

Next, we create our CNN classifier, which will be trained to classify cats and dogs. Here, I’ve used transfer learning with ResNet-50 pre-trained on ImageNet weights. Note that I’ve removed the top layer from the ResNet-50 model and attached a new classification layer. Below is the code for creating the model and running the training.
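A sketch of that transfer-learning setup (the single sigmoid unit for the binary cat-vs-dog head is my choice; the original notebook may differ in details such as optimizer or head size):

```python
import tensorflow as tf

def build_classifier(weights="imagenet"):
    # ResNet-50 backbone without its top classification layer; global average
    # pooling collapses the final feature maps to one vector per image.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights,
        input_shape=(224, 224, 3), pooling="avg")
    # New classification head: one sigmoid unit for the binary decision.
    output = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
    model = tf.keras.Model(inputs=base.input, outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Training would then look like:
# model = build_classifier()
# model.fit(train_gen, validation_data=val_gen, epochs=10)
```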

After training is complete, we check the model’s accuracy, which is above 90%, as shown in the graph below.

Train vs. Val accuracy

Making Predictions

Now, let’s load the saved weights and make a prediction on unseen images. The code below picks a random image from the validation set, and the model makes a prediction on it. We apply the same preprocessing steps (rescaling and resizing) to the image as were used during training.
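A sketch of that prediction step (the class-folder layout and the "index 1 means dog" mapping are assumptions about the dataset; `flow_from_directory` assigns indices alphabetically, so check `train_gen.class_indices` in practice):

```python
import os
import random
import numpy as np
import tensorflow as tf

def predict_random_image(model, val_dir):
    # Pick a random class folder, then a random image file inside it.
    cls = random.choice(os.listdir(val_dir))
    fname = random.choice(os.listdir(os.path.join(val_dir, cls)))
    path = os.path.join(val_dir, cls, fname)

    # Same preprocessing as training: resize to 224x224, rescale to [0, 1].
    img = tf.keras.preprocessing.image.load_img(path, target_size=(224, 224))
    arr = tf.keras.preprocessing.image.img_to_array(img) / 255.0

    prob = float(model.predict(arr[np.newaxis, ...], verbose=0)[0][0])
    label = "dog" if prob >= 0.5 else "cat"  # assumes class index 1 == dogs
    return path, label, prob
```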

The following is the output of the predictions made by the model.

Predictions made by model

About Grad-CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) is a technique for visualizing which regions of an input image a CNN-based model relies on when making a prediction. It generates a coarse localization map using the gradients of the target class flowing into the model’s last convolutional layer.

Generate Grad-CAM Visualizations

The final step is to generate a heatmap using the Grad-CAM technique. The following steps need to be performed for the visualizations.

  1. We build a model that maps the input image to both the activations of the last convolutional layer and the output predictions. We also discard the last layer’s activation function (e.g., softmax).
  2. Calculate the gradient of the top predicted class for our input image with respect to the activations of the last convolutional layer.
  3. We weight each channel in the feature-map array by its importance (the pooled gradient) for the top predicted class, then sum across channels to get the class-activation heatmap.
  4. Normalize the heatmap values to between 0 and 1.
  5. Finally, we super-impose the generated heatmap onto the input image.

Let’s see this in code. FYI, the code for generating the visualizations has been taken from here. Note that you might need to change the value of the last_conv_layer_name variable (line 86 in that code) according to the model.
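The steps above can be sketched as follows, closely following the standard Keras Grad-CAM recipe. The red-channel blend in `superimpose` is a simple stand-in for the usual jet colormap overlay, and `last_conv_layer_name` depends on your model (for Keras’ ResNet-50 the last convolutional block output is named `conv5_block3_out`):

```python
import numpy as np
import tensorflow as tf

def make_gradcam_heatmap(img_array, model, last_conv_layer_name,
                         pred_index=None):
    # Step 1: a model mapping the input image to the last conv layer's
    # activations and the final predictions.
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])

    # Step 2: gradient of the top predicted class w.r.t. the conv activations.
    with tf.GradientTape() as tape:
        conv_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]
    grads = tape.gradient(class_channel, conv_output)

    # Step 3: weight each channel by its pooled gradient ("importance")
    # and sum across channels.
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    heatmap = tf.squeeze(conv_output[0] @ pooled_grads[..., tf.newaxis])

    # Step 4: ReLU, then normalize to [0, 1].
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()

def superimpose(img, heatmap, alpha=0.4):
    # Step 5: resize the heatmap to the image size and blend it into the
    # image's red channel. Expects img as floats in [0, 1].
    hm = tf.image.resize(heatmap[..., tf.newaxis], img.shape[:2]).numpy()
    overlay = np.zeros_like(img)
    overlay[..., 0] = hm[..., 0]
    return np.clip((1 - alpha) * img + alpha * overlay, 0.0, 1.0)
```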

On successful run, the output should look like this.

Grad-CAM visualization (ResNet-50)

We can clearly see that the model has done a pretty good job of localizing the main characteristics and is able to differentiate properly between a cat and a dog.

Conclusion

I hope this article has given you a good idea of how we can introduce model explainability to computer vision models.

Now, this is not limited to cats and dogs; there are many more real-world applications, and I would encourage you to explore new methods and ways in which we can bring more transparency to black-box models.

Complete notebook for this article can be accessed here.

