Why Mask R-CNN Routines are Changing AI
Detecting objects with computers can be difficult, especially under varying environmental conditions.
Objects often overlap, which makes them harder to classify as conditions change.
A new algorithm called Mask R-CNN (Note: Wolf Blitzer was not harmed in its development) is making good progress on the problem.
Here’s a tutorial on using Mask R-CNN for object classification:
In this tutorial, you will discover how to use the Mask R-CNN model to detect objects in new photographs.
After completing this tutorial, you will know:
- The region-based Convolutional Neural Network family of models for object detection and the most recent variation called Mask R-CNN.
- The best-of-breed open source library implementation of the Mask R-CNN for the Keras deep learning library.
- How to use a pre-trained Mask R-CNN to perform object localization and detection on new photographs.
Let’s get started.
This tutorial is divided into three parts; they are:
- R-CNN and Mask R-CNN
- Matterport Mask R-CNN Project
- Object Detection with Mask R-CNN
Mask R-CNN for Object Detection
Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.
It is a challenging computer vision task that requires both successful object localization in order to locate and draw a bounding box around each object in an image, and object classification to predict the correct class of object that was localized.
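An extension of object detection involves marking the specific pixels in the image that belong to each detected object instead of using coarse bounding boxes during object localization. This harder version of the problem is generally referred to as object segmentation. When every pixel is labeled by class it is called semantic segmentation, and when each detected object gets its own mask, as with Mask R-CNN, it is called instance segmentation.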
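To make the difference between a bounding box and a pixel mask concrete, here is a minimal sketch in plain Python. The function names and the tiny grid are made up for illustration and are not part of any library. It derives the enclosing box from a binary mask and compares how many pixels each one covers.

```python
def mask_to_box(mask):
    """Return (y1, x1, y2, x2) enclosing all nonzero pixels, or None if the mask is empty.

    The box is half-open: rows y1..y2-1 and columns x1..x2-1 contain the object.
    """
    if not mask:
        return None
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def box_area(box):
    """Number of pixels covered by a half-open (y1, x1, y2, x2) box."""
    y1, x1, y2, x2 = box
    return (y2 - y1) * (x2 - x1)


def mask_area(mask):
    """Number of pixels marked as belonging to the object."""
    return sum(1 for row in mask for value in row if value)


# A small L-shaped object on a 5x6 grid (1 = object pixel).
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = mask_to_box(mask)
print("box:", box)
print("box pixels:", box_area(box), "mask pixels:", mask_area(mask))
```

The box covers nine pixels while the object itself occupies only five, which is the coarseness that segmentation removes.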
The Region-Based Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by Ross Girshick, et al.
There are perhaps four main variations of the approach, resulting in the current pinnacle called Mask R-CNN. The salient aspects of each variation can be summarized as follows:
- R-CNN: Bounding boxes are proposed by the “selective search” algorithm; each proposed region is stretched to a fixed size and passed through a deep convolutional neural network, such as AlexNet, to extract features, before a final set of object classifications is made with linear SVMs.
- Fast R-CNN: Simplified design with a single model. Bounding boxes are still specified as input, but a region-of-interest pooling layer is used after the deep CNN to consolidate regions, and the model predicts both class labels and regions of interest directly.
- Faster R-CNN: Addition of a Region Proposal Network that interprets features extracted from the deep CNN and learns to propose regions-of-interest directly.
- Mask R-CNN: Extension of Faster R-CNN that adds an output branch for predicting a mask for each detected object.
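The region-of-interest pooling step mentioned for Fast R-CNN can be sketched in a few lines. This is a simplified, single-channel, pure-Python illustration written for this article, not code from any of the R-CNN repositories; the function name `roi_max_pool` and the toy feature map are assumptions. The point is that regions of different sizes are reduced to the same fixed-size grid, so the layers that follow can treat them uniformly.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (y1, x1, y2, x2) (half-open) into an out_h x out_w grid.

    Assumes the region is non-empty and lies inside the feature map.
    """
    y1, x1, y2, x2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Bin start is rounded down and bin end is rounded up,
        # so every bin covers at least one row.
        ys = y1 + (i * h) // out_h
        ye = max(y1 + ((i + 1) * h + out_h - 1) // out_h, ys + 1)
        row = []
        for j in range(out_w):
            xs = x1 + (j * w) // out_w
            xe = max(x1 + ((j + 1) * w + out_w - 1) // out_w, xs + 1)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 4x6 feature map with increasing values.
feature_map = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]

# Two regions of different sizes both come out as 2x2 grids.
print(roi_max_pool(feature_map, (0, 0, 4, 6), 2, 2))
print(roi_max_pool(feature_map, (1, 1, 4, 4), 2, 2))
```

Real implementations work on batched, multi-channel tensors and map image coordinates onto the feature map first; the sketch leaves both out to keep the idea visible.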
The Mask R-CNN model, introduced in the 2017 paper titled “Mask R-CNN”, is the most recent variation of the family of models and supports both object detection and object segmentation. The paper provides a nice summary of the model lineage to that point:
The Region-based CNN (R-CNN) approach to bounding-box object detection is to attend to a manageable number of candidate object regions and evaluate convolutional networks independently on each RoI. R-CNN was extended to allow attending to RoIs on feature maps using RoIPool, leading to fast speed and better accuracy. Faster R-CNN advanced this stream by learning the attention mechanism with a Region Proposal Network (RPN). Faster R-CNN is flexible and robust to many follow-up improvements, and is the current leading framework in several benchmarks.
Source: Mask R-CNN, 2017.
The family of methods may be among the most effective for object detection, achieving what were then state-of-the-art results on computer vision benchmark datasets. Although accurate, the models can be slow when making a prediction compared to alternative models such as YOLO, which may be less accurate but are designed for real-time prediction.
Matterport Mask R-CNN Project
Mask R-CNN is a sophisticated model to implement, especially as compared to a simple or even state-of-the-art deep convolutional neural network model.
Source code is available for each version of the R-CNN model, provided in separate GitHub repositories with prototype models based on the Caffe deep learning framework. For example:
- R-CNN: Regions with Convolutional Neural Network Features, GitHub.
- Fast R-CNN, GitHub.
- Faster R-CNN Python Code, GitHub.
- Detectron, Facebook AI, GitHub.
Instead of developing an implementation of the R-CNN or Mask R-CNN model from scratch, we can use a reliable third-party implementation built on top of the Keras deep learning framework.
The best-of-breed third-party implementation of Mask R-CNN is the Mask R-CNN Project developed by Matterport. The project is open source, released under a permissive license (i.e. the MIT license), and the code has been widely used on a variety of projects and Kaggle competitions.
Nevertheless, it is an open source project, subject to the whims of the project developers. As such, I have a fork of the project available, just in case there are major changes to the API in the future.
The project is light on API documentation, although it does provide a number of examples in the form of Python Notebooks that you can use to understand how to use the library by example. Two notebooks that may be helpful to review are:
- Mask R-CNN Demo, Notebook.
- Mask R-CNN – Inspect Trained Model, Notebook.
There are perhaps three main use cases for using the Mask R-CNN model with the Matterport library; they are:
- Object Detection Application: Use a pre-trained model for object detection on new images.
- New Model via Transfer Learning: Use a pre-trained model as a starting point in developing a model for a new object detection dataset.
- New Model from Scratch: Develop a new model from scratch for an object detection dataset.
In order to get familiar with the model and the library, we will look at the first use case in the next section.
Object Detection With Mask R-CNN
In this section, we will use the Matterport Mask R-CNN library to perform object detection on arbitrary photographs.
Much like using a pre-trained deep CNN for image classification, such as VGG-16 trained on the ImageNet dataset, we can use a pre-trained Mask R-CNN model to detect objects in new photographs. In this case, we will use a Mask R-CNN trained on the MS COCO object detection problem.
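Before touching the library, it can help to see what working with a detection result might look like. The sketch below does not call the Matterport library at all. It uses a hand-written dictionary whose layout mirrors what I understand the library's detection call to return for each image (keys `rois`, `class_ids` and `scores`, plus `masks`, which is omitted here). Treat the key names, the (y1, x1, y2, x2) box ordering, the truncated label list and all values as assumptions to check against the library's notebooks.

```python
# Hypothetical, truncated label list; index 0 is assumed to be the background class.
CLASS_NAMES = ["BG", "person", "bicycle", "car"]


def summarize_detections(result, class_names, min_score=0.9):
    """Return (label, score, box) tuples for detections scoring >= min_score, best first."""
    kept = []
    for box, class_id, score in zip(result["rois"], result["class_ids"], result["scores"]):
        if score >= min_score:
            kept.append((class_names[class_id], score, tuple(box)))
    kept.sort(key=lambda detection: detection[1], reverse=True)
    return kept


# Made-up result for one image; boxes are assumed to be (y1, x1, y2, x2).
result = {
    "rois": [[30, 40, 200, 180], [60, 220, 150, 400], [10, 10, 20, 20]],
    "class_ids": [1, 3, 2],
    "scores": [0.98, 0.93, 0.41],
}

for label, score, box in summarize_detections(result, CLASS_NAMES):
    print(f"{label}: {score:.2f} at {box}")
```

Filtering on the confidence score is a design choice for the sketch; a lower threshold keeps more detections at the cost of more false positives.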
Mask R-CNN Installation
The first step is to install the library.
At the time of writing, there is no distributed version of the library, so we have to install it manually. The good news is that this is very easy.
Installation involves cloning the GitHub repository and running the installation script on your workstation. If you are having trouble, see the installation instructions buried in the library’s readme file.
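Once the installation script has run, a quick way to confirm that Python can see the library is to look the package up without importing it. This is a minimal sketch using only the standard library; the package name `mrcnn` is my assumption about what the installation registers, so adjust it if the readme says otherwise.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# "mrcnn" is assumed to be the package name registered by the installation script.
if is_installed("mrcnn"):
    print("Mask R-CNN library found")
else:
    print("Mask R-CNN library not found; re-run the installation step")
```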
To start the tutorial click here.
This section provides more resources on the topic if you are looking to go deeper.
Rich feature hierarchies for accurate object detection and semantic segmentation, 2013.
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, 2014.
Fast R-CNN, 2015.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2016.
Mask R-CNN, 2017.
Mask R-CNN, GitHub.
Mask R-CNN Demo, Notebook.
Mask R-CNN – Inspect Trained Model, Notebook.
R-CNN Code Repositories
R-CNN: Regions with Convolutional Neural Network Features, GitHub.
Fast R-CNN, GitHub.
Faster R-CNN Python Code, GitHub.
Detectron, Facebook AI, GitHub.
In this tutorial, you discovered how to use the Mask R-CNN model to detect objects in new photographs.
Specifically, you learned:
- The region-based Convolutional Neural Network family of models for object detection and the most recent variation called Mask R-CNN.
- The best-of-breed open source library implementation of the Mask R-CNN for the Keras deep learning library.
- How to use a pre-trained Mask R-CNN to perform object localization and detection on new photographs.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.