Object Detection is a technique used in Computer Vision that aims at localizing and classifying objects in images or video. The main idea is that of building bounding boxes that envelop a specific object and then apply a classification algorithm (in the field of Computer Vision it will belong to the family of CNN) to the bounded object, in order to classify it with a probabilistic output.

So as you can see, there are two main steps in an object detection task: building the bounding boxes and classifying objects within them. In this article, we are going to focus on the first step, having a look at a very popular method for building boxes and implementing it in Python.

Selective Search

Selective Search was first introduced in 2012 in the paper of Uijlings et Al (you can find it here). According to the authors, this method combines the strength of both exhaustive search and segmentation:

  • exhaustive search → aims at finding all the possible locations by systematically enumerating all the possible candidates.
  • segmentation → uses the image structure to cluster pixels into different regions

Selective search captures both these features, yet with some additional benefits:

  • It captures all scales, addressing the fact that objects in images can present at different sizes and orientations. This step is based on the intuition of the hierarchical structure of images. The creation of initial regions is done via a graph-based greedy algorithm, which starts grouping the most similar regions until the whole image becomes a unique region.
  • It diversifies the grouping of regions according to different metrics. Indeed, there is no optimal strategy to group regions together: sometimes might be because of the color, sometimes because of the texture, and so forth. Specifically, the authors introduced four different diversification strategies based on color, texture, size and fill. For each feature, a similarity score (between each couple of regions) is computed. The final similarity score is nothing but a linear combination of the above four similarity scores.
  • It is computationally fast, differently from its predecessor exhaustive search.

Great, now that we have an idea of the reasoning behind it, let's see an implementation with Python.

Implementation with Python

For this purpose, I'm going to use the Python library selective-search, which allows full implementation of this algorithm. You can easily install it via pip. Plus, in order to easily get some images, I will also use the skimage library, a very useful one that you can find here.

First thing first, let's import all the packager we are going to need.

! pip install selective-search
import skimage
import selective_search
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

Great, now let's import an image and apply the selective search algorithm on it:

# Load image
image = skimage.data.astronaut()
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(image)
None
# Propose boxes using selective search
boxes = selective_search.selective_search(image, mode='fast')

Before having a look at our results, let's first examine the parameter mode. It refers to the various combinations of diversification strategies proposed in the paper. More specifically, three strategies are available for this purpose:

None
Source: https://pypi.org/project/selective-search/

The color spaces identify the domain where the algorithm is performed. The naming convention is explained in detail in the original paper at chapter 3.2. The similarity measures indicate which combinations of the different similarity measures are considered. Namely, the combination CTSF means that we are considering all Color, Texture, Size and Fill similarity measures in the final linear combinations. Finally, with Starting Regions the authors refer to the initial grouping algorithm.

Now let's see the result:

boxes_filter = selective_search.box_filter(boxes, min_size=20, topN=80)
# drawing rectangles on the original image
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(image)
for x1, y1, x2, y2 in boxes_filter:
    bbox = mpatches.Rectangle(
        (x1, y1), (x2-x1), (y2-y1), fill=False, edgecolor='green', linewidth=1)
    ax.add_patch(bbox)
plt.axis('off')
plt.show()
None

As you can see, the algorithm created several boxes. Of course, the way it did so can be modified by changing the mode parameter. Namely, let's see a comparison between the three:

None

Selective search is a powerful technique that is widely used in popular object detection algorithms, like within the family of Region-Based CNNs (R-CNN, Fast R-CNN and Faster R-CNN).

References