from fastai2.vision.all import *

Notebook Overview

I'm going to create an image recognition model that predicts if a photo contains my pet Poe or not. The impetus for this post stems from the following:

  • I've been curious about how 'easy' it is to train a CNN that can distinguish individuals; I know Google does it with my phone, but can I do it with fast.ai and a few hundred photos that aren't individual-specific?

  • I'd like to get a better handle on structuring a computer vision project with my own data and the fast.ai library.

and finally

  • I want to create my first post on the fastpages section of my blog. :-)

SOTA: Dog or Cat

fast.ai makes training state of the art (SOTA) models a breeze with their high level API; take the cat classifier below:

from fastai2.vision.all import *

path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch train_loss valid_loss error_rate time
0 0.169193 0.020039 0.006089 00:19
epoch train_loss valid_loss error_rate time
0 0.051610 0.011121 0.004736 00:26

With the 8 lines of code above we're able to get a near-perfect binary classifier for predicting whether an image is a cat or not. You can assess the accuracy with the error_rate, and if you think that's amazing, let's take a peek at what the model got wrong.

As you can see from the confusion matrix above and a subset of the most incorrect images (ranked by the loss value), that's a prettttty good classifier we just made, making some reasonable mistakes for a computer...
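For reference, fast.ai's error_rate is just 1 minus accuracy, i.e. the fraction of validation images that were classified incorrectly. Here's a minimal plain-Python sketch of that idea along with the counts that feed a confusion matrix; the labels are made up for illustration and the helper is my own, not fast.ai's implementation:

```python
from collections import Counter

def error_rate(predictions, targets):
    """Fraction of predictions that differ from the targets (1 - accuracy)."""
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)

# Made-up labels for ten validation images (True = cat), for illustration only.
targets     = [True, True, True, False, False, False, False, True, False, True]
predictions = [True, True, False, False, False, False, True, True, False, True]

# Confusion-matrix cells keyed by (actual, predicted).
confusion = Counter(zip(targets, predictions))

print(error_rate(predictions, targets))   # share of images the "model" got wrong
print(confusion[(True, False)])           # cats predicted as not-cat
print(confusion[(False, True)])           # non-cats predicted as cat
```

The off-diagonal cells of the confusion matrix are exactly the mistakes that the error rate counts.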

Understanding the magic

You might think that the small code snippet above was pure black magic but what's really going on is a ton of abstraction has been hidden away in layers of code. Roughly speaking, the steps that have been abstracted are the following:

  • Downloading data from an external URL
  • Creating objects which split the data into train and validation sets (dls i.e. data loaders)
    • These objects also handle batching data for stochastic gradient descent (SGD).
  • Providing a mechanism for labeling (that indicates what the model should be fitting to)
  • Defining transformations to the individual data points (photos) before they get batched (item_tfms), specifically resizing them to be all the same size (so we can use them on the GPU)
  • Instantiating a PyTorch model with pretrained weights (resnet34)
    • Removing the last few layers of resnet34 and attaching new layers of our own (with some random initialization)
  • Assigning a loss function (using the default here which is inferred from the dls objects)
  • Setting a metric for monitoring performance
  • Using SGD to minimize our loss function
    • Training for an epoch where we only updated the new layers we put on top of resnet34
    • Then training for another epoch where all weights could be updated
  • Finally, utility functions for inspecting our results
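To make one of these hidden steps concrete, here's a rough sketch of what a seeded random train/validation split (the valid_pct=0.2, seed=42 arguments) boils down to. This is a plain-Python illustration under my own assumptions; random_split and the file names are hypothetical, and fast.ai's own splitter is built on PyTorch rather than the random module:

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle indices reproducibly and use the first valid_pct of them as validation."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)   # same seed -> same shuffle
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in indices[:cut]]
    train = [items[i] for i in indices[cut:]]
    return train, valid

files = [f"img_{i}.jpg" for i in range(10)]   # hypothetical file names
train, valid = random_split(files, valid_pct=0.2, seed=42)
print(len(train), len(valid))
```

Fixing the seed matters because it keeps the validation set identical between runs, so error rates from different experiments stay comparable.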

While abstraction is really necessary for creating good readable code, it doesn't necessarily help us when we start to try and apply these tools to our own projects. So here's where we're going to start digging into the code and expand our understanding of what's happening under the hood by leveraging our own dataset and inspecting things as we go.

The data: Path objects, labels, and directory structures

The first thing we need to do for our classifier is figure out how to structure our directories, how to label the data, and then how to pass that information into our DataLoaders.

Ignoring the fact that untar_data does some downloading magic, let's check out what it returns:

path = untar_data(URLs.PETS)/'images';path
Path('/home/bibsian/.fastai/data/oxford-iiit-pet/images')
isinstance(path, Path)
True

So it turns out that untar_data passes back a Path object pointing to the labeled data for the cat classifier. From the code in is_cat it looks like all cat photos should have a capitalized name; let's test that:

# Let's inspect the repo
! ls $path | head -5 
Abyssinian_100.jpg
Abyssinian_100.mat
Abyssinian_101.jpg
Abyssinian_101.mat
Abyssinian_102.jpg
ls: write error: Broken pipe

Looks like we can grab the first image and see if it's a cat (here's some bash-python for you :-o)

cat_file = ! ls $path | head -1 
def show_img(path):
    return PILImage.create(path).show()
# The beauty of Path objects is you can just concatenate inline with forward slashes
# (pathlib's / operator builds the right path on Windows too)
show_img(path/cat_file[0])
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f08b52790>

So this looks like a cat... cool! For good measure let's look at a dog.

! ls $path | tail -5 # dog filenames start with a lowercase letter
yorkshire_terrier_96.jpg
yorkshire_terrier_97.jpg
yorkshire_terrier_98.jpg
yorkshire_terrier_99.jpg
yorkshire_terrier_9.jpg
dog_file = ! ls $path | tail -1 
show_img(path/dog_file[0])
<matplotlib.axes._subplots.AxesSubplot at 0x7f9f17eed850>

And there we go; we've validated the labeling methodology.
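The same check can be done without opening any images, by running the labeling function directly on the two kinds of file names we saw in the listings above (a small sketch; is_cat is the same one-liner used for the classifier):

```python
def is_cat(filename):
    # Cat photos have a capitalized name, dog photos a lowercase one.
    return filename[0].isupper()

for name in ["Abyssinian_100.jpg", "yorkshire_terrier_9.jpg"]:
    print(name, "->", "cat" if is_cat(name) else "dog")
```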

So it looks like we can throw all of our photos in one directory, name the files however we want to distinguish the classes, and make sure we pass a function to our ImageDataLoaders that knows how to parse those names; pretty neat.

My data

Okay, so here's where I go on a tangent about my data and provide the code to organize it.

The source: I downloaded all my Google Photos that were tagged with my dog, Poe. I manually inspected each of the photos (a little more than 1000) and made sure you could see his face in the photo. Let's check one out. Note that not all shots are nearly as good a close-up (a lot are from far away with other things in the frame):

poe_path = Path("/home/bibsian/Downloads/Poe")
poe_img = ! ls ~/Downloads/Poe | head -20
show_img(poe_path/poe_img[4])
<matplotlib.axes._subplots.AxesSubplot at 0x7f9ef573d5d0>

And there's his big ole block head :-). If you're wondering what kind of dog he is, my best guess is that he's a springer spaniel pitbull mix; google "brown springer spaniel" if you want to see dogs that look like him. I'll use those search terms to source negative sample images to use during training via Bing's Image Search API.

Since my Poe folder has around 1000 images and is about 3GB worth of data I'm going to start training a model with a random subset. Here's what the next bit of code will do:

  • Create a folder called poe_data
  • Transfer a random subset of images and rename them with poe_<numerical_index>

and

  • Download images of dogs that could look like him (i.e. use Bing API with "brown springer spaniels"), put them in poe_data, and label them spaniel_<numerical_index>
import numpy as np
from typing import *
N_SUBSET = 150

poe_classifier_path = Path("/home/bibsian/Desktop/poe_data")
n_poe_photos = len(poe_path.ls()); print(n_poe_photos)
989
# Sample indices without replacement so that no photo is picked twice
poe_unlabeled_files = poe_path.ls()[np.random.choice(n_poe_photos, N_SUBSET, replace=False)]
if not poe_classifier_path.exists():
    print("Creating source directory for Poe Classifier")
    poe_classifier_path.mkdir()
def move_files_and_rename_with_base_index(source_files: List[Path], 
                                          dest_dir: Path, 
                                          base_name: str):
    """ 
    Copy files (the originals stay where they are) and rename each copy with a
    base string followed by an index, i.e. <basestr>_0, <basestr>_1
    """
    for ix, og_file in enumerate(source_files):
        with (dest_dir/f"{base_name}_{ix}{og_file.suffix}").open(mode="xb") as file_id:
            file_id.write(og_file.read_bytes())
if not poe_classifier_path.ls():
    print("Moving a subset of unlabeled Poe images")

    move_files_and_rename_with_base_index(source_files=poe_unlabeled_files, 
                                          dest_dir=poe_classifier_path, 
                                          base_name="poe")
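One detail worth calling out is the "xb" file mode, which opens the target for exclusive binary writing and refuses to overwrite an existing file. The sketch below replays the same copy-and-rename logic in a throwaway temp directory; copy_and_rename and the IMG_000x.jpg names are hypothetical stand-ins, not my real data:

```python
import tempfile
from pathlib import Path

def copy_and_rename(source_files, dest_dir, base_name):
    """Copy each file to dest_dir as <base_name>_<index><suffix>."""
    for ix, src in enumerate(source_files):
        target = dest_dir / f"{base_name}_{ix}{src.suffix}"
        # "xb" = exclusive binary write: raises FileExistsError if target exists.
        with target.open(mode="xb") as fh:
            fh.write(src.read_bytes())

with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp, "src")
    dest_dir = Path(tmp, "dest")
    src_dir.mkdir()
    dest_dir.mkdir()

    sources = []
    for name in ["IMG_0001.jpg", "IMG_0002.jpg"]:   # placeholder files
        f = src_dir / name
        f.write_bytes(b"fake image bytes")
        sources.append(f)

    copy_and_rename(sources, dest_dir, "poe")
    copied = sorted(p.name for p in dest_dir.iterdir())
    originals_kept = all(f.exists() for f in sources)

    # A second run hits the already existing poe_0.jpg and fails loudly.
    try:
        copy_and_rename(sources, dest_dir, "poe")
        second_run_failed = False
    except FileExistsError:
        second_run_failed = True

print(copied, originals_kept, second_run_failed)
```

That failure on a second run is why it pays to guard the call with a check that the destination directory is still empty.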

Gathering negative samples for our classifier (i.e. photos not of Poe)

I'm going to use Bing's image search API here, but I'm not including the code for it: it's part of a prereleased library and sharing it would be against the terms of use. You can query the API however you like; just know that it returns URLs of images that we then have to download.

results = search_images_bing(BING_API, 'brown springer spaniel')
ims = results.attrgot('content_url')
print(f"Downloaded {len(ims)} brown spaniel images")
Downloaded 150 brown spaniel images
# Checking a downloaded image
spaniel_test_img = Path("/home/bibsian/Desktop/spaniel.jpg")
# exists is a method; without the parentheses the condition is never true
if not spaniel_test_img.exists():
    download_url(ims[0], str(spaniel_test_img))

show_img(spaniel_test_img)
<matplotlib.axes._subplots.AxesSubplot at 0x7f9ee849ec50>

The downloaded image looks great; it's Poe like but definitely not my dog :-).

Now I'm going to finish out the downloads and throw them all into the same directory as our labeled Poe images (poe_classifier_path).

spaniels_path = Path("/home/bibsian/Desktop/spaniels")
if not spaniels_path.exists():
    spaniels_path.mkdir()
if not spaniels_path.ls():
    download_images(spaniels_path, urls=results.attrgot('content_url'))
spaniel_downloads = get_image_files(spaniels_path)

Let's check and see if any of the downloaded images are corrupt:

failed = verify_images(spaniel_downloads)
# Looks good
len(failed) 
0

Okay now that we've downloaded the files and verified they aren't corrupted, let's move them to the data directory we'll be using for the classifier and create our DataLoaders:

move_files_and_rename_with_base_index(source_files=spaniel_downloads,
                                      dest_dir=poe_classifier_path, 
                                      base_name="spaniel")

Let's verify our poe_classifier_path has images of spaniels and Poe:

! ls $poe_classifier_path | tail -3
spaniel_98.jpg
spaniel_99.jpg
spaniel_9.jpg
! ls $poe_classifier_path | head -3
poe_0.jpg
poe_100.jpg
poe_101.jpg

Everything looks good from a directory structure standpoint now; onwards to the DataLoaders and model.

The DataLoaders object

Now that we've organized our data, let's instantiate the ImageDataLoaders object for our own use and define our labeling function:

def is_poe(x):
    """ x is the filename of a Path object"""
    return "poe" in x
poe_dls = ImageDataLoaders.from_name_func(
    path=poe_classifier_path, fnames=get_image_files(poe_classifier_path),
    label_func=is_poe, valid_pct=0.4, seed=10, 
    item_tfms=Resize(124)  # a single int gives square 124x124 images
)
poe_dls.show_batch()

Now wasn't that easy :-)
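Since the labels come purely from file names, it's cheap to sanity check the labeling function before training. A small sketch using a handful of the names from the directory listing above (on the real data you'd pass every file name in poe_classifier_path):

```python
from collections import Counter

def is_poe(filename):
    """True when the file name marks the photo as one of Poe."""
    return "poe" in filename

# A few names taken from the directory listing above.
names = ["poe_0.jpg", "poe_100.jpg", "spaniel_9.jpg", "spaniel_98.jpg", "spaniel_99.jpg"]

label_counts = Counter(is_poe(n) for n in names)
print(label_counts[True], "Poe photos,", label_counts[False], "other spaniels")
```

Keep in mind that a substring test like this labels any file with "poe" anywhere in its name as Poe, so it relies on the naming scheme set up earlier.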

Fine tuning our model

As a final step, we can take our DataLoaders object, fine tune a pretrained model on it, and see how well the model can distinguish Poe from other spaniels.

learner = cnn_learner(poe_dls, resnet34, metrics=error_rate)
learner.fine_tune(3)
epoch train_loss valid_loss error_rate time
0 1.511148 1.483697 0.474576 00:15
epoch train_loss valid_loss error_rate time
0 0.752346 0.442981 0.220339 00:15
1 0.567987 0.156687 0.050847 00:14
2 0.427543 0.141456 0.042373 00:14

That's a pretty decent model... The training loss keeps dropping, the validation loss keeps dropping, and the validation loss stays below the training loss, so there are no signs of overfitting; not bad for a few lines of code.
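If you'd rather not eyeball the table, the same reading can be checked in a few lines. The numbers are copied from the three fine-tuning epochs above; the checks are only a rough heuristic of my own, not a formal test for overfitting:

```python
# (epoch, train_loss, valid_loss, error_rate) copied from the second table above.
history = [
    (0, 0.752346, 0.442981, 0.220339),
    (1, 0.567987, 0.156687, 0.050847),
    (2, 0.427543, 0.141456, 0.042373),
]

def strictly_decreasing(values):
    """True when every value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))

train_losses = [row[1] for row in history]
valid_losses = [row[2] for row in history]

train_dropping = strictly_decreasing(train_losses)
valid_dropping = strictly_decreasing(valid_losses)
# A validation loss that starts climbing while the training loss keeps falling
# would be the usual warning sign; here it never even exceeds the training loss.
valid_below_train = all(v < t for t, v in zip(train_losses, valid_losses))

print(train_dropping, valid_dropping, valid_below_train)
```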

Let's do a sense check of the model now and plot what it got wrong:

# learner.save("poe_classifier_fine_tuned")
poe_interpret = ClassificationInterpretation.from_learner(learner)
poe_interpret.plot_confusion_matrix()
poe_interpret.plot_top_losses(8)

Now what?

Looking at our top losses you might be a bit surprised to see how wrong it got pictures that were clearly my dog... but let's delve a bit deeper into this and check out some more images of Poe below:

poe_dls.show_batch()

You might not be able to tell from this subset but a lot of Poe's images have more than just him. So when I think about how the model could get shots of him so wrong it makes me think our model is learning about features that aren't specific to Poe but probably the background that makes up more of his photos :-/...

For now we're going to leave it here, but next time we'll look into what features of the image our model is really focusing on in order to distinguish Poe from other spaniels.

Thanks for reading and feel free to leave any comments, questions, or corrections!