Let's install the packages and download the data using the commands shown below.

# Install the packages
!pip install fastai==0.7.0
!pip install torchtext==0.2.3
!pip install opencv-python
!apt update && apt install -y libsm6 libxext6
!pip3 install torchvision
# Download the Data to the required folder
!mkdir data
!wget -P data/
!wget -P data/
!tar -xf data/VOCtrainval_06-Nov-2007.tar -C data/
!unzip data/ -d data/
!rm data/VOCtrainval_06-Nov-2007.tar
%matplotlib inline
%reload_ext autoreload
%autoreload 2
!pip install Pillow
from fastai.conv_learner import *
from fastai.dataset import *
from pathlib import Path
import json
import PIL
from matplotlib import patches, patheffects

Let's check what's present in our data. We will be using the Python 3 standard library pathlib for our paths and file access.


The data folder contains different versions of Pascal VOC.

PATH = Path('data')
# iterdir() iterates through the entries of the data directory
list(PATH.iterdir())

  • PATH is an object-oriented way to access a directory or file. It's part of the Python standard library pathlib. To see what pathlib offers, type PATH. and press Tab in Jupyter to list the available methods.
  • Since we will be working only with pascal_train2007.json, let's check out the contents of this file.
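As a quick, self-contained sketch of the pathlib operations used here (the folder layout and file contents below are hypothetical stand-ins, not the real dataset):

```python
from pathlib import Path
import json
import tempfile

# Hypothetical mini layout mimicking the tutorial's data folder
root = Path(tempfile.mkdtemp())
(root / 'PASCAL_VOC').mkdir()
(root / 'PASCAL_VOC' / 'pascal_train2007.json').write_text(json.dumps({"images": []}))

# The / operator joins paths, and .open() returns a file object
with (root / 'PASCAL_VOC' / 'pascal_train2007.json').open() as f:
    data = json.load(f)

print(data)                              # → {'images': []}
print([p.name for p in root.iterdir()])  # → ['PASCAL_VOC']
```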
training_json = json.load((PATH/'PASCAL_VOC'/'pascal_train2007.json').open())
# training_json is a dictionary.
# As we can see, a pathlib object has an open() method.
# json.load is part of Python's json (JavaScript Object Notation)
# library that we imported earlier.

This file contains the images, type, annotations and categories. To make use of tab completion, save the key names in appropriately named variables.
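As a sketch, a miniature stand-in for the loaded JSON (one image and one annotation, with values modeled on the real file) shows the same top-level keys:

```python
# Toy stand-in for pascal_train2007.json (illustrative, not the full file)
training_json = {
    "images": [{"file_name": "000012.jpg", "height": 333, "width": 500, "id": 12}],
    "type": "instances",
    "annotations": [{"area": 34104, "bbox": [155, 96, 196, 174],
                     "category_id": 7, "id": 1, "image_id": 12, "ignore": 0}],
    "categories": [{"id": 7, "name": "car"}],
}

print(list(training_json.keys()))  # → ['images', 'type', 'annotations', 'categories']
```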

IMAGES, ANNOTATIONS, CATEGORIES = ['images', 'annotations', 'categories']

Let's see in detail what each of these contains:

  • The IMAGES entries consist of the image file name, its height, width and image id.

  • The ANNOTATIONS consist of the area, bbox (bounding box) and category_id (each category id has a class name associated with it).
  • Some of the annotations also include a polygon segmentation, i.e. an outline of the object rather than just a box around it. It's not important to our discussion.
  • The ignore flag tells us to ignore the object in the image when ignore=1 (True).

  • The CATEGORIES entries consist of a class name and an id associated with it.

For easy access to all of these, let's pull the important pieces out into dictionaries and lists using comprehensions.

FILE_NAME,ID,IMG_ID,CATEGORY_ID,BBOX = 'file_name','id','image_id','category_id','bbox'
categories = {o[ID]: o['name'] for o in training_json[CATEGORIES]}
# categories is a dictionary mapping each id to its class name.
# Evaluating categories in a cell lists all 20 of them.

training_filenames = {o[ID]: o[FILE_NAME] for o in training_json[IMAGES]}
# A dictionary mapping each image id to its filename.

training_ids = [o[ID] for o in training_json[IMAGES]]
# A list comprehension holding all the image ids.
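The three comprehensions above can be sketched on a toy stand-in for training_json (just two images and two categories, with illustrative values):

```python
# Toy stand-in for training_json (values are illustrative, not the real data)
training_json = {
    "images": [{"file_name": "000012.jpg", "id": 12},
               {"file_name": "000017.jpg", "id": 17}],
    "categories": [{"id": 7, "name": "car"}, {"id": 15, "name": "person"}],
}
ID, FILE_NAME = 'id', 'file_name'

# Dictionary comprehensions: map ids to class names and filenames
categories = {o[ID]: o['name'] for o in training_json['categories']}
training_filenames = {o[ID]: o[FILE_NAME] for o in training_json['images']}
# List comprehension: just the image ids
training_ids = [o[ID] for o in training_json['images']]

print(categories)          # → {7: 'car', 15: 'person'}
print(training_filenames)  # → {12: '000012.jpg', 17: '000017.jpg'}
print(training_ids)        # → [12, 17]
```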

Now, let's check out the folder that holds all the images.

# JPEGImages is the folder with all the images in it.

JPEGS = 'VOCdevkit/VOC2007/JPEGImages'
# Set the path of the images as IMG_PATH
IMG_PATH = PATH/JPEGS
# Check out all the images in the path
list(IMG_PATH.iterdir())

Note: each image has a unique id associated with it, as shown above.


The main objective here is to bring our bounding boxes into a format that can be used for plotting. The bounding-box coordinates are present in the annotations.

A bounding box is a box around the objects in an Image.

In the original annotations, the bounding-box coordinates represent (column, row, width, height).

  • After passing the coordinates through the hw_bb() function, which converts from (column, row, width, height) to a bounding box, we get the coordinates of the top-left and bottom-right corners, in (row, column) order.
def hw_bb(bb): return np.array([bb[1], bb[0], bb[3]+bb[1]-1, bb[2]+bb[0]-1])
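A quick worked example, using the VOC-style box for image 12 from the dataset:

```python
import numpy as np

def hw_bb(bb): return np.array([bb[1], bb[0], bb[3]+bb[1]-1, bb[2]+bb[0]-1])

# VOC-style box: column=155, row=96, width=196, height=174
bb_voc = [155, 96, 196, 174]
bb_fastai = hw_bb(bb_voc)
print(bb_fastai)  # top=96, left=155, bottom=269, right=350
```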
  • Now , we will create a dictionary which has the image id as the key and its bounding box coordinate and the category_id as the values.
# Python's defaultdict is useful any time you want a default dictionary
# entry for new keys. If you try to access a key that doesn't exist, it
# makes itself exist and sets itself equal to the return value of the
# function you specify (in this case lambda: []).

training_annotations = collections.defaultdict(lambda: [])
for o in training_json[ANNOTATIONS]:
    if not o['ignore']:
        bb = o[BBOX]
        bb = hw_bb(bb)
        training_annotations[o[IMG_ID]].append((bb, o[CATEGORY_ID]))

  • In the above chunk of code, we go through all the annotations, keeping only those that aren't flagged as ignore. We then append each bounding box (bbox) and category_id (class) to the list stored under the corresponding image id, which is the key.
  • One problem: if a dictionary entry doesn't exist yet, we can't append a bbox and class to its list. To resolve this we use Python's defaultdict, created with the line of code below.
training_annotations = collections.defaultdict(lambda:[])
  • It behaves like a dictionary, but if we access a key that isn't present, defaultdict creates it and sets it equal to the value returned by the function, in this case an empty list. So every time we access a key in the training annotations that doesn't exist yet, defaultdict makes a new empty list that we can append to.
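A minimal demonstration of the defaultdict behavior described above (the image ids and boxes here are illustrative):

```python
import collections

anno = collections.defaultdict(lambda: [])

# Accessing a missing key creates an empty list, so append always works
anno[12].append(([96, 155, 269, 350], 7))
anno[12].append(([77, 89, 335, 402], 13))

print(anno[12])   # two (bbox, category_id) tuples for image 12
print(anno[99])   # → []  (the key is created on access)
print(len(anno))  # → 2   (both 12 and 99 now exist)
```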


Let's get into the details of the annotations of a particular image.

  • We take a particular image.
  • Get its annotation, i.e. the bounding box and the class of the object inside it. This tells us which objects are present in the image, along with their coordinates.
  • Check what that class refers to. In this example, the class (category) is a car.
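A sketch of that lookup, using hypothetical stand-ins for training_annotations and categories (image id 12 and the car category, id 7, match the dataset):

```python
# Hypothetical stand-ins for the real dictionaries built earlier
sample_annotations = {12: [([96, 155, 269, 350], 7)]}
sample_categories = {7: 'car'}

# Each annotation is a (bbox, category_id) tuple
bbox, category_id = sample_annotations[12][0]
print(bbox)                            # fastai-style (top, left, bottom, right)
print(sample_categories[category_id])  # → car
```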

Some libraries take VOC-format bounding boxes, so the bb_hw() function converts our coordinates back to the original format:

bb_voc = [155, 96, 196, 174]
bb_fastai = hw_bb(bb_voc)
# We won't be using this function just yet.
def bb_hw(a): return np.array([a[1], a[0], a[3]-a[1]+1, a[2]-a[0]+1])
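To confirm that bb_hw() inverts hw_bb(), we can round-trip the VOC box from above:

```python
import numpy as np

def hw_bb(bb): return np.array([bb[1], bb[0], bb[3]+bb[1]-1, bb[2]+bb[0]-1])
def bb_hw(a): return np.array([a[1], a[0], a[3]-a[1]+1, a[2]-a[0]+1])

bb_voc = [155, 96, 196, 174]
round_trip = bb_hw(hw_bb(bb_voc))
print(round_trip)  # recovers the original (column, row, width, height) box
```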
