This article is part of a series introducing developers to Computer Vision. Check out other articles in this series.
Convolutions and Kernels
When we discussed Image Processing, we analyzed one pixel at a time and manipulated it with a constant value (for example, by multiplying or subtracting). We then tried to find objects using a template image, which means comparing a group of pixels in a sliding window against the constant group of pixels in the template. We are now going to explore the space in between: the relationships between adjacent pixels within an area of an image. We will analyze images using a small, constant group of pixels, typically a 3×3 or 5×5 image called a Kernel, which is represented by a simple matrix. Think of this approach as using a tiny image template, 3×3 or 5×5 pixels in size, instead of a larger one. For comparison, the Moon template we used earlier was 226×213 pixels.
We are going to dig into a technique that is incredibly important in computer vision: the convolution. A convolution slides a kernel-sized window across the image; at each position, it multiplies the color values of a pixel and its neighbors by the corresponding kernel values and adds up the results. Convolutions and kernels manipulate pixels based not solely on the value of the pixel itself, but on the pixels in its immediate vicinity (sometimes referred to as Connected Pixels or Neighbors). A convolution is just another type of Image Filter, and when reading about these topics you will often see the two terms used interchangeably.
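As a rough sketch of what happens at a single window position, the Python below assumes a grayscale image stored as a list of rows of integers and an odd-sized square kernel; the function name `window_sum` and the sample values are made up for illustration. Strictly speaking, a mathematical convolution flips the kernel before multiplying; for symmetric kernels like the ones in this article the result is the same, so the sketch skips the flip.

```python
def window_sum(image, kernel, row, col):
    """Multiply the window centered on (row, col) by the kernel element-wise
    and return the sum of the products (no kernel flip, no normalization)."""
    radius = len(kernel) // 2  # 1 for a 3x3 kernel, 2 for a 5x5 kernel
    total = 0
    for kr in range(len(kernel)):
        for kc in range(len(kernel[0])):
            pixel = image[row - radius + kr][col - radius + kc]
            total += pixel * kernel[kr][kc]
    return total


# Hypothetical 3x3 grayscale patch and an all-ones kernel.
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
kernel = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]
print(window_sum(image, kernel, 1, 1))  # 450
```

With an all-ones kernel the weighted sum is just the plain sum of the nine pixels; other kernels change how much each neighbor contributes.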
We will first look at a couple of popular kernel convolutions used to blur images. The Box Blur (sometimes referred to as a Mean or Average Blur) looks at each pixel of an image and replaces its value with the average of that pixel and all of its surrounding pixels. In our example, we will use a simple 3×3 kernel where all values are 1. The box blur kernel:
[1, 1, 1]
[1, 1, 1]
[1, 1, 1]
While the above kernel is incredibly simple, it is going to help us understand how convolutions work. Because all of the matrix values are 1, the weighted sum of each window is simply the sum of the center pixel and its eight neighbors; dividing that sum by 9 (the total of the kernel values) gives their average. Since we are using a sliding window, we cannot update pixel values in place, otherwise each updated value would affect the next window as it shifts. Instead, we write the average into a new image, at the position corresponding to the center of the kernel.
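The following Python sketch applies the 3×3 box kernel across a whole image and shows why the results go into a new image. It assumes a grayscale image as a list of rows, rounds each average to the nearest integer, and simply leaves border pixels unchanged; the `in_place` flag and the sample image are hypothetical, added only to contrast the correct approach with updating pixels in place.

```python
def box_blur(image, in_place=False):
    """3x3 box blur. Border pixels keep their original values.

    With in_place=False (the correct approach) results go into a copy.
    With in_place=True results overwrite `image`, so later windows read
    values that were already blurred.
    """
    height, width = len(image), len(image[0])
    target = image if in_place else [row[:] for row in image]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            total = sum(
                image[row + dr][col + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            target[row][col] = round(total / 9)  # 9 = sum of the all-ones kernel
    return target


def sample_image():
    # Hypothetical 5x5 image: uniform 100s with one dark outlier in the middle.
    image = [[100] * 5 for _ in range(5)]
    image[2][2] = 10
    return image


correct = box_blur(sample_image())
wrong = box_blur(sample_image(), in_place=True)
print(correct[1][2], wrong[1][2])  # 90 89
```

In the in-place run, the pixel at row 1, column 2 is computed after its left neighbor has already been overwritten with 90, so its window no longer matches the original image and the result drifts to 89.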
Using a sliding window, the convolution will process all of the pixels in the image. Once the convolution is completed, our new image will appear blurred.
In the above animation, notice the pixels with the values 8 and 9. These two pixels are outliers: they stand out from all of the other pixels in the image. After the convolution, these pixel values are replaced with 109 and 95, bringing them more in line with their neighbors.
SIDE NOTE: The animation also shows that the pixels along the edges of the image are not calculated in the convolution, because their windows would extend past the image border. I’ve seen examples handle those border pixels in three different ways: the pixels can be discarded, they can be carried over from the original, or an average of the available pixels can be calculated.
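Here is a Python sketch of the three border strategies from the side note, again using the 3×3 box blur on a grayscale list-of-rows image. The mode names `"discard"`, `"copy"` and `"average"` are hypothetical labels chosen for this example; `"average"` divides by however many neighbors actually fall inside the image, so a corner pixel averages 4 values and an edge pixel averages 6.

```python
def box_blur_with_borders(image, mode):
    """3x3 box blur with a choice of border handling (mode names are illustrative):
    "discard" drops the border, "copy" carries it over unchanged, and
    "average" averages only the neighbors that exist inside the image."""
    if mode not in ("discard", "copy", "average"):
        raise ValueError(f"unknown border mode: {mode!r}")
    height, width = len(image), len(image[0])

    def mean_of_available(row, col):
        # Collect only the window positions that fall inside the image.
        values = [
            image[r][c]
            for r in range(row - 1, row + 2)
            for c in range(col - 1, col + 2)
            if 0 <= r < height and 0 <= c < width
        ]
        return round(sum(values) / len(values))

    if mode == "discard":
        # The output is smaller by one pixel on every side.
        return [
            [mean_of_available(r, c) for c in range(1, width - 1)]
            for r in range(1, height - 1)
        ]

    result = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            on_border = r in (0, height - 1) or c in (0, width - 1)
            if on_border and mode == "copy":
                continue  # keep the original border value
            result[r][c] = mean_of_available(r, c)
    return result


image = [
    [0, 0, 0],
    [0, 36, 0],
    [0, 0, 0],
]
for mode in ("discard", "copy", "average"):
    print(mode, box_blur_with_borders(image, mode))
```

For this sample, `"discard"` returns `[[4]]`, `"copy"` keeps the zero border around the blurred center value 4, and `"average"` fills the border with 9 in the corners and 6 along the edges.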
As we can see above, the result of the box blur is to “smooth” out the image by making pixels more like their neighbors — resulting in a blurred image.
Another blur kernel worth discussing is the Gaussian Blur kernel. Instead of taking a direct average of the pixel and its neighbors, we give each position in the kernel a different weight: the center pixel gets the most weight, and pixels farther from the center get less. As a result, blurs using a Gaussian kernel tend to look less “boxy”. Here is one example of a simple, easy-to-read 3×3 Gaussian kernel:
[1, 2, 1]
[2, 4, 2]
[1, 2, 1]
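The convolution process for a Gaussian kernel is identical to the one for the Box Blur; only the kernel values, and therefore the number we divide by, change. Think of the box blur as dividing each window’s weighted sum by the sum of its kernel values, which totals 9. In the Gaussian kernel illustrated above, the kernel values sum to 16, so each weighted sum is divided by 16.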
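To see the two kernels side by side, the Python sketch below uses one generic function that computes the weighted sum for each window and divides it by the sum of the kernel values (9 for the box kernel, 16 for the Gaussian kernel). The function name `blur`, the integer rounding, copying border pixels unchanged, and the single-bright-pixel test image are all assumptions made for illustration; dividing by the kernel sum also assumes that sum is positive.

```python
BOX_KERNEL = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]
GAUSSIAN_KERNEL = [
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1],
]


def blur(image, kernel):
    """Slide `kernel` over `image`, dividing each weighted sum by the kernel total.
    Border pixels that the kernel cannot cover are copied from the original."""
    size = len(kernel)
    radius = size // 2
    weight = sum(sum(k_row) for k_row in kernel)  # 9 for the box, 16 for the Gaussian
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for row in range(radius, height - radius):
        for col in range(radius, width - radius):
            total = 0
            for kr in range(size):
                for kc in range(size):
                    total += image[row - radius + kr][col - radius + kc] * kernel[kr][kc]
            result[row][col] = round(total / weight)
    return result


# Hypothetical 5x5 black image with one bright pixel in the middle.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 144

box = blur(spike, BOX_KERNEL)
gauss = blur(spike, GAUSSIAN_KERNEL)
print(box[2][2], box[1][2], box[1][1])        # 16 16 16
print(gauss[2][2], gauss[1][2], gauss[1][1])  # 36 18 9
```

The box blur spreads the bright pixel evenly over its neighborhood, while the Gaussian kernel keeps more of it at the center and less on the diagonals, which is why Gaussian blurs look less boxy.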
As a quick visualization exercise, let’s also compare what the Box Blur and Gaussian Blur kernels would look like if we were to plot them in a 2D graph.
The Gaussian kernel illustrated above is a very simple kernel with whole numbers. In general, the weights of a Gaussian kernel are calculated from the Gaussian function, given the size of the kernel (how many pixels the curve is sampled over) and the shape of the curve. While the specific math is a bit beyond the intention of these articles, it’s important to point out that the calculation depends on the sigma value, a number that describes how wide and smooth the Gaussian curve is (its Standard Deviation, for the Math Nerds). In other words, sigma controls how quickly the weights fall off from the center of the kernel toward its edges. We will revisit the importance of sigma when we discuss the Difference of Gaussians feature detector later. For now, think of sigma as how much the Gaussian kernel smooths out an image: the larger the sigma, the stronger the blur.
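For readers who want to see sigma at work, the Python sketch below builds a normalized Gaussian kernel by sampling the Gaussian function exp(-x²/(2σ²)) along one axis and taking the outer product of those weights with themselves. The function name `gaussian_kernel` and the default radius of ceil(3σ) are assumptions (a common rule of thumb, not something this article specifies); the constant factor in front of the Gaussian function is dropped because the weights are normalized to sum to 1 anyway.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel whose weights sum to 1."""
    if radius is None:
        # Assumed rule of thumb: cover about three standard deviations each side.
        radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    # Sample the (unnormalized) 1D Gaussian at each integer offset from the center.
    one_d = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in offsets]
    # The 2D kernel is the outer product of the 1D weights with themselves.
    kernel = [[a * b for b in one_d] for a in one_d]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


# Same 3x3 size, three sigmas: the center weight drops as sigma grows,
# so more of each output pixel comes from its neighbors (a stronger blur).
for sigma in (0.5, 1.0, 2.0):
    kernel = gaussian_kernel(sigma, radius=1)
    print(sigma, round(kernel[1][1], 3), round(kernel[0][0], 3))
```

Fixing the radius at 1 isolates the effect of sigma; with the default radius, a larger sigma also produces a larger kernel.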
There are many other kernels used in convolutions. Another popular one is the sharpening kernel, which increases the contrast between adjacent pixels, roughly the opposite of the blurs we looked at in this article.
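The article does not give a sharpening kernel, so the one below is an assumption: a commonly used 3×3 example whose center weight is 5 and whose four direct neighbors are -1. Its weights sum to 1, so flat areas keep their value while differences between neighboring pixels are amplified. The Python sketch assumes 8-bit grayscale values, clamps results into the 0 to 255 range, and copies border pixels unchanged.

```python
# A commonly used sharpening kernel (not taken from this article).
SHARPEN_KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def sharpen(image):
    """Apply SHARPEN_KERNEL to a grayscale image; border pixels are copied."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            total = 0
            for kr in range(3):
                for kc in range(3):
                    total += image[row - 1 + kr][col - 1 + kc] * SHARPEN_KERNEL[kr][kc]
            # The weights sum to 1, so there is nothing to divide by;
            # clamp because negative weights can push values out of 0-255.
            result[row][col] = max(0, min(255, total))
    return result


# Hypothetical image with a vertical step from 50 to 100.
step = [
    [50, 50, 100, 100, 100],
    [50, 50, 100, 100, 100],
    [50, 50, 100, 100, 100],
]
print(sharpen(step)[1])  # [50, 0, 150, 100, 100]
```

The pixels on either side of the step move apart (50 becomes 0 and 100 becomes 150), while the flat region on the right keeps its value of 100.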
Convolutions and kernels come up a lot in computer vision. Later in the series, we will revisit convolutions when discussing Machine Learning techniques such as Convolutional Neural Networks. In the next article, we will use convolutions to find edges in images.
SIDE NOTE: Readers who are familiar with convolutions will notice that we dove right into 2D convolutions. This decision was made to guide readers into seeing kernels as tiny images and to draw parallels to topics we have already covered, such as templating. That said, developers looking to implement convolutions should absolutely look into 1D convolutions. When a 2D kernel is separable (as the box and Gaussian kernels in this article are), the 2D convolution can be broken up into two 1D convolutions, one along the rows and one along the columns, which makes the algorithm faster.
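As a sketch of the separable case, the Python below checks that the Gaussian kernel from this article is the outer product of [1, 2, 1] with itself, and that a horizontal 1D pass followed by a vertical 1D pass gives exactly the same weighted sums as a direct 2D pass. The function names and sample image are hypothetical; to keep the comparison exact, the sketch works with unnormalized integer sums (divide by 16 afterwards) and discards border pixels. For a k×k kernel, the direct approach does k×k multiplications per pixel, while the two 1D passes do 2×k.

```python
ONE_D = [1, 2, 1]
# Outer product of the 1D kernel with itself: [[1, 2, 1], [2, 4, 2], [1, 2, 1]].
GAUSSIAN_2D = [[a * b for b in ONE_D] for a in ONE_D]


def convolve_2d(image, kernel):
    """Direct 2D weighted sums over every full window (borders discarded)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(w - k + 1)
        ]
        for r in range(h - k + 1)
    ]


def convolve_rows(image, k1d):
    """Horizontal 1D pass along each row."""
    k = len(k1d)
    return [
        [sum(row[c + j] * k1d[j] for j in range(k)) for c in range(len(row) - k + 1)]
        for row in image
    ]


def convolve_cols(image, k1d):
    """Vertical 1D pass down each column."""
    k = len(k1d)
    h, w = len(image), len(image[0])
    return [
        [sum(image[r + i][c] * k1d[i] for i in range(k)) for c in range(w)]
        for r in range(h - k + 1)
    ]


# Hypothetical 4x5 grayscale image.
image = [
    [3, 1, 4, 1, 5],
    [9, 2, 6, 5, 3],
    [5, 8, 9, 7, 9],
    [3, 2, 3, 8, 4],
]
direct = convolve_2d(image, GAUSSIAN_2D)
two_pass = convolve_cols(convolve_rows(image, ONE_D), ONE_D)
print(direct == two_pass)  # True
```

The same check works for the box kernel, which is the outer product of [1, 1, 1] with itself.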
Convolutions are a process in which we operate on each pixel based on the pixel itself and its neighbors, using the sliding window technique. Kernels define the values used in a convolution. The Box and Gaussian blurs are the examples of convolutions we explored; both make each pixel more like its neighbors, “smoothing” the pixels and blurring the appearance of the image.
Sources and More Info
- Image Kernels Explained Visually — Article by Victor Powell
- Image Convolution — Lecture by Jamie Ludwig
- Smoothing — Lecture by Robert Collins
- How Blurs & Filters Work — Video by Computerphile
- Box Blur — Wikipedia
- Gaussian Blur — Wikipedia