Convolution Unconvoluted

April 1, 2008

Convolution is a big topic in image processing, and it’s a lot easier than people make it out to be. Let’s begin with a 5 x 5 pixel image, which has 25 pixels in total. You can think of this image as a matrix of 5 rows and 5 columns. Here’s a picture:

Convolution is a binary mathematical operation, which means you need two of something before you can perform it. Addition, subtraction, division, and multiplication all need two of something before you can add, subtract, divide, or multiply, and convolution is no different.

The other piece of info we need before we carry out a convolution is what we call a kernel. Don’t get thrown off by this; it’s just a name. Perhaps you can recall your early days in elementary school, when you were taught similar names for the more familiar operations. Take division, for example: you learned names such as the divisor, the dividend, and the quotient, each referring to a number that plays a particular role in the operation. You can’t divide if you only have a dividend: you need a divisor as well. The same goes for subtraction: you have the minuend, the subtrahend, and the difference. Complicated? Here’s a picture:

Just remember that these are only naming conventions, a shorthand for referring to “this” or “that” without any ambiguity.

Before we explain the intricacies of convolution, let’s begin with the elementary ideas. Imagine you have an ice cube tray. If you fill the tray to the brim and let the water freeze, you get ice cubes stuck together by a thin plane of ice. Suppose you cut out a 2 x 3 plane of cubes, that is, a slab of ice with two rows and three columns of cubes. Suppose also that we have the kind of ice cube tray only Costco would sell: a tray that fits 25 ice cubes, in 5 rows and 5 columns. Without rotating the 2 x 3 plane of ice cubes, find all possible ways of placing those cubes neatly inside the 5 x 5 ice cube tray. Here’s a picture:

Now let us choose the convention of placing a thumbtack at the top-left corner of the 2 x 3 plane of ice cubes. With the 5 x 5 ice cube tray sitting on a white sheet of paper, let’s also write numbers next to it to label its rows and columns. We begin with row 0 and finish at row 4. Likewise, we begin with column 0 and finish at column 4. Here’s a picture:
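If you’d rather check the placements than count them by hand, here is a minimal Python sketch. It assumes the conventions above: rows and columns are numbered from 0, and each placement is identified by where the thumbtack (the top-left cube of the 2 x 3 plane) lands. The function and variable names are made up for this illustration.

```python
# Sizes come from the text: a 5 x 5 tray and a 2 x 3 plane of ice cubes.
# All names here are hypothetical, chosen only for this sketch.
TRAY_ROWS, TRAY_COLS = 5, 5
PLANE_ROWS, PLANE_COLS = 2, 3


def thumbtack_positions(tray_rows, tray_cols, plane_rows, plane_cols):
    """List every (row, column) the thumbtack can point to while the
    whole plane still fits inside the tray (no rotation allowed)."""
    return [
        (row, col)
        for row in range(tray_rows - plane_rows + 1)
        for col in range(tray_cols - plane_cols + 1)
    ]


positions = thumbtack_positions(TRAY_ROWS, TRAY_COLS, PLANE_ROWS, PLANE_COLS)
print(len(positions))  # 12
print(positions[0])    # (0, 0)
print(positions[-1])   # (3, 2)
```

The plane fits in 4 different rows and 3 different columns, which gives 12 placements in total.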

Every time we place our 2 x 3 plane of ice cubes inside the 5 x 5 tray, we take note of where the thumbtack is pointing. If we place the 2 x 3 plane of ice cubes in the very top left-hand corner of the 5 x 5 tray, the thumbtack points to row 0, column 0, which we will write as (0,0). Why do we have the thumbtack? I’ll get to that soon. At this point, I want you to make a leap from the visual analogy and do a little math. Suppose there is a number inside every hole in the ice cube tray. Suppose also there is a number locked inside every cube of the 2 x 3 ice cube plane. Having placed the 2 x 3 ice cube plane at location (0,0), multiply the numbers inside the ice cube tray by the numbers inside the 2 x 3 ice cube plane. Here’s the rule: the number in a hole of the tray may only be multiplied by the number in the ice cube sitting in that hole. Once you’ve carried out all those multiplications (6 in total, since your 2 x 3 ice cube plane is made up of 6 ice cubes), add the products together to obtain the final result.
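To make the multiply-then-add step concrete, here is a small Python sketch for a single placement. The numbers in the tray and in the plane are invented purely for illustration (the text doesn’t give any), and the names are hypothetical.

```python
# Invented example numbers: the tray holds 1..25 row by row,
# and the 2 x 3 plane holds 1..6. Neither comes from the original text.
tray = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25],
]
plane = [
    [1, 2, 3],
    [4, 5, 6],
]


def result_at(tray, plane, row, col):
    """Multiply each cube's number by the number in the hole it sits in,
    with the thumbtack (top-left cube) at (row, col), then add the products."""
    total = 0
    for i in range(len(plane)):          # rows of the plane
        for j in range(len(plane[0])):   # columns of the plane
            total += tray[row + i][col + j] * plane[i][j]
    return total


# Thumbtack at (0,0): 1*1 + 2*2 + 3*3 + 6*4 + 7*5 + 8*6
print(result_at(tray, plane, 0, 0))  # 121
```

Each cube only ever meets the hole directly beneath it, so there are exactly 6 multiplications per placement.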

Having found the result for location (0,0), I’ll tell you now that this is the number you store in a new ice cube tray at row 0, column 0, which is (0,0). You do this for every possible way you can place the 2 x 3 ice cube plane inside the original 5 x 5 ice cube tray, making sure that you keep track of where the thumbtack is. The thumbtack is important because it tells you exactly where in your new ice cube tray you should store each result.

In the end, you should have a new ice cube tray of size 4 x 3, that is, 4 rows and 3 columns, because the thumbtack can only ever land on rows 0 through 3 and columns 0 through 2.
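Putting the whole procedure together, here is a self-contained Python sketch that fills in the new tray. It follows the steps exactly as described here (slide the plane without rotating or flipping it, multiply, add, and store the result where the thumbtack points). The tray and plane numbers are invented for illustration, and the function name is hypothetical.

```python
# Invented example numbers: a 5 x 5 tray holding 1..25 and a 2 x 3 plane holding 1..6.
tray = [[5 * row + col + 1 for col in range(5)] for row in range(5)]
plane = [
    [1, 2, 3],
    [4, 5, 6],
]


def fill_new_tray(tray, plane):
    """Slide the plane over every position where it fits and store each
    sum of products at the thumbtack (top-left) location."""
    plane_rows, plane_cols = len(plane), len(plane[0])
    out_rows = len(tray) - plane_rows + 1
    out_cols = len(tray[0]) - plane_cols + 1
    new_tray = [[0] * out_cols for _ in range(out_rows)]
    for row in range(out_rows):
        for col in range(out_cols):
            new_tray[row][col] = sum(
                tray[row + i][col + j] * plane[i][j]
                for i in range(plane_rows)
                for j in range(plane_cols)
            )
    return new_tray


new_tray = fill_new_tray(tray, plane)
print(len(new_tray), len(new_tray[0]))  # 4 3
for row in new_tray:
    print(row)
```

The output size falls out of the loop bounds: 5 - 2 + 1 = 4 rows and 5 - 3 + 1 = 3 columns.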

From here, it should be easy to make the shift to talking about images, pixels, mathematical convolution, kernel, input and output images, and dimensionality.

References:

“Convolution.” April 1, 2008. ©2003 R. Fisher, S. Perkins, A. Walker and E. Wolfart. http://homepages.inf.ed.ac.uk/rbf/HIPR2/convolve.htm