Computer Vision & Image Processing

September 15, 2018


The terms "computer vision" and "image processing" are used almost interchangeably in many contexts. They both involve doing some computations on images. But are they really the same thing? Let's talk about what they are, how they are different, and how they are linked to each other.

Image processing focuses on, well, processing images. What this means is that the input and the output are both images. An image processing algorithm can transform images in many ways: smoothing, sharpening, changing the brightness and contrast, highlighting the edges, and so on.
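To make the "image in, image out" idea concrete, here is a minimal sketch in plain Python. It assumes a tiny grayscale image stored as a list of rows with values from 0 to 255. The function names and the toy data are made up for illustration, and a real pipeline would more likely use a library such as NumPy or OpenCV.

```python
# Toy grayscale image: a bright 2x2 square on a dark background (made-up data).
image = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]

def adjust_brightness_contrast(img, gain=1.0, bias=0):
    """Scale each pixel by `gain` (contrast), add `bias` (brightness), clip to 0-255."""
    return [[max(0, min(255, round(gain * p + bias))) for p in row] for row in img]

def box_blur(img):
    """Smooth by replacing each pixel with the mean of its 3x3 neighborhood.

    Pixels at the border average only over the neighbors that exist.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbors = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(round(sum(neighbors) / len(neighbors)))
        out.append(row)
    return out

# Both functions take an image and return an image of the same size.
brighter = adjust_brightness_contrast(image, gain=1.2, bias=20)
smoothed = box_blur(image)
```

In both cases the result is still a grid of pixels with the same dimensions as the input, which is what makes these operations image processing.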

Computer vision, on the other hand, focuses on making sense of what a machine sees. A computer vision system takes an image as input and outputs task-specific knowledge, such as object labels and coordinates.
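As a rough illustration of "image in, knowledge out", the sketch below returns a label and a bounding box instead of another image. The thresholding rule is only a stand-in for a real detector, and the function name, the label strings, and the toy image are all hypothetical.

```python
# Toy grayscale image: a bright 2x2 square on a dark background (made-up data).
image = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]

def detect_bright_object(img, threshold=128):
    """Return (label, box) where box is (x_min, y_min, x_max, y_max).

    Simple thresholding stands in for a real object detector here.
    If no pixel exceeds `threshold`, returns ("background", None).
    """
    coords = [
        (x, y)
        for y, row in enumerate(img)
        for x, p in enumerate(row)
        if p > threshold
    ]
    if not coords:
        return "background", None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return "bright_object", (min(xs), min(ys), max(xs), max(ys))

label, box = detect_bright_object(image)
```

The output here is not a picture. It is a small piece of task-specific information about what is in the picture and where.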

Computer vision and image processing work together in many cases. Many computer vision systems rely on image processing algorithms. For example, computer vision systems rarely use raw imaging data that comes directly from a sensor. Instead, they use images that are processed by an image signal processor.

The opposite is also possible. It wasn't common for an image processing algorithm to rely on computer vision systems in the past, but more and more advanced image processing methods have started to use computer vision to enhance images. Face beautification filters, for example, use computer vision techniques to detect faces and apply smoothing filters such as a bilateral filter selectively. They can do more advanced stuff, such as enhancing eye clarity or emulating a spotlight by detecting facial landmarks and tuning the images locally. I know the result you see on the right doesn't really look good. It looks very artificial. I did it on purpose, though, to show the difference better.
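Here is a minimal sketch of that idea of applying a smoothing filter selectively. It makes two loud assumptions: the face mask is hard-coded rather than produced by a real face detector, and a simple 3x3 box blur stands in for a bilateral filter. All names and data are made up for illustration.

```python
def box_blur(img):
    """Replace each pixel with the mean of its 3x3 neighborhood (borders use existing neighbors)."""
    h, w = len(img), len(img[0])
    return [
        [
            round(
                sum(
                    img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                )
                / ((min(h, y + 2) - max(0, y - 1)) * (min(w, x + 2) - max(0, x - 1)))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]

def smooth_selectively(img, mask):
    """Use the blurred pixel where mask is 1 and keep the original pixel elsewhere."""
    blurred = box_blur(img)
    return [
        [blurred[y][x] if mask[y][x] else img[y][x] for x in range(len(img[0]))]
        for y in range(len(img))
    ]

# Toy image: a "skin" region with one blemish (90), next to a bright background.
image = [
    [50, 52, 50, 200],
    [51, 90, 50, 200],
    [50, 51, 49, 200],
    [200, 200, 200, 200],
]

# Hypothetical output of a face detector: 1 marks face pixels.
face_mask = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

result = smooth_selectively(image, face_mask)
```

The computer vision part decides where to apply the filter, and the image processing part does the actual smoothing. One reason real filters tend to prefer something like a bilateral filter is visible even in this toy case: a plain box blur lets the bright background bleed into the pixels at the edge of the mask.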

Another key characteristic of computer vision is the use of machine learning. We have talked about machine learning in the earlier videos, but if you are not familiar with the concept, it's a field of study that focuses on teaching machines how to perform a certain task given a set of examples. For example, we can build a model that can tell the difference between a cat and a dog after being trained on pictures of cats and dogs.
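To give a feel for learning from examples, here is a tiny nearest-centroid classifier in plain Python. It is only a sketch: the two-number feature vectors are invented (think of them as hypothetical measurements extracted from each picture), and real cat-versus-dog models learn from the pixels themselves and are far more complex.

```python
def train_centroids(examples):
    """Compute the mean feature vector for each label.

    `examples` is a list of (feature_vector, label) pairs.
    """
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Made-up training data: two hypothetical features per picture.
training_examples = [
    ([0.9, 0.2], "cat"),
    ([0.8, 0.3], "cat"),
    ([0.3, 0.8], "dog"),
    ([0.2, 0.9], "dog"),
]

model = train_centroids(training_examples)
prediction = predict(model, [0.85, 0.25])
```

Nothing in the code says what a cat or a dog is. The model picks that up entirely from the labeled examples it was given.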

It's true that computer vision heavily relies on machine learning, but that's no longer a differentiator. Many advanced image processing methods also use machine learning models to transform images to accomplish a variety of tasks, such as applying artistic filters to an image, tuning an image for optimal perceptual image quality, or enhancing details to maximize the performance for computer vision tasks.

It's worth mentioning that there isn't really a hard line between these two fields. The line between computer vision and image processing gets blurry when you do pixel-to-pixel transformations. Let's take semantic segmentation as an example. If a model produces per-pixel labels for an input image, then its output can be considered an image. In that sense, the model would be doing some sort of image processing. On the other hand, since such a transformation involves image understanding, that is, trying to understand what's in the input, it would also be considered computer vision. Overall, I think I would still consider semantic segmentation more of a computer vision task than an image processing one, but you get the idea.
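The sketch below shows why a segmentation output can be treated as an image. A simple threshold stands in for a trained segmentation model, and the class IDs, the color palette, and the function names are assumptions made up for this example.

```python
# Hypothetical class IDs mapped to RGB colors for visualization.
PALETTE = {
    0: (0, 0, 0),      # background
    1: (255, 0, 0),    # object
}

def segment(img, threshold=128):
    """Assign a class ID to every pixel. Thresholding stands in for a real model."""
    return [[1 if p > threshold else 0 for p in row] for row in img]

def labels_to_image(labels):
    """Turn a per-pixel label map into an RGB image using PALETTE."""
    return [[PALETTE[label] for label in row] for row in labels]

# Toy grayscale input (made-up data).
image = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]

label_map = segment(image)
color_image = labels_to_image(label_map)
```

The label map has exactly the same height and width as the input, so it can be stored and displayed like any other image, even though each value encodes what the pixel is rather than how bright it is.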

Another example of the interplay between image processing and computer vision would be the use of convolutional neural networks, or CNNs for short. CNNs typically take pixel intensity values as inputs and learn to process them in a way that makes it possible to accomplish a certain computer vision task, such as image recognition. The output of such a model can, for example, be a label that describes what's in the input image. The convolutional layers inside a CNN can be considered image filters with tunable parameters. Therefore, what a CNN does can be considered some sort of adaptive image processing. The use of CNNs is not limited to images, though. They can be used to process and analyze other types of data as well.
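To illustrate the "image filter with tunable parameters" view, here is a minimal sketch of the sliding-window operation a convolutional layer performs, written in plain Python. The kernels below are set by hand for illustration, whereas in a CNN those numbers would be learned during training. The function name and the toy data are made up.

```python
def conv2d(img, kernel):
    """Slide `kernel` over `img` and return the weighted sums (no padding).

    This is the cross-correlation form commonly used in CNN layers.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                kernel[j][i] * img[y + j][x + i]
                for j in range(kh)
                for i in range(kw)
            )
            for x in range(w - kw + 1)
        ]
        for y in range(h - kh + 1)
    ]

# Toy image with a vertical edge: dark on the left, brighter on the right.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]

# Same operation, different parameters, different behavior.
blur_kernel = [[1 / 9] * 3 for _ in range(3)]    # averages the neighborhood
edge_kernel = [[-1, 0, 1] for _ in range(3)]     # responds to vertical edges

blurred = conv2d(image, blur_kernel)
edges = conv2d(image, edge_kernel)
```

Changing the nine numbers in the kernel turns the same code from a smoothing filter into an edge detector. Training a CNN amounts to finding the kernel values that are most useful for the task at hand.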

CNNs do a great job at vision, audio, and even natural language processing applications. Researchers and engineers have built amazing applications using CNNs. If you want to learn more about how they work, check out my earlier videos in the Deep Learning Crash Course series.

Alright, that's all for today. I hope you liked it. Give a thumbs up if you liked this video. If you have any comments, questions, or recommendations for my next videos, let me know in the comments section below. Subscribe for more videos. As always, thanks for watching, stay tuned, and see you next time.