OpenCV Fundamentals

This is a straight-to-the-point tutorial that guides users through the fundamentals of image processing in OpenCV.
Author

Mahmut Osmanovic

Published

February 11, 2025

What You’ll Learn: A Quick Overview

This tutorial is written to be easily digestible. It can serve as a starting point for beginners or as reference material for others. It covers the I/O functionality of OpenCV, size and color manipulation of images, blurring, image manipulation through static and adaptive thresholds, edge detection, contours and, lastly, parametric drawing.

To install OpenCV, run pip install opencv-python in the terminal of your choice.

All of the associated code and several examples are available at the following GitHub repository: https://github.com/mosmar99/OpenCV-Fundamentals

1.0 | Input/Output

There are three standard sources of input in OpenCV. The first two are images and videos stored somewhere on your computer. The third is your webcam.

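import cv2

# read image (imread returns None if the file is missing or unreadable)
image_path = "beagle.png"
image = cv2.imread(image_path)
if image is None:
    raise FileNotFoundError(f"Could not read image: {image_path}")
print('(height, width, #channels) <=>', image.shape)

# write image
cv2.imwrite("image_out.png", image)

# visualize image
cv2.imshow('Beagle Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()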

The snippet above imports the OpenCV library (import cv2), specifies the path to the image and reads it with the imread function of the cv2 module. Note that cv2 stores all images as NumPy arrays. Each color image has a height, a width and a channel count, reported by image.shape in precisely that order. The channels are the basic color components of the image, typically three (red, green and blue). Note that cv2 by default uses the BGR channel order instead of the more common RGB. The imwrite function saves the image to disk, and imshow displays it, which is useful for visually inspecting applied transformations. The call cv2.waitKey(0) waits indefinitely for a key press, so the window stays open until the user presses any key.
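To make the shape ordering and the BGR convention concrete, here is a minimal sketch that builds a small synthetic image with NumPy instead of reading beagle.png. The array and its dimensions are made up purely for illustration.

```python
import numpy as np

# A synthetic image: 4 rows (height), 6 columns (width), 3 channels.
image = np.zeros((4, 6, 3), dtype=np.uint8)

# In BGR order, channel 0 is blue, channel 1 is green and channel 2 is red,
# so this turns the top-left pixel pure red.
image[0, 0] = (0, 0, 255)

height, width, channels = image.shape
blue, green, red = image[0, 0]
print(height, width, channels)
```

An array like this can be passed to imshow or imwrite just like an image returned by imread.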

import cv2

# read video
video_path = "beagle_vid.mp4"
video = cv2.VideoCapture(video_path)

# visualize video
ret = True
while ret:
    ret, frame = video.read()
    # video.read() returns ret=True while there are frames left in the video
    if ret:
        cv2.imshow('Beagle Frame', frame)
        # the beagle video is 30 frames/second: (1/30)*1000 is about 33 ms/frame
        if cv2.waitKey(33) & 0xFF == ord('q'):
            break
video.release()
cv2.destroyAllWindows()

To read a video instead of an image, create a VideoCapture object. Each call to video.read() returns the next frame together with the boolean ret, which turns False once the frames run out. While frames remain, each one is displayed for the number of milliseconds passed to waitKey, or until the user presses the q key.
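The delay passed to waitKey follows directly from the frame rate. The sketch below shows that arithmetic with a hypothetical helper, frame_delay_ms, which is not part of OpenCV. In practice the frame rate can usually be queried with video.get(cv2.CAP_PROP_FPS), although some sources report 0.

```python
def frame_delay_ms(fps):
    """Milliseconds to wait per frame so playback roughly matches fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # waitKey treats 0 as "wait forever", so never return less than 1 ms.
    return max(1, int(1000 / fps))


delay = frame_delay_ms(30)
print(delay)
```

The result is truncated to whole milliseconds, so playback runs marginally faster than the nominal frame rate.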

import cv2

# read webcam
webcam = cv2.VideoCapture(0)

if not webcam.isOpened():
    raise RuntimeError("Error opening webcam")

# visualize webcam
while True:
    ret, frame = webcam.read()
    if not ret:
        # stop if the camera fails to deliver a frame
        break

    cv2.imshow('frame', frame)
    if cv2.waitKey(35) & 0xFF == ord('q'):
        break

webcam.release()
cv2.destroyAllWindows()

Instead of specifying a path to a video, the frames are captured live from the local webcam. I selected webcam 0; you may choose another index if you have several cameras. Note that the loop is written as while True rather than while ret, since frames keep arriving from the webcam. It is still prudent to check ret, because a read can fail, for example if the camera is disconnected. The user can quit by pressing q, as before.
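The expression cv2.waitKey(...) & 0xFF == ord('q') appears in both loops. The following sketch isolates that check in a hypothetical helper, is_quit_key, to show why the mask is there. It assumes only that waitKey returns -1 when no key is pressed and that some platforms set extra high bits in the returned key code.

```python
def is_quit_key(key_code, quit_char="q"):
    """Return True when the low byte of key_code matches quit_char."""
    # -1 & 0xFF is 255, which never equals the code of an ASCII letter,
    # so "no key pressed" is correctly treated as "do not quit".
    return (key_code & 0xFF) == ord(quit_char)


print(is_quit_key(ord("q")), is_quit_key(-1))
```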

2.0 | Size Manipulation

import cv2

img = cv2.imread("beagle.png")

resized_img = cv2.resize(img, (330, 180))

print(img.shape)
print(resized_img.shape)

cv2.imshow('img', img)
cv2.imshow('resized_img', resized_img)
cv2.waitKey(0)

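The image is resized to a width of 330 and a height of 180 pixels: it looks the same, only smaller. Note that resize takes the target size as (width, height), whereas shape reports (height, width, channels), so resized_img.shape is (180, 330, 3).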
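A fixed target size distorts the image whenever its aspect ratio differs from the original. As a hedged sketch, the hypothetical helper scaled_size below computes a (width, height) pair that preserves the aspect ratio and could be passed as the size argument of cv2.resize. The input dimensions are made-up examples.

```python
def scaled_size(width, height, target_width):
    """Return (width, height) that keeps the original aspect ratio."""
    scale = target_width / width
    # Guard against a height of zero for extremely wide images.
    return target_width, max(1, round(height * scale))


new_size = scaled_size(660, 360, 330)
print(new_size)
```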

import cv2

img = cv2.imread("beagle.png")

print(img.shape)

cropped_img = img[50:, 75:440]

cv2.imshow('img', img)
cv2.imshow('cropped_img', cropped_img)

cv2.waitKey(0)

In this case the image has actually been cropped, i.e., parts of it have been cut away. It is the same beagle image as before; I simply cropped away parts of the background. Since images are NumPy arrays, cropping is plain array slicing, with rows (y) first and columns (x) second.
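One detail worth knowing about cropping by slicing is that NumPy returns a view, not a copy. The sketch below uses a black synthetic array as a stand-in for beagle.png (its 300x500 size is an assumption) to show the difference.

```python
import numpy as np

img = np.zeros((300, 500, 3), dtype=np.uint8)  # stand-in for beagle.png

# Rows (y) come first, columns (x) second: img[y0:y1, x0:x1].
view = img[50:, 75:440]           # shares memory with img
copy = img[50:, 75:440].copy()    # independent pixel data

view[0, 0] = (255, 255, 255)      # also changes img[50, 75]
print(view.shape, img[50, 75], copy[0, 0])
```

Calling .copy() is the safer choice when the cropped image will be drawn on or otherwise modified and the original should stay intact.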

3.0 | Color Manipulation

import cv2

img = cv2.imread("beagle.png")

# standard cv2 color space: BGR (blue-green-red)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

print(img.shape)
print(img_gray.shape)
print(img_hsv.shape)

cv2.imshow('beagle', img)
cv2.imshow('beagle_gray', img_gray)
cv2.imshow('beagle_rgb', img_rgb)
cv2.imshow('beagle_hsv', img_hsv)

cv2.waitKey(0)

One can easily convert an image from the three-channel BGR scheme (the OpenCV default) to another color space, be it a single channel (grayscale) or another three-channel scheme (RGB or HSV). This is accomplished with cvtColor, i.e., convert color. Note that imshow interprets its input as BGR, so the RGB and HSV versions are displayed with distorted colors. The available conversions can be found at https://docs.opencv.org/3.4/de/d25/imgproc_color_conversions.html.
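To illustrate what two of these conversions do to the pixel values, here is a NumPy-only sketch. It reverses the channel axis for BGR to RGB and uses the common luma weights (0.299, 0.587, 0.114 for red, green and blue) for grayscale. It is an approximation for illustration; OpenCV's own rounding may differ slightly for some inputs.

```python
import numpy as np

# One row of three BGR pixels: pure blue, pure green, pure red.
bgr = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# BGR -> RGB only reverses the order of the channel axis.
rgb = bgr[..., ::-1]

# Grayscale as a weighted sum of the channels; weights listed in B, G, R order.
weights_bgr = np.array([0.114, 0.587, 0.299])
gray = np.rint(bgr.astype(np.float64) @ weights_bgr).astype(np.uint8)

print(rgb[0, 0], gray)
```

Green contributes most to the gray value and blue least, which is why the pure blue pixel ends up darkest.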

Human Vision

Why are color images commonly encoded in three channels? It has to do with our biology. The human eye has three types of photoreceptor cells for color (cones), which respond most strongly to red, green and blue light respectively. Trichromacy is not unique to humans; several animals can see colors that we cannot, and vice versa.

4.0 | Blurring an Image

import cv2

img = cv2.imread("old_pic.jpg")

k_size = 3
img_blur = cv2.blur(img, (k_size, k_size))
img_gaussian_blur = cv2.GaussianBlur(img, (k_size, k_size), 1)
img_gaussian_blur = cv2.GaussianBlur(img_gaussian_blur, (k_size, k_size), 1)
img_median_blur = cv2.medianBlur(img, k_size)

cv2.imshow('img', img)
cv2.imshow('img_blur', img_blur)
cv2.imshow('img_gaussian_blur', img_gaussian_blur)
cv2.imshow('img_median_blur', img_median_blur)
cv2.waitKey(0)

One can blur an image through several different techniques. Most of them rely on a kernel, which can be thought of as an NxN array, where N is an odd positive whole number. The kernel slides across the image and, at each position, computes a new pixel value from the old pixel values it covers. The kernel consists of weights, and how the weights are distributed directly impacts the resulting blur. For example, a kernel with a Gaussian weight distribution prioritizes the pixels close to the kernel center. A simple blur, on the other hand, weights all pixels within the kernel equally. A median blur uses no weights at all; it replaces each pixel with the median of its neighborhood. All these methods have their strengths and weaknesses. For a more thorough explanation, please visit https://docs.opencv.org/4.x/d4/d13/tutorial_py_filtering.html.

Physical photographs tend to deteriorate over time. I managed to remove some noise in an old photo, especially in the faces and the background; see the before (to the left) and after (to the right) below.
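The difference between equal and Gaussian weighting is easy to see by building the kernels by hand. The following NumPy sketch uses two hypothetical helpers, box_kernel and gaussian_kernel. They illustrate the weight distributions only and are not necessarily the exact kernels OpenCV constructs internally.

```python
import numpy as np

def box_kernel(size):
    """Equal weights, as used by a simple averaging blur."""
    return np.full((size, size), 1.0 / (size * size))

def gaussian_kernel(size, sigma):
    """Weights that fall off with distance from the kernel center."""
    offsets = np.arange(size) - (size - 1) / 2
    one_d = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    two_d = np.outer(one_d, one_d)
    return two_d / two_d.sum()  # normalize so the weights sum to 1

box = box_kernel(3)
gauss = gaussian_kernel(3, 1.0)
print(box)
print(gauss)
```

Both kernels sum to 1, so the overall brightness of the image is preserved; only the relative influence of the center and its neighbors changes.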

5.0 | Thresholds

import cv2

img = cv2.imread('beagle.png')

img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

ret, thresh = cv2.threshold(img_gray, 150, 255, cv2.THRESH_BINARY)

thresh = cv2.blur(thresh, (10, 10))
ret, thresh = cv2.threshold(thresh, 80, 255, cv2.THRESH_BINARY)

cv2.imshow('img', img)
cv2.imshow('thresh', thresh)
cv2.waitKey(0)

When setting a threshold, one specifies the gray level above which a pixel is set to a chosen maximum value (I set it to go completely white, i.e., 255); all other pixels go completely black. One can of course combine this with the techniques used earlier, for example blurring. Here I blurred the thresholded image and thresholded it once more, which smooths the boundaries between the black and white regions.
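For intuition, a binary threshold can be written in a single NumPy expression. The hypothetical helper binary_threshold below mirrors the rule of THRESH_BINARY (values strictly above the threshold become the maximum, the rest become 0) on a tiny made-up array.

```python
import numpy as np

def binary_threshold(gray, thresh, max_value=255):
    """Pixels strictly above thresh become max_value, all others 0."""
    return np.where(gray > thresh, max_value, 0).astype(np.uint8)

gray = np.array([[100, 150, 151, 255]], dtype=np.uint8)
binary = binary_threshold(gray, 150)
print(binary)
```

Note that a pixel exactly equal to the threshold (150 here) stays black.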

import cv2

img = cv2.imread("beagle.png")

img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

thresh = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 7, 8)

cv2.imshow('img', img)
cv2.imshow('thresh', thresh)
cv2.waitKey(0)

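The adaptive threshold, on the other hand, computes the threshold by itself, separately for each pixel from its neighborhood. As a result, every section of the image gets its own threshold. In the call above, the neighborhood is 7x7 pixels, and the threshold is its Gaussian-weighted mean minus the constant 8.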
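The idea of a per-pixel threshold can be sketched in plain NumPy. For simplicity, the hypothetical function below uses an unweighted mean of the neighborhood (closer in spirit to ADAPTIVE_THRESH_MEAN_C than to the Gaussian variant used above) and repeats edge pixels at the border. It is slow and meant for illustration only.

```python
import numpy as np

def adaptive_mean_threshold(gray, block_size=3, c=8, max_value=255):
    """Compare each pixel against (mean of its neighborhood - c)."""
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be odd and at least 3")
    pad = block_size // 2
    # Repeat edge pixels so border neighborhoods are full-sized.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    out = np.zeros(gray.shape, dtype=np.uint8)
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            block = padded[y:y + block_size, x:x + block_size]
            local_threshold = block.mean() - c
            if gray[y, x] > local_threshold:
                out[y, x] = max_value
    return out

gray = np.full((5, 5), 200, dtype=np.uint8)
gray[2, 2] = 0  # one dark pixel on a bright background
result = adaptive_mean_threshold(gray)
print(result)
```

Only the dark pixel falls below its local threshold; its bright neighbors stay white even though the dark pixel lowers their local mean.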

6.0 | Edge Detection

import numpy as np
import cv2

im = cv2.imread("dzeko.png")

im_edge = cv2.Canny(im, 100, 200) # highlight edges
im_edge_d = cv2.dilate(im_edge, np.ones((2, 2), dtype=np.uint8)) # thicken edges
im_edge_e = cv2.erode(im_edge_d, np.ones((3, 3), dtype=np.uint8)) # thin edges

cv2.imshow('dzeko', im)
cv2.imshow('dzeko_edge', im_edge)
cv2.imshow('dzeko_edge_d', im_edge_d)
cv2.imshow('dzeko_edge_e', im_edge_e)
cv2.waitKey(0)

OpenCV provides functions for detecting edges. One can subsequently dilate (make edges thicker) or erode (make edges thinner) the result as desired. Canny takes the image in question, followed by the minimum and maximum thresholds for the hysteresis. Gradients above the maximum are always treated as edges, gradients below the minimum are discarded, and those in between are kept only if they are connected to a sure edge. Raising the thresholds therefore yields fewer detected edges. For further implementation details, see https://docs.opencv.org/4.x/da/d22/tutorial_py_canny.html. The image you see above is of footballer Edin Dzeko.
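The role of the two Canny thresholds can be illustrated with a deliberately simplified, one-dimensional sketch of hysteresis. The hypothetical function below works on a plain list of gradient magnitudes; the real algorithm operates in 2D on thinned gradients, and the exact handling of values equal to a threshold is a choice made here, not something taken from OpenCV.

```python
def hysteresis_1d(magnitudes, low, high):
    """Keep strong values, plus weak values in a run that touches a strong one."""
    keep = [False] * len(magnitudes)
    i = 0
    while i < len(magnitudes):
        if magnitudes[i] < low:
            i += 1
            continue
        # Collect a run of consecutive values at or above the low threshold.
        j = i
        while j < len(magnitudes) and magnitudes[j] >= low:
            j += 1
        # The run survives only if it contains at least one strong value.
        if any(m > high for m in magnitudes[i:j]):
            for k in range(i, j):
                keep[k] = True
        i = j
    return keep


edges = hysteresis_1d([50, 150, 250, 150, 50, 150, 150, 50], 100, 200)
print(edges)
```

The first run of weak values is kept because it touches the strong value 250, while the second run of equally weak values is dropped.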

7.0 | Parametric Drawing

import cv2

im = cv2.imread("whiteboard.png")

print(im.shape)

cv2.line(im, (100, 150), (200, 250), (0, 255, 0), 3)
cv2.rectangle(im, (50, 50), (100, 100), (0, 0, 255), -1)
cv2.circle(im, (250, 150), 80, (255, 0, 0), 3)
cv2.putText(im, 'Hello, Human?', (100, 300), cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 0), 2)

cv2.imshow('img', im)
cv2.waitKey(0)

OpenCV also enables the user to draw directly on top of images. To draw a line, one specifies the start and end coordinates, the color (in BGR order) and the thickness. Similarly, you can draw rectangles and circles or add text. A thickness of -1 fills the shape, as with the rectangle above.
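If no whiteboard image is at hand, a blank canvas can be created with NumPy. The sketch below does so and, to show how drawing coordinates map onto the array, fills a square by slicing. The 400x400 size is an arbitrary assumption, and the slice only approximates the region a filled rectangle from (50, 50) to (100, 100) would cover.

```python
import numpy as np

# A 400x400 white canvas as a stand-in for whiteboard.png.
canvas = np.full((400, 400, 3), 255, dtype=np.uint8)

# Colors are BGR tuples, so (0, 0, 255) is red.
red = (0, 0, 255)

# Drawing functions take points as (x, y), but array indexing is [y, x]:
# rows 50..100 and columns 50..100 are filled here.
canvas[50:101, 50:101] = red
print(canvas[75, 75], canvas[10, 10])
```

The resulting canvas can be passed to cv2.line, cv2.circle or cv2.putText exactly like an image loaded from disk.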

8.0 | Contours

With contours, one can not only highlight the outlines of all objects, but also box in objects by means of those contours. It is useful to know that one does not always need an advanced computer vision algorithm such as YOLO to identify objects. Sometimes traditional methods suffice.

import cv2
print(cv2.__version__)

img = cv2.imread('birds.png')

img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

ret, thresh = cv2.threshold(img_gray, 100, 255, cv2.THRESH_BINARY_INV)

# findContours expects thresh to be an image of one channel -> convert img to img_gray
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

for contour in contours:
    if cv2.contourArea(contour) > 135:
        # cv2.drawContours(img, contour, -1, (0, 255, 0), 1)

        x1, y1, w, h = cv2.boundingRect(contour)
        cv2.rectangle(img, (x1, y1), (x1+w, y1+h), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.imshow('img_gray', img_gray)
cv2.imshow('thresh', thresh)
cv2.waitKey(0)

In this case I used the contours found in the inverted binary image (the birds became white and the background black). This is because findContours treats white regions as objects and expects a black background. I then check whether the contour area is larger than a specified threshold value, so as not to highlight small holes within larger objects. Subsequently I obtain the box coordinates through boundingRect, which returns the top-left corner of the rectangle together with the width and height of the box. We can thereafter draw rectangles around the identified objects with the rectangle function detailed in the previous section.
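In image coordinates the origin sits in the top-left corner and y grows downward, so the (x, y) returned by boundingRect is the top-left corner of the box. The hypothetical helper below shows how the two corner points needed by cv2.rectangle follow from (x, y, w, h). Note that the width is added to x and the height to y.

```python
def rect_corners(x, y, w, h):
    """Convert (x, y, w, h) into the two corner points of the box."""
    top_left = (x, y)
    bottom_right = (x + w, y + h)
    return top_left, bottom_right


corners = rect_corners(10, 20, 30, 40)
print(corners)
```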