ELEC319 (Image Processing)
Lossless Compression
Relies on probabilistic theory:
- Events that are less likely will contain more information.
A measure of average information is entropy also known as Shannon’s Information Measure:
\[H(x)=\sum_{i\in x}p(i)\log_2\left(\frac1{p(i)}\right)\]
where:
- $x$ is a probabilistic experiment.
- $i$ ranges over the possible outcomes, each with probability $p(i)$.
The base of the logarithm determines the unit of information (base 2 means it is measured in bits).
Entropy of an Image
The entropy $E$ of an $N\times N$ image is:
\[E=-\sum^{L-1}_{i=0}p_i\log_2(p_i)\]
where:
- $p_i$ is the probability of the $i^\text{th}$ greylevel.
- $L$ is the total number of greylevels.
additionally:
\[0\leq E\leq \log_2L\]
The average number of bits per pixel in a coder is:
\[\hat L=\sum^{L-1}_{i=0}l_ip_i\]
where:
- $p_i$ is the probability of the $i^\text{th}$ greylevel.
- $l_i$ is the length in bits of the code for the $i^\text{th}$ greylevel.
Image Entropy Example
Consider an image that uses 3 bpp and whose histogram is flat across its domain:
- As there are 3 bpp, there are 8 greylevels ($L=8$).
- A flat histogram means that the number of pixels at each greylevel is the same ($p_i=\frac18$).
\[\begin{aligned}
E&=-\sum^{L-1}_{i=0}p_i\log_2(p_i)\\
E&=-\sum^{7}_{i=0}\frac18\log_2(\frac18)\\
E&=3
\end{aligned}\]
This means that there is no code that will provide better coding than what is used by default. An image with a flat histogram has the highest entropy.
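As a small sketch (assuming NumPy; `img` is a hypothetical greyscale array), the entropy can be computed from the normalised histogram:

```python
import numpy as np

def image_entropy(img, levels=256):
    """Entropy in bits/pixel from the normalised greylevel histogram."""
    counts, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = counts / counts.sum()          # probability of each greylevel
    p = p[p > 0]                       # drop empty bins (0 * log 0 -> 0)
    return -np.sum(p * np.log2(p))

# A flat 3 bpp histogram gives the maximum entropy of 3 bits/pixel:
flat = np.repeat(np.arange(8), 100)    # equal counts for greylevels 0..7
print(image_entropy(flat, levels=8))   # -> 3.0
```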
Huffman Coding
This is a type of coding that encodes frequent symbols as short strings of bits and infrequent symbols as longer bit strings.
- This is a minimum-length code (it generates codes close to the theoretical minimum length).
- It generates variable-length codes.
Method:
- Sort the symbols in decreasing order of probability.
- Combine the smallest two by addition.
- Replace the two elements with a single combined element.
- Go back to the sorting step and repeat until only two probabilities are left.
- Work backwards along the tree and generate the code by alternating 0 and 1.
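A minimal Python sketch of this procedure (the standard-library `heapq` module stands in for the repeated sorting; the probabilities are those of the example below):

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a Huffman code {symbol: bit-string} from {symbol: probability}."""
    tick = count()   # tie-breaker so equal probabilities never compare trees
    # Each heap entry: (probability, tie-breaker, tree), where a tree is either
    # a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(p, next(tick), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # combine the two smallest probabilities
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tick), (t1, t2)))
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):       # internal node: branch with 0 / 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # single-symbol edge case
        return codes
    _, _, tree = heap[0]
    return walk(tree)

print(huffman_code({3: 0.4, 1: 0.3, 0: 0.2, 2: 0.1}))
# -> code lengths of 1, 2, 3 and 3 bits for greylevels 3, 1, 0 and 2
#    (the exact bit patterns depend on which branch is labelled 0 or 1)
```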
Huffman Coding Example
Consider a 10x10 image that uses 2bpp with the following histogram:
| Greylevel | Count |
| --- | --- |
| 0 | 20 |
| 1 | 30 |
| 2 | 10 |
| 3 | 40 |
Find the probability of each level:
| Greylevel | Probability |
| --- | --- |
| 0 | .2 |
| 1 | .3 |
| 2 | .1 |
| 3 | .4 |
Sort in descending order:
| Greylevel | Probability |
| --- | --- |
| 3 | .4 |
| 1 | .3 |
| 0 | .2 |
| 2 | .1 |
Add the bottom two rows:
| Original Greylevel | Probability | Reduction 1 | Reduction 2 | Sort 2 |
| --- | --- | --- | --- | --- |
| 3 | .4 | .4 | .4 | .6 |
| 1 | .3 | .3 | .6 | .4 |
| 0 | .2 | .3 | | |
| 2 | .1 | | | |
Sort between each iteration.
This gives the following Huffman tree:
graph TD
a(( )) -->|0| b(( ))
a -->|1| g3(("g3 (1)"))
b -->|0| g1(("g1 (00)"))
b -->|1| c(( ))
c -->|0| g0(("g0 (010)"))
c -->|1| g2(("g2 (011)"))
Coding Efficiency
Coding efficiency is defined as the ratio of entropy to the average number of bits per symbol.
For images we write this as:
\[\eta =\frac E{\hat L}=\frac{-\sum p_i\log_2 p_i}{\sum l_ip_i}\]
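As a worked check using the Huffman example above ($p=\{0.4,0.3,0.2,0.1\}$ with code lengths $\{1,2,3,3\}$):
\[\begin{aligned}
E&=-(0.4\log_2 0.4+0.3\log_2 0.3+0.2\log_2 0.2+0.1\log_2 0.1)\approx 1.846\\
\hat L&=0.4(1)+0.3(2)+0.2(3)+0.1(3)=1.9\\
\eta&=\frac{1.846}{1.9}\approx 0.97
\end{aligned}\]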
Bit Plane Decomposition
We can make use of bit plane decomposition to compress an image. We can then use compression techniques on each plane, to take advantage of contiguous blocks in the most significant bits.
Constant Area Coding
Break the bit planes up into smaller sections of $n_1\times n_2$ pixels.
Each subsection can be encoded:
- 0 if it is white
- 10 if it is black
- 11 if it is mixed black and white (followed by the bits used in the block).
Run Length Encoding
In a binary image we can encode a line with runs of zeros first.
A line of:
\[11111000011\]
would be encoded as:
\[0,5,4,2\]
There are zero 0s at the start (so the code begins with 0), followed by a run of five 1s, four 0s and two 1s.
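A minimal sketch of this encoding in Python (assuming the line is supplied as a list of 0/1 pixel values, with zero-runs listed first as above):

```python
def run_length_encode(line):
    """Encode a binary line as run lengths, starting with the run of 0s."""
    runs, current, expected = [], 0, 0   # expected value alternates 0, 1, 0, ...
    for pixel in line:
        if pixel == expected:
            current += 1
        else:                            # value changed: close the current run
            runs.append(current)
            current, expected = 1, pixel
    runs.append(current)
    return runs

print(run_length_encode([1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1]))  # -> [0, 5, 4, 2]
```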
Lossy Compression
JPEG
JPEG uses one of the following coding schemes:
- Lossy baseline coding:
- Based on the discrete cosine transform (DCT).
- Extended coding for greater compression.
- Lossless independent coding.
Lossy Baseline Coding (DCT)
- Pre-processing
- Convert the image to YCbCr.
- Partition the image into 8x8 tiles.
- Centre the intensity values about 0 (subtract 128 from each pixel).
- Transformation
- Apply DCT to the image tile by tile.
- This concentrates most of the energy (the large coefficients) in the top-left corner.
- The top-left is the DC component (average value of the pixels).
- The bottom-right corner holds the highest-frequency AC components.
- Quantisation
- Divide each coefficient by a quantisation matrix, then round the result towards zero (like floor).
- This removes high-frequency components from the image and provides the lossy part of the compression.
- Encoding
- Reorganise the quantised DCT coefficients into a vector.
- Order the coefficients in a zig-zag pattern.
- Apply Huffman coding.
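A rough sketch of the transform and quantisation steps for a single 8x8 tile (assuming SciPy is available; the flat quantisation matrix here is a placeholder, not the standard JPEG luminance table):

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_tile(tile, quant):
    """DCT + quantisation of one 8x8 tile of a greyscale (Y) channel."""
    shifted = tile.astype(float) - 128.0      # centre intensity values about 0
    coeffs = dctn(shifted, norm="ortho")      # energy moves to the top-left
    return (coeffs / quant).astype(int)       # divide and round towards zero (lossy)

def decode_tile(q_coeffs, quant):
    """Approximate inverse: de-quantise, inverse DCT, undo the level shift."""
    return idctn(q_coeffs * quant, norm="ortho") + 128.0

quant = np.full((8, 8), 16)                   # placeholder quantisation matrix
tile = np.random.randint(0, 256, (8, 8))
error = np.abs(decode_tile(encode_tile(tile, quant), quant) - tile)
print(error.max())                            # error introduced by quantisation
```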
Segmentation
Segmentation is the separation of objects from the background. It divides images into regions.
Edges & Regions
Edge enhancement can divide images into regions:
- Each region is bounded by an edge.
A threshold can detect edges in an edge enhanced image.
Edge Linking
Due to noise, edges may be unclear:
- Edges may not be continuous.
- Regions may not be self-contained.
We use edge linking to solve these issues. To link up “loose ends” in an image we can use:
- Linking nearest neighbours.
- Linking all within a given distance.
- Probabilistic linking.
“Snakes”
Snakes is an advanced edge-linking or region extraction algorithm:
- Uses defined edges and a cost-minimisation technique to find regions.
- The cost function is minimised when the edges match the underlying features.
- The optimisation can be done using any standard algorithm.
Thresholding
Some objects are not large enough to generate detectable edges (such as a small house in an image of a large valley).
A simple method for segmentation is to use multiple thresholds:
- To select a certain band of grey levels.
- The pixels in a region will often have a similar intensity.
Variance Growth
A problem with thresholding is variation in lighting across an image. An automated method to choose a threshold is variance growth:
- Images often contain large regions with similar intensities.
- Objects of interest typically have a different brightness to their background.
We want to choose a threshold on the flat part of the histogram.
Statistical Texture
We can use a kernel to define the neighbourhood of a pixel. These pixels are connected to the central pixel:
If you take the mean over the kernel:
- Pixels in the same region will tend towards a value similar to the central pixel.
Watershed Algorithm
- Represent the image as a surface (with height defined by pixel intensity).
- Apply a thresholding “flood-fill”:
- Link connected pixels to form the background.
- Repeat this to form several regions as layers.
Colour Segmentation
We can start by representing the image in HSL:
- We can then threshold on the hue or saturation channels to find the object.
Motion Segmentation
We can separate moving objects from stationary backgrounds:
- Take the difference between the two images to clearly identify the object.
We can remove small camera motions and noise by using noise filtering on the output.
Region Labelling
After segmentation, region labelling is required. The result of image segmentation is a binary image with the foreground represented by ones.
Typically there will be a large number of separate foreground regions.
To distinguish the regions we label connected foreground pixels. We use:
- Connected Component Labelling
- Extracting Connected Components
Noise in Segmentation
By knowing the type of noise (white, Gaussian, photon) we can use an appropriate model to remove it effectively.
Blur is also a form of noise. To deal with this we change our blur model to account for blur caused by the sensor.
Always filter noise before using edge detection to improve the results.
Derivative Filters
Derivative (gradient) filters are used to enhance edges as they can highlight sudden changes in an image.
The gradient operator is a vector function of a scalar field:
\[\nabla f=\begin{pmatrix}\frac{\partial f}{\partial x}\\\frac{\partial f}{\partial y}\end{pmatrix}\]
As we don’t have a continuous domain we use a finite-difference approximation:
\[\begin{aligned}
\frac{\partial f}{\partial x}[x,y]&\approx f(x+1,y)-f(x,y)\\
\frac{\partial f}{\partial y}[x,y]&\approx f(x,y+1)-f(x,y)
\end{aligned}\]
The gradient of an image $f$ is:
\[\nabla f=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\]
Derivative of Images
Since pixel values represent intensities, the gradient points in the direction of the maximum change in intensity:
-
Change in $x$ direction:
\[\nabla f=\left(\frac{\partial f}{\partial x},0\right)\]
-
Change in $y$ direction:
\[\nabla f=\left(0,\frac{\partial f}{\partial y}\right)\]
-
Change diagonally:
\[\nabla f=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\]
The edge strength is given by the magnitude of the gradient:
\[\left\vert\nabla f\right\vert=\sqrt{\left(\frac{\partial f}{\partial x}\right)^2+\left(\frac{\partial f}{\partial y}\right)^2}\]
The direction of the gradient is given by:
\[\theta=\tan^{-1}\left(\frac{\frac{\partial f}{\partial y}}{\frac{\partial f}{\partial x}}\right)\]
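A small NumPy sketch of these finite differences, edge strength and direction (`f` is a hypothetical greyscale array; here x is taken along columns and y along rows):

```python
import numpy as np

def gradient(f):
    """Forward-difference gradient, edge strength and edge direction."""
    f = f.astype(float)
    dfdx = np.zeros_like(f)
    dfdy = np.zeros_like(f)
    dfdx[:, :-1] = f[:, 1:] - f[:, :-1]     # f(x+1, y) - f(x, y)
    dfdy[:-1, :] = f[1:, :] - f[:-1, :]     # f(x, y+1) - f(x, y)
    magnitude = np.sqrt(dfdx**2 + dfdy**2)  # edge strength
    direction = np.arctan2(dfdy, dfdx)      # edge angle in radians
    return magnitude, direction

f = np.zeros((5, 5)); f[:, 2:] = 255        # vertical step edge
mag, ang = gradient(f)
print(mag[2, 1], np.degrees(ang[2, 1]))     # -> 255.0 0.0 (strong edge along +x)
```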
Gradient Operators
Edge detection can be completed using small convolution filters, usually two or three pixels square.
Given an image $I$ and a derivative kernel $d$, a pixel value $p$ is computed as a convolution of the kernel with the pixel’s neighbourhood $h_p(I)$:
\[p=d*h_p(I)\]
Three common gradient operators are:
-
Roberts:
\[\begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}0&1\\-1&0\end{pmatrix}\]
-
Prewitt:
\[\begin{pmatrix}-1&-1&-1\\0&0&0\\1&1&1\end{pmatrix}\begin{pmatrix}-1&0&1\\-1&0&1\\-1&0&1\end{pmatrix}\]
-
Sobel:
\[\begin{pmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{pmatrix}\begin{pmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{pmatrix}\]
These operators sum to zero, so a region with no edges gives a zero response.
To use the operators:
- Sum the mask multiplied by the image (element by element) for each operator.
- Take the modulus of the two components and add them together.
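A sketch of applying the Sobel pair with SciPy’s convolution, forming the edge map as the sum of the moduli of the two components:

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels: horizontal-edge and vertical-edge responses.
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])
sobel_x = sobel_y.T

def edge_map(image):
    """Approximate gradient magnitude: |Gx| + |Gy|."""
    gx = convolve(image.astype(float), sobel_x)
    gy = convolve(image.astype(float), sobel_y)
    return np.abs(gx) + np.abs(gy)
```

The Prewitt pair can be swapped in by replacing the 2s with 1s.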
Canny Edge Detection
Canny edge detection is good for detecting weak edges. It consists of:
-
Gaussian Blur
We can modify $\sigma$ to change the performance of the edge detection.
- Sobel Operator
-
Calculate Edge Strength (magnitude of Sobel operator)
This is checked to ensure that it is a true edge (a local maximum).
-
Calculate the edge angle (arc-tan of the ratio of Sobel components)
This is rounded to 0, 45, 90 and 135 degrees to find the edge direction.
A large $\sigma$ for the Gaussian blur suits wide edges and vice-versa.
Hough Transform
Pronounced “Huff”, the Hough transform is used to detect lines and other geometric shapes.
Any line can be described by an equation in the form:
\[x\cos\theta+y\sin\theta=\rho\]
where $\rho$ and $\theta$ are parameters that specify the line. Any point on the same line has the same values for $\rho$ and $\theta$.
Algorithm for Detecting Straight Lines
- Detect Edges (Canny)
- Map the edge points to Hough space.
- Accumulate the number of points at each $(\rho,\theta)$.
- Interpret the accumulator values to identify lines (of infinite length).
This is a computationally intensive task.
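A minimal accumulator sketch in NumPy (`edges` is a hypothetical binary edge map, e.g. the output of a Canny detector):

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate edge points in (rho, theta) space for straight-line detection."""
    rows, cols = edges.shape
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta)
    rho_max = int(np.ceil(np.hypot(rows, cols)))        # largest possible |rho|
    accumulator = np.zeros((2 * rho_max + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(edges)                          # coordinates of edge pixels
    for x, y in zip(xs, ys):
        rhos = x * np.cos(thetas) + y * np.sin(thetas)  # rho for every theta
        accumulator[np.round(rhos).astype(int) + rho_max, np.arange(n_theta)] += 1
    return accumulator, thetas

# Peaks in the accumulator correspond to (rho, theta) pairs of detected lines.
```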
Gain-Offset Correction
This determines the contrast in an image. We may have to re-scale images in a video to ensure all images in a sequence have similar brightness and contrast.
This correction fixes two issues of image sensors:
- Gain - A multiplicative error in the output.
- Offset - An additive error in the output.
Scaling & Equalisation
Scaling and equalisation clips and distorts the raw image data to produce a more appealing image. This can reduce the information in the final image. The following methods can be used:
- Histogram Equalisation - Aims to improve apparent contrast.
- Rescales the intensities so that the intensity distribution is near uniform.
Given an image histogram:
- $r$ is the original grey level (between 0 and $L-1$).
- $s$ is the transformed output grey level (also between 0 and $L-1$).
Contrast Stretching
This method is for enhancing images with little contrast.
Given that the detail is given between $r_a$ and $r_b$:
- Set all pixels in the range to a fixed (brighter) value.
Bit-Plane Slicing
Bit plane slicing is used to separate out useful information within an image:
- In an $n$-bit image, each pixel has $n$ bits and there are $n$ planes.
- The MSB contains the majority of the visually significant data.
- The LSB is essentially noise.
Histogram Stretching
For an ideal image, the image histogram is flat. We can perform histogram stretching to spread a peak to fill the histogram:
- Assume that a histogram of a low contrast image ranges from grey value $\check g$ to $\hat g$.
- Assume that we wish to spread this to $[0,G-1]$.
- where $G-1>\hat g-\check g$.
We can do this transform with the following function:
\[g'=\left\lfloor\frac{g-\check g}{\hat g-\check g}(G-1)+\frac12\right\rfloor\]
This simple method only stretches the whole histogram. The process of flattening an image histogram is called histogram equalisation.
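A NumPy sketch of the stretch above (`g` is a hypothetical low-contrast greyscale array; $G=256$ for an 8-bit output):

```python
import numpy as np

def stretch_histogram(g, G=256):
    """Map greylevels [min(g), max(g)] linearly onto [0, G-1]."""
    g = g.astype(float)
    g_min, g_max = g.min(), g.max()
    stretched = np.floor((g - g_min) / (g_max - g_min) * (G - 1) + 0.5)
    return stretched.astype(np.uint8)

g = np.array([[100, 110], [120, 130]])
print(stretch_histogram(g))   # -> [[0, 85], [170, 255]]
```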
Filtering
Filtering is a neighbourhood operation:
- The output pixel is a function of neighbourhood pixels.
Linear Filtering
The output pixel is a linear combination of input pixels.
We use a mask that contains weights to be assigned to each pixel:
- The operation is applied to every pixel by sliding the mask.
- The output pixel is the sum of the input pixels multiplied by the weights in the mask.
Non-Linear Filtering
Any filtering operation that is not linear, for example:
- Square mean of neighbourhood pixels.
- Median of neighbourhood pixels.
Mean Filtering
The idea is to take an average of all $N_8$ pixels:
- The filter has equal weights across all pixels in the neighbourhood.
Mean filtering is good to remove Gaussian noise.
The original data of the image is modified.
When averaging pixels on the edges of the image, non-existent pixels are treated as 0.
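A sketch of a 3x3 mean filter as a convolution mask of equal weights (SciPy assumed; border pixels are padded with 0 as noted above):

```python
import numpy as np
from scipy.ndimage import convolve

def mean_filter(image, size=3):
    """Replace each pixel by the average of its size x size neighbourhood."""
    mask = np.full((size, size), 1.0 / size**2)   # equal weights summing to 1
    # mode="constant", cval=0 treats non-existent pixels beyond the border as 0.
    return convolve(image.astype(float), mask, mode="constant", cval=0.0)
```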
Linear Filtering, Cross-Correlation & Convolution
Linear filtering is the same as cross-correlation of two signals.
Convolution is the same as cross-correlation but with the kernel rotated by 180 degrees.
Median Filtering
This is the same as mean filtering but takes the median of the neighbourhood pixels instead of the average.
Requires sorting of the $N_8$ pixel values.
This is good for salt-and-pepper noise and preserves edges.
Gaussian Filtering
The mask is obtained using the following equation:
\[G_\sigma=\frac1{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}\]
We substitute the $(x,y)$ of each mask pixel into this to find its value.
The degree of smoothing is controlled by both the mask size and $\sigma$.
This filter produces a good output for use with edge detection algorithms.
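A NumPy sketch that builds the mask directly from this equation (the mask size and $\sigma$ are free choices):

```python
import numpy as np

def gaussian_mask(size=5, sigma=1.0):
    """Sample the 2D Gaussian at each (x, y) of the mask and normalise."""
    half = size // 2
    x, y = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()   # normalise so the weights sum to 1

print(gaussian_mask(3, 1.0).round(3))
```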
Morphology
Morphology is concerned with the shape of structures.
Morphological Erosion
The final image is a subset ($\subseteq$) of the original image.
Allows objects to shrink and separates objects & regions.
It is a pixel-by-pixel operation, sliding the structuring element (SE) at every pixel:
- If the origin of the SE coincides with a 0 (background pixel) of the image:
- There is no change, move to the next pixel.
- If the origin of the SE coincides with 1 (foreground pixel):
- Perform an AND operation on all the pixels within the SE.
Pixel-by-pixel operations are very slow.
Erosion is expressed by the set operation:
\[A\ominus B\]
where:
- $A$ - The source image.
- $B$ - The structuring element.
This is good for getting rid of salt noise (white spots) on images.
Morphological Dilation
This is the opposite of erosion, allowing objects to expand.
It is a pixel-by-pixel operation, sliding the structuring element (SE) at every pixel:
- If the origin of the SE coincides with a 0 (background pixel) of the image:
- There is no change, move to the next pixel.
- If the origin of the SE coincides with 1 (foreground pixel):
- Perform an OR operation on all the pixels within the SE.
This results in the maximum output of the local region of the structuring element.
Dilation is expressed by the set operation:
\[A\oplus B\]
This is good for getting rid of pepper noise.
Other Morphological Operations
Morphological Opening
This is erosion followed by dilation with the same structuring element:
\[A\circ B = (A\ominus B)\oplus B\]
This smooths contours and removes noise.
Morphological Closing
Dilation followed by erosion with the same structuring element:
\[A\bullet B = (A\oplus B)\ominus B\]
Joins nearby connected components together.
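A sketch of opening and closing with SciPy (the 3x3 structuring element and the toy image are just example choices):

```python
import numpy as np
from scipy import ndimage

A = np.zeros((10, 10), dtype=bool)
A[2:8, 2:8] = True            # a square foreground object
A[4, 4] = False               # pepper noise inside the object
A[0, 0] = True                # salt noise in the background

B = np.ones((3, 3), dtype=bool)                    # structuring element

opened = ndimage.binary_opening(A, structure=B)    # erosion then dilation
closed = ndimage.binary_closing(A, structure=B)    # dilation then erosion
print(opened[0, 0], closed[4, 4])                  # -> False True (salt removed, hole closed)
```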
Boundary Extraction
We can get the boundary of an object by:
- Eroding with a suitable SE.
- Subtracting this from the original image.
\[\beta(A) = A-A\ominus B=A-(A\ominus B)\]
Extracting Connected Components
Extraction of connected components can be performed using morphological operations as an iterative procedure.
- Let $x_k$ denote the set of connected foreground pixels at the $k^{\text{th}}$ iteration.
- Start the procedure at one of the connected foreground pixels $x_0$.
- $x_0$ is known as the seed.
- Dilate this pixel: $x_0\oplus B$
-
To stop this dilation becoming connected to pixels which are not part of this component, perform a logical AND with the image:
\[(x_0\oplus B)\cap A\]
- Use the resulting output as the seed for the next iteration.
- Repeat this until no more extractions can be made.
This can be expressed in the following steps:
-
Do for $k={1,\ldots}$:
\[x_k=(x_{k-1}\oplus B)\cap A\]
-
Until:
\[x_k=x_{k-1}\]
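A SciPy sketch of this iteration (assuming `A` is a binary image and `seed` the coordinates of a chosen foreground pixel):

```python
import numpy as np
from scipy.ndimage import binary_dilation

def extract_component(A, seed, B=np.ones((3, 3), dtype=bool)):
    """Grow a single connected component from a seed pixel by repeated dilation."""
    x = np.zeros_like(A, dtype=bool)
    x[seed] = True                                     # x_0: the seed pixel
    while True:
        x_next = binary_dilation(x, structure=B) & A   # (x_{k-1} dilated by B) AND A
        if np.array_equal(x_next, x):                  # stop when x_k == x_{k-1}
            return x
        x = x_next

A = np.array([[1, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 1]], dtype=bool)
print(extract_component(A, (0, 0)).astype(int))        # only the top-left component
```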
Region Filling
This closes holes in objects. It consists of the following steps:
-
Set $x_0=1$ at a seed pixel inside the hole (all other pixels 0).
-
Do for $k={1,\ldots}$:
\[x_k=(x_{k-1}\oplus B)\cap \bar A\]
-
Until:
\[x_k = x_{k-1}\]
Hit-or-Miss Transform
This searches for occurrences of the structuring element within an image. It is given by:
\[A\otimes B=(A\ominus B_1)\cap(\bar A\ominus B_2)\]
Where we have two structuring elements:
\[B_1\cap B_2=\emptyset\]
Skeletonisation
This is the process for reducing foreground regions in a binary image to a skeletal remnant.
Let $A$ be an image and $B$ be a structuring element.
Let $(A\ominus kB)$ denote $k$ successive erosions of $A$ using $B$.
Then the skeleton of an image $A$ is:
\[S(A)=\bigcup^K_{k=0}\left\{(A\ominus kB)-(A\ominus kB)\circ B\right\}\]
- $K$ is the maximum number of iterations before the image is eroded to nothing.
Top-Hat
The difference between the image $I$ and the opened version of the image $I$:
\[T(I)=I\hat\circ s=I-I\circ s\]
This returns objects of an image that are:
- Smaller than the SE
- Brighter than their surroundings.
This highlights peaks.
Bottom-Hat
The difference between the closed version of the image and the image $I$:
\[B(I) = I\hat\bullet s=I\bullet s-I\]
Returns objects of an image that are:
- Smaller than the SE
- Darker than their surroundings.
This highlights valleys.
Pixels, Neighbours & Connectivity
- $N_4$ Connectivity
- Each pixel has 4 pixels directly around it.
- $N_8$ Connectivity
- This includes diagonally adjacent pixels, so all pixels surrounding a pixel.
Structuring Element
This is a mask that can be used to define pixel neighbours:
- It is a small matrix with odd dimensions.
- 1s in the matrix define pixels that are neighbours.
This type of structuring element is called flat.
- SEs don’t have to be square.
- SEs don’t have to be symmetrical.
- Non-flat SEs contain values that are non-binary.
Structuring Element Uses
They are used to probe large images for specific shapes:
- Therefore you should choose a SEs template to match the shape you want to find in the main image.
The size of the SE determines the detection scale:
- A smaller SE will discover more small features.
Distance Between Two Pixels
Given two pixels $p_1=(x_1,y_1)$ and $p_2=(x_2,y_2)$, a distance such as the Euclidean distance $\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}$ can be defined between them.
Distance Transform
For this transform we calculate:
- The distance between each pixel not in the SE mask and the nearest pixel in the mask.
Connected Component Labelling
This is also known as region extraction. The idea is to:
- Group pixels into components based on pixel connectivity.
- All pixels in a connected component share the same pixel intensity.
- This could be $N_4$ or $N_8$.
- Label each pixel with a colour or greyscale.
Two-Pass Equivalence Class Resolution
This is a two-pass algorithm:
- Scan the image (row-wise or column-wise), one pixel at a time:
- When encountering a foreground pixel, look at the previously scanned, neighbour pixels (with 8-connectivity):
- If any of the neighbours have a label assigned, assign the same label to the pixel.
- If neighbouring pixels have more than one label assigned, pick one arbitrarily.
- Otherwise, assign a new label.
- Every time a label is added, add that label to the equivalence table as a new row.
- If two neighbouring pixels represent the same region, add them to the same row.
- Continue until all pixels are covered.
- Assign new labels for the rows of the equivalence table:
- Scan the image pixel-by-pixel again, applying the new labels to consolidate the existing labels.
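A compact sketch of the two-pass approach (NumPy assumed; the equivalence table is kept as a simple union-find array rather than explicit rows):

```python
import numpy as np

def two_pass_label(binary):
    """Label 8-connected foreground regions of a binary image in two passes."""
    labels = np.zeros(binary.shape, dtype=int)
    parent = [0]                                   # union-find "equivalence table"

    def find(a):                                   # representative label of a
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    rows, cols = binary.shape
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c]:
                continue
            # Previously scanned 8-connected neighbours that already have a label.
            neighbours = [labels[r + dr, c + dc]
                          for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1))
                          if 0 <= r + dr and 0 <= c + dc < cols and labels[r + dr, c + dc]]
            if not neighbours:
                parent.append(next_label)          # new label, new equivalence entry
                labels[r, c] = next_label
                next_label += 1
            else:
                labels[r, c] = min(neighbours)     # pick one of the labels
                for n in neighbours:               # record that they are equivalent
                    parent[find(n)] = find(labels[r, c])
    # Second pass: replace every label by its equivalence-class representative.
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                labels[r, c] = find(labels[r, c])
    return labels

img = np.array([[1, 1, 0, 1],
                [0, 1, 0, 1],
                [0, 0, 0, 1]], dtype=bool)
print(two_pass_label(img))   # two separate regions, one label each
```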
Shapes
A shape is an image region that can be defined using a set of finite boundary points.
Geometrical Features
The geometry of a region can be characterised in many ways:
- Area
- The number of pixels covered by the region.
- Centroid
- The position of the geometrical centre of the region
- Major/Minor Axis
- The length of the major/minor axis of an ellipse that has the same second moments as the region.
- Eccentricity
- The eccentricity of the ellipse is the ratio of the distance between the foci of the ellipse and its major axis.
- Orientation
- The orientation of the ellipse.
- Euler Number
- The number of holes within the region
- Inscribed Radius
- The radius of the largest circle that is contained by the region.
- Circumscribed Radius
- The radius of the smallest circle that contains the region.
Some features that depend on a region’s distribution of intensity are:
- Centre of Mass
- The position of the centre of the region.
- Weighted by the intensities of the pixels.
- Often the centre of mass’s displacement from the centroid is of interest.
- Image Moments (Statistical)
- Mean, variance, skewness.
We can define types of shapes by using correlations between these features.
We can complete image transformations using matrix operations.
Arithmetic Operations
These are pixel by pixel operations:
- Addition/Subtraction
- Multiplication/Division
- Logical Operations
We should scale the final output so that we don’t overflow/underflow. In MATLAB we can use imadd, imsubtract, etc.
Geometric Transformations
These are combinations of:
-
Translation
\[\begin{bmatrix}x'\\y'\end{bmatrix}=\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}x_i\\y_i\end{bmatrix}\]
-
Scaling
\[\begin{bmatrix}x'\\y'\end{bmatrix} = \begin{bmatrix}S_x& 0\\0 &S_y\end{bmatrix}\cdot\begin{bmatrix}x\\y\end{bmatrix}\]
-
Rotation
\[\begin{bmatrix}x'\\y'\end{bmatrix} = \begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}\cdot\begin{bmatrix}x\\y\end{bmatrix}\]
We can put all transforms into the same (homogeneous) format so that we can combine them and apply them in one operation:
-
Translation
\[\begin{bmatrix}x'\\y'\\1\end{bmatrix}=\begin{bmatrix}1&0&x_i\\0&1&y_i\\0&0&1\end{bmatrix}\cdot\begin{bmatrix}x\\y\\1\end{bmatrix}\]
-
Scaling
\[\begin{bmatrix}x'\\y'\\1\end{bmatrix} =\begin{bmatrix}S_x& 0&0 \\0&S_y&0\\0&0&1\end{bmatrix}\cdot\begin{bmatrix}x\\y\\1\end{bmatrix}\]
-
Rotation
\[\begin{bmatrix}x'\\y'\\1\end{bmatrix} = \begin{bmatrix}\cos\theta&-\sin\theta&0\\\sin\theta&\cos\theta&0\\0&0&1\end{bmatrix}\cdot\begin{bmatrix}x\\y\\1\end{bmatrix}\]
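A NumPy sketch of combining these homogeneous matrices into a single operation (the angle, scale and offsets are arbitrary example values):

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

# Combine: rotate by 90 degrees, scale by 2, then translate by (5, 0).
# The rightmost matrix is applied to the point first.
M = translation(5, 0) @ scaling(2, 2) @ rotation(np.pi / 2)
point = np.array([1, 0, 1])               # (x, y) = (1, 0) in homogeneous form
print(M @ point)                          # -> [5. 2. 1.]
```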
Refer to slide 20 for 3D transformation matrices for:
- Translation
- Scaling
- Roll
- Pitch
- Yaw
3D transformations use 4 by 4 matrices.
Binary Images
- Simplest type of images
- Each pixel can either be black or white
The size of the image is:
\[M\times N\text{ bits}\]
where $M$ and $N$ are the rows and columns.
A histogram for a 1-bit image only has two bars, for dark and light.
Grayscale Images
- Generally 8 bits per pixel
- This gives 256 levels of brightness.
Colour Images
- Each channel RGB has 8 bits.
- This gives 24 bits per pixel.
You can generate histograms for each channel or sum the channels to account for all of them.
Colour Quantisation
You can’t represent an infinite number of colours using 24 bits. As a result the raw colour is quantised to fit in the colour space.
The following algorithms can be used to perform the quantisation; dithering can then be added to mask the quantisation artefacts:
- Uniform
- Median-cut
- Octree
- Popularity
- Generalised Lloyd
Conversion
Colour Image to Grayscale Conversion
This can be achieved by converting each RGB pixel to greyscale by forming a weighted sum:
\[a_1R+a_2G+a_3B\]
MATLAB’s rgb2gray uses $a_1 = 0.2989$, $a_2=0.5870$ and $a_3=0.1140$.
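A NumPy sketch of that weighted sum (`rgb` is a hypothetical H x W x 3 array; the weights default to the values above):

```python
import numpy as np

def rgb_to_gray(rgb, weights=(0.2989, 0.5870, 0.1140)):
    """Weighted sum of the R, G and B channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return weights[0] * r + weights[1] * g + weights[2] * b

print(rgb_to_gray(np.array([[[255, 255, 255]]])))   # pure white -> ~255
```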
Grayscale to Binary Image Conversion
To convert from grayscale to binary we need a threshold value.
Thresholding is an easy way to:
- Convert grayscale to binary.
- Separate foreground from background.
There are two types of thresholding:
- Histogram-based thresholding
- Multi-level thresholding
Terminology
- Field of View
- Total angle covered by the camera in a single image.
- Field of Regard
- Total angle that can be covered by the camera in a series of images.
- Focal Length (fl)
- Effective distance between lens and focal plane.
- Aperture (d)
- Effective size of the lens.
- Pixel
- Individual element of the image.
- F-number (fl/d)
- Measure of the light allowed through the lens.
- Gain-offset Correction
- Control of image exposure (gain) and removal of dark current (offset).
- Frame Rate
- Number of frames per second.
- Dioptre
- Optical power of a lens (same as $\frac 1 {\text{fl}}$)
Human Eye
The human eye has two optical components:
- The cornea has an optical strength of 31 dioptres.
- The lens is only 19 dioptres.
Rods & Cones
Rods:
- Sensitive to all wavelengths of visible light.
- About 100 times more sensitive to light than the cones.
Cones:
- Concentrated around the centre of the field of view.
- Come in three types:
- L (Long)
- M (Medium)
- S (Short)
Other animals often have limited colour vision due to a differing number of rods and cones.
Fovea & Saccades
The fovea is responsible for what we normally think of as human vision.
- The high resolution region of human colour vision is only 2 degrees.
Stereo Vision
Binocular vision relies on the two eyes having different views of the same scene.
The distance between the two cameras is called the baseline.
The fact that the two cameras can see different views is called parallax.
Empirical Condition:
\[\frac{\theta_A-\theta_B}{\theta_\text{FOV}}\times N_\text{Pixels}>1\]
Colour
Colour images are composed of three bands: R,G,B
Often it is better to use HSI (Hue, Saturation, Intensity) instead of RGB, as the three colour bands overlap. This causes a change in one band to affect the others.
It is possible to convert from one colour space to another (CMYK to RGB conversion). A formula should be provided if this is required.
Camera Resolution
Resolution is limited by pixels and the field of view.
A field of view of $\theta$ rads with $N$ pixels gives an angular resolution of:
\[\Delta\theta = \frac\theta N\]
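For example (with numbers chosen purely for illustration), a field of view of $0.5$ rad imaged across $1000$ pixels gives:
\[\Delta\theta = \frac{0.5}{1000}=0.5\text{ mrad per pixel}\]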
Johnson Criteria
This is an empirical experiment that determines the ability to correctly detect and recognise objects.
- Detection (with 50% probability) = resolve 1 bar.
- Recognition (with 50% probability) = resolve 3 bar.
- Identification (with 50% probability) = resolve 5 bar.
1 bar is a minimum of 3 pixels.
Optical Effects
- Vignetting
- The shadowing of the corners of an image by the optical aperture (an effect of the finite aperture size).
- Optical Distortion
- This is the effect of imperfect lenses. Barrel distortion expands towards the centre. Chromatic distortion presents itself as a prism effect.
- Image Noise
- Non-uniform response to stimulus across the focal plane.
- Blooming
- Where high intensity pixels leak into neighbours.
Digital Images
Discrete representation of spatial and intensity values in a 2D matrix.
Raw images are often:
- Extremely noisy
- Contain invisible bands of the EM spectrum.
- Contain too much information (that needs to be cropped out)
Image Sensing & Optics
Images are a signal so it is collected using a sensor.
- Light is reflected off, or emitted by, objects.
- Photons pass through the optics.
- Photons are focussed onto the focal plane.
- The image is recorded electronically or chemically.
- The energy of the photons is important when forming images.
Not all images are in the visible spectrum so we can use other wavelengths, ultrasonic and so on to create images.
Attenuation
IR Cameras
They are good when there isn’t a large amount of ambient light.
There are three different types of IR:
- Near IR 0.7um - 2um
- Little thermal emission, but can use laser illumination.
- Mid-wave Infrared 3um - 5um
- Good for hot objects as there is less background thermal emission.
- Long-Wave Infrared 8um - 14um
- Good for room temperature (thermal LWIR emission peaks at 300K)
Low Light
Visible band cameras only work well when there is ambient light.
This means they are not good at night.
To correct for this we can use an amplifier:
- This introduces a lot of noise.
- Tends to make the camera sensitive to glare.
Software
- MATLAB (licensed)
- Through the Image Processing Toolbox.
- ImageJ (free)
- OpenCV (free)
Typical Image Processing System
graph LR
pp[Pre-processing] --> mp[Main Processing]
mp --> pp2[Post-Processing]
Pre-Processing:
- Basic Correction
- Basic Noise Removal
- Background Removal
- Alignment
- Format Changing
- Transformation
Main Processing:
- Advanced Correction
- Advanced Noise removal
- Segmentation
- Detection
- Searching
- Feature Extraction
- Recognition
Post Processing:
- Image Formation
- Format Changing
- Result Representation
- Report Generation
You will gain significantly better results if you pre-process.