2. Transform Domain in Image Processing (using Python)
In images, many important details live in high-frequency regions, such as sharp changes in brightness or texture. These details can be analyzed more effectively by moving from the spatial domain (where an image is represented by its pixels) to the frequency domain. To do this, we use transforms, which let us analyze and manipulate the frequency components of an image.
What is a Transform?
A transform is a mathematical function that converts data from one domain to another. In the context of images:
- A 1-D signal (such as sound) can be expressed as a combination of orthonormal basis functions.
- A 2-D signal (such as an image) can be expressed as a combination of basis matrices, often called basis images.
This process allows us to represent an image as a combination of different frequency components, making it easier to manipulate or analyze specific details.
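To make the idea of basis images concrete, here is a minimal sketch (assuming an 8x8 block size, an arbitrary choice) that builds the two-dimensional DCT basis images, introduced more formally later in this section, as outer products of 1-D cosine basis vectors and plots the resulting grid:
import numpy as np
import matplotlib.pyplot as plt
N = 8  # block size, chosen only for illustration
n = np.arange(N)
# Orthonormal 1-D DCT-II basis: row k holds the k-th cosine sampled at the N pixel positions
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)  # the DC row has a different normalization
# Each 2-D basis image is the outer product of two 1-D basis vectors
fig, axes = plt.subplots(N, N, figsize=(8, 8))
for u in range(N):
    for v in range(N):
        axes[u, v].imshow(np.outer(C[u], C[v]), cmap='gray')
        axes[u, v].axis('off')
plt.suptitle('8x8 DCT basis images')
plt.show()
Any 8x8 image block can be written as a weighted sum of these 64 basis images; the transform coefficients are exactly those weights.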
import cv2
import numpy as np
import matplotlib.pyplot as plt
im = cv2.imread('lena_color.jpg', 0)  # Load image in grayscale
im1 = np.float32(im) / 255.0  # Normalized copy in [0, 1] (kept for reference, not used below)
Fourier Transform
The Fourier Transform is a powerful tool that decomposes a signal (such as an image) into its frequency components. In the centered (shifted) spectrum of an image, the low frequencies (smooth areas) are concentrated near the center, while the high frequencies (edges, textures) spread outwards.
Using the Fourier Transform, we can examine which parts of the image correspond to different frequency ranges. This can be particularly useful when we want to filter out certain details or embed watermarks in areas that are less noticeable to the human eye.
Visualizing the Fourier Transform
# Apply Fourier Transform and shift zero frequency to the center
transform = np.fft.fftshift(np.fft.fft2(im))
# Show original and transformed images
plt.figure(figsize=(15, 6))
plt.subplot(121)
plt.title('Original')
plt.imshow(im, cmap='gray')
plt.subplot(122)
plt.title('Fourier Transform')
plt.imshow(np.log(abs(transform)), cmap='gray') # Log scale to visualize frequencies
plt.show()
The center contains the low frequencies; the single coefficient at the very center (the DC component) represents the average brightness of the image. These low-frequency components carry most of the signal's energy, which is why the center appears so bright.
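As a minimal sketch of the filtering idea mentioned earlier, the block below keeps only a small square of coefficients around the shifted center (an ideal low-pass filter) and inverts the transform; the cut-off of 30 pixels is an arbitrary choice.
# Ideal low-pass filter: keep only a small square of low frequencies around the center
rows, cols = im.shape
crow, ccol = rows // 2, cols // 2
cutoff = 30  # half-width of the pass band (arbitrary)
mask = np.zeros((rows, cols), np.float32)
mask[crow - cutoff:crow + cutoff, ccol - cutoff:ccol + cutoff] = 1
filtered = transform * mask  # zero out the high frequencies
lowpass = np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))  # undo the shift, then invert
plt.imshow(lowpass, cmap='gray')
plt.title('Ideal low-pass filtered image')
plt.show()
The result keeps the smooth structure of the image but loses edges and texture, confirming that these details live in the high frequencies.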
Inverse Fourier Transform
If we want to reconstruct the image after processing it in the frequency domain, we can use the Inverse Fourier Transform:
# Forward transform (no shift needed here) followed by the inverse transform
transform = np.fft.fft2(im)
restored_image = np.real(np.fft.ifft2(transform))  # the imaginary part is only round-off error
plt.imshow(restored_image, cmap='gray')
plt.title('Restored Image')
plt.show()
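Up to floating-point round-off, this round trip is lossless; a quick check of the largest pixel-wise difference confirms it.
# Largest absolute difference between the original and the restored image (essentially zero)
print('max reconstruction error:', np.max(np.abs(im.astype(np.float64) - restored_image)))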
Discrete Cosine Transform (DCT)
The Discrete Cosine Transform (DCT) expresses an image as a sum of cosine functions oscillating at different frequencies. The DCT is widely used in image compression (e.g., JPEG), as it allows for the efficient representation of important details while discarding unnecessary information.
Cosines are preferred over sines because the DCT implicitly extends the signal with even symmetry, avoiding artificial discontinuities at the boundaries; as a result, real-world signals can be approximated well with fewer coefficients.
Properties of DCT:
- It is real and orthogonal, meaning the inverse is simply the transpose (a quick numerical check appears after the code example below).
- It is fast and efficient for data compression.
- It allows us to locate areas in an image where watermarks can be embedded without affecting visual quality.
from scipy.fft import dct, idct
# Apply DCT in both directions (2D DCT)
transform = dct(dct(im, axis=0, norm='ortho'), axis=1, norm='ortho')
# Show original and transformed images
plt.figure(figsize=(15, 6))
plt.subplot(121)
plt.title('Original')
plt.imshow(im, cmap='gray')
plt.subplot(122)
plt.title('DCT Transform')
plt.imshow(np.log(abs(transform)), cmap='gray') # Log scale to enhance visualization
plt.show()
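As a quick numerical check of the orthogonality property listed above, the sketch below builds the DCT matrix by transforming the columns of an identity matrix and verifies that its transpose acts as its inverse (the size 8 is an arbitrary choice).
# Build the 8x8 DCT matrix by transforming the identity matrix column by column
D = dct(np.eye(8), axis=0, norm='ortho')
# For an orthogonal matrix, multiplying by its transpose gives back the identity
print(np.allclose(D @ D.T, np.eye(8)))  # True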
DCT forms the backbone of JPEG compression, allowing images to be efficiently stored and transmitted by focusing on the most important details and discarding less noticeable information.
Inverse DCT
To reconstruct the image from the DCT coefficients, we can apply the Inverse DCT:
# Apply inverse DCT in both directions
restored_dct = idct(idct(transform, axis=1, norm='ortho'), axis=0, norm='ortho')
idct2 = np.uint8(np.clip(np.round(restored_dct), 0, 255))  # round and clip before casting to uint8
plt.imshow(idct2, cmap='gray')
plt.title('Restored Image from DCT')
plt.show()
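With both the forward and inverse DCT in hand, here is a rough sketch of the energy-compaction idea behind the compression claim above: keep only the top-left 64x64 block of coefficients (about 1.5% of them for a 512x512 image, an arbitrary cut-off) and reconstruct. JPEG itself works on 8x8 blocks with quantization, so this is only an illustration.
# Keep only the lowest-frequency 64x64 coefficients and discard the rest
kept = np.zeros_like(transform)
kept[:64, :64] = transform[:64, :64]
approx = idct(idct(kept, axis=1, norm='ortho'), axis=0, norm='ortho')
plt.figure(figsize=(15, 6))
plt.subplot(121)
plt.title('Original')
plt.imshow(im, cmap='gray')
plt.subplot(122)
plt.title('Reconstruction from ~1.5% of coefficients')
plt.imshow(approx, cmap='gray')
plt.show()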
Wavelet Transform
The Wavelet Transform is a more advanced tool that decomposes an image into multiple scales or resolutions. Unlike Fourier or DCT, wavelets allow us to analyze both high frequencies (details) and low frequencies (smooth areas) simultaneously, making it a multiresolution analysis.
In image processing, we typically use the Discrete Wavelet Transform (DWT), which applies different filters to separate high and low frequencies at successive levels. Because it offers localized frequency analysis, the DWT matches the characteristics of the Human Visual System (HVS) well.
Applying DWT:
import pywt
# Apply 2D Discrete Wavelet Transform using 'haar' wavelets
coeffs2 = pywt.dwt2(im, 'haar')
LL, (LH, HL, HH) = coeffs2  # LL: approximation (low frequencies); LH/HL/HH: detail (high-frequency) subbands
# Visualize the wavelet coefficients
blank_image = np.zeros((512, 512), np.float32)
blank_image[:256, :256] = LL
blank_image[:256, 256:] = LH
blank_image[256:, :256] = HL
blank_image[256:, 256:] = HH
plt.imshow(blank_image, cmap='gray')
plt.title('Wavelet Decomposition')
plt.show()
The DWT provides multiresolution analysis, allowing us to embed watermarks in different frequency bands. This makes it possible to hide a watermark so that it is hard to detect visually but still recoverable (a toy embedding example follows the inverse transform below).
Inverse DWT
To reconstruct the image from its wavelet components, we use the Inverse DWT:
# Apply Inverse Discrete Wavelet Transform
IDWT = pywt.idwt2((LL, (LH, HL, HH)), 'haar')
plt.imshow(IDWT, cmap='gray')
plt.title('Restored Image from DWT')
plt.show()
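Using the same inverse transform, here is a toy sketch of the watermark-embedding idea mentioned above: a small pseudo-random pattern is added to the HH subband and the image is reconstructed. The embedding strength alpha and the random seed are arbitrary choices, and this is only an illustration, not a complete watermarking scheme (no extraction step is shown).
# Toy additive watermark embedded in the HH (high-frequency) subband
rng = np.random.default_rng(42)            # fixed seed so the pattern is reproducible
watermark = rng.standard_normal(HH.shape)  # pseudo-random pattern used as the watermark
alpha = 5.0                                # embedding strength (arbitrary)
HH_marked = HH + alpha * watermark
marked = pywt.idwt2((LL, (LH, HL, HH_marked)), 'haar')
plt.figure(figsize=(15, 6))
plt.subplot(121)
plt.title('Original')
plt.imshow(im, cmap='gray')
plt.subplot(122)
plt.title('Watermarked (HH band)')
plt.imshow(marked, cmap='gray')
plt.show()
Because the change lives in the high-frequency band, the two images are visually almost indistinguishable.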
Further Wavelet Decomposition
One interesting property of the DWT is that you can further decompose the LL (low-frequency) component, allowing for a multi-level decomposition similar to a "matryoshka" doll. This can be used for sophisticated watermarking schemes.
# Apply DWT again on the LL component
coeffs2_2 = pywt.dwt2(LL, 'haar')
LL2, (LH2, HL2, HH2) = coeffs2_2
# Visualize further decomposition
blank_image = np.zeros((512, 512), np.float32)
blank_image[:128, :128] = LL2
blank_image[:128, 128:256] = LH2
blank_image[128:256, :128] = HL2
blank_image[128:256, 128:256] = HH2
blank_image[:256, 256:] = LH
blank_image[256:, :256] = HL
blank_image[256:, 256:] = HH
plt.imshow(blank_image, cmap='gray')
plt.title('Further Wavelet Decomposition')
plt.show()
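Re-applying dwt2 by hand becomes tedious for deeper decompositions; pywt also offers a multi-level interface. A minimal sketch using wavedec2 with two levels (the level count is an arbitrary choice):
# Two-level decomposition in a single call
coeffs = pywt.wavedec2(im, 'haar', level=2)
# coeffs[0] is the coarsest approximation (LL2); the following tuples hold the detail subbands
arr, slices = pywt.coeffs_to_array(coeffs)  # pack all subbands into one array for display
plt.imshow(arr, cmap='gray')
plt.title('Two-level decomposition via wavedec2')
plt.show()
pywt.waverec2(coeffs, 'haar') would reconstruct the image from the same coefficient list.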
Exercises
DCT Exercise
Which frequencies are the most important for the DCT? Try removing the DC component, the low frequencies, and the high frequencies to find out.
from scipy.fft import dct, idct
gray_image = cv2.imread('lena_grey.bmp', cv2.IMREAD_GRAYSCALE)
transform = dct(dct(gray_image, axis=0, norm='ortho'), axis=1, norm='ortho')
# Remove the DC component
edited1 = transform.copy()
edited1[0, 0] = 0
edited1 = idct(idct(edited1, axis=1, norm='ortho'), axis=0, norm='ortho')
edited1 = np.uint8(np.clip(edited1, 0, 255))  # clip before casting so negative values do not wrap around
# Remove the low frequencies
edited2 = transform.copy()
# 20 is an arbitrary number; the key is that the low frequencies sit near the upper-left corner
edited2[:20, :20] = 0
edited2[0, 0] = transform[0, 0]  # keep the DC component
edited2 = idct(idct(edited2, axis=1, norm='ortho'), axis=0, norm='ortho')
edited2 = np.uint8(np.clip(edited2, 0, 255))
# Remove the high frequencies
edited3 = transform.copy()
# 20 is an arbitrary number; the key is that the high frequencies sit near the lower-right corner
edited3[-20:, -20:] = 0
edited3 = idct(idct(edited3, axis=1, norm='ortho'), axis=0, norm='ortho')
edited3 = np.uint8(np.clip(edited3, 0, 255))
plt.figure(figsize=(15, 9))
plt.subplot(141)
plt.title('Original')
plt.imshow(gray_image, cmap='gray')
plt.subplot(142)
plt.title('Removed DC')
plt.imshow(edited1, cmap='gray')
plt.subplot(143)
plt.title('Removed low')
plt.imshow(edited2, cmap='gray')
plt.subplot(144)
plt.title('Removed high')
plt.imshow(edited3, cmap='gray')
plt.show()
Wavelet (DWT) Exercise
Remove each subband from the image and see what happens. Then visualize the second-level decomposition of one of the subbands (here, LL).
First part:
import pywt
gray_image = cv2.imread('lena_grey.bmp', cv2.IMREAD_GRAYSCALE)
coeffs2 = pywt.dwt2(gray_image, 'haar')
LL, (LH, HL, HH) = coeffs2
# Zero out one subband at a time and reconstruct without it
LL_res = pywt.idwt2((np.zeros_like(LL), (LH, HL, HH)), 'haar')
LH_res = pywt.idwt2((LL, (np.zeros_like(LH), HL, HH)), 'haar')
HL_res = pywt.idwt2((LL, (LH, np.zeros_like(HL), HH)), 'haar')
HH_res = pywt.idwt2((LL, (LH, HL, np.zeros_like(HH))), 'haar')
# Show components
plt.figure(figsize=(30, 15))
plt.subplot(151)
plt.title('Original')
plt.imshow(gray_image, cmap='gray')
plt.subplot(152)
plt.title('Removed LL')
plt.imshow(LL_res, cmap='gray')
plt.subplot(153)
plt.title('Removed LH')
plt.imshow(LH_res, cmap='gray')
plt.subplot(154)
plt.title('Removed HL')
plt.imshow(HL_res, cmap='gray')
plt.subplot(155)
plt.title('Removed HH')
plt.imshow(HH_res, cmap='gray')
plt.show()
Second part:
import pywt
coeffs2_2 = pywt.dwt2(LL, 'haar')
LL2, (LH2, HL2, HH2) = coeffs2_2
blank_image = np.zeros((512,512), np.float32)
blank_image[:128, :128] = LL2
blank_image[:128, 128:256] = LH2
blank_image[128:256, :128] = HL2
blank_image[128:256, 128:256] = HH2
blank_image[:256, 256:] = LH
blank_image[256:, :256] = HL
blank_image[256:, 256:] = HH
# Show components
plt.imshow(blank_image, cmap='gray')
plt.title('Second-level decomposition of LL')
plt.show()