K-Means Clustering in Image Compression

Machine learning is used all around us, from Netflix picking out Friday night's movie to banks flagging fraudulent activity. But what is machine learning?

Learning is turning experience into knowledge

Machine learning, therefore, is giving a computer the ability to turn experience into knowledge. Two common types of learning in machine learning are supervised and unsupervised learning.

Supervised learning is when the machine's training data is labelled, so that every input has a corresponding output (label). The opposite of supervised learning is unsupervised learning, in which the training data has no labels; k-means clustering is an example of an unsupervised learning algorithm.

Cluster analysis divides the data into groups (clusters), such that the objects in a cluster are more similar to each other than the objects in the other clusters

(Wu, J. & SpringerLink, 2012)

The objective of the k-means clustering algorithm is to find the best way to divide the data into k clusters which do not overlap. These clusters are represented by their centroids. (Wu, J. & SpringerLink, 2012)

A centroid is the mean of all the data points in that particular cluster
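As a small illustration of this definition, the centroid of a cluster can be computed in R with `colMeans()`. The three points below are made-up toy data, not from the article:

```r
# A centroid is the coordinate-wise mean of all the points in a cluster.
# Toy cluster of three 2-D points:
cluster <- matrix(c(1, 2,
                    3, 4,
                    5, 6),
                  ncol = 2, byrow = TRUE)

centroid <- colMeans(cluster)
print(centroid)  # 3 4
```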

Example of Pseudo-Code for the k-Means Clustering Algorithm

The algorithm works by first defining the centroids, which can be done by randomly choosing k different points as initial centroids. Each data point is then assigned to the centroid closest to it by Euclidean distance. After every data point has been assigned, each centroid is re-calculated as the mean of all the data points assigned to it. The algorithm then re-assigns the data points and re-calculates the centroids repeatedly until no data point changes cluster (Piech, 2013). Because each iteration can only improve the clustering, this iterative process is guaranteed to converge, although the solution it reaches may only be a local optimum.
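The loop described above can be sketched in a few lines of base R. This is a minimal illustration on made-up one-dimensional data; the toy values and variable names are my own, not from the article's code:

```r
set.seed(42)

# Toy data: two obvious groups on the number line
x <- c(1.0, 1.2, 0.8, 8.0, 8.3, 7.9)
k <- 2

# Step 1: randomly choose k data points as the initial centroids
centroids <- sample(x, k)

repeat {
  # Step 2: assign each point to its nearest centroid (Euclidean distance)
  assignment <- sapply(x, function(p) which.min(abs(p - centroids)))

  # Step 3: re-calculate each centroid as the mean of its assigned points
  new.centroids <- sapply(1:k, function(i) mean(x[assignment == i]))

  # Step 4: stop once the centroids no longer change
  if (all(new.centroids == centroids)) break
  centroids <- new.centroids
}

print(sort(centroids))  # roughly 1 and 8
```

With this data the loop settles on one centroid per group after a couple of iterations, regardless of which two points are drawn as the initial centroids.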

The placement of each centroid is iterative: the machine has to find the positions which best describe the training data. This shows how the centroids and the cluster assignments continue to change until the optimal solution is found. (Piech, 2013)

When using this algorithm for image compression there are two aspects to consider:

  • The physical location of the pixel (2 Dimensions)

  • The colour of each pixel (3 Dimensions)

The colour of each pixel is three-dimensional because every colour is a combination of Red, Green and Blue (RGB) values. An example of applying k-means clustering to image compression, along with the R code used, can be seen below.
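To make the three-dimensional colour representation concrete, each pixel can be treated as a point in [0, 1]³, and base R's `rgb()` maps such a triple to a displayable colour. The example values below are my own:

```r
# One pixel's colour is a point in [0, 1]^3: (Red, Green, Blue)
pixel <- c(0.2, 0.6, 0.4)

# rgb() converts the triple into a hex colour string
hex <- rgb(pixel[1], pixel[2], pixel[3])
print(hex)  # "#339966"
```

After clustering, every pixel's RGB triple is replaced by the triple of its cluster's centroid, so the compressed image contains only k distinct colours.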

Overall, k-means clustering groups the training data and refines those clusters with every iteration of the algorithm. The technique is easy to understand and apply to reduce the number of colours, and hence the file size, of an individual image; however, it may not always be the best solution.

# Load Libraries
library(jpeg)
library(ggplot2)
library(grid)
library(gridExtra)

# Set Working Directory
setwd("")

# Load Image
image.path <- paste(getwd(), "/Image.jpg", sep = "")
img <- readJPEG(image.path) # Read the image

# Image Dimensions
imgDm <- dim(img)

# Assign RGB channels to a data frame
imgRGB <- data.frame(
  x = rep(1:imgDm[2], each = imgDm[1]),
  y = rep(imgDm[1]:1, imgDm[2]),
  R = as.vector(img[, , 1]),
  G = as.vector(img[, , 2]),
  B = as.vector(img[, , 3])
)

# Obtain a plot of the original image
plot1 <- ggplot(data = imgRGB, aes(x = x, y = y)) +
  geom_point(colour = rgb(imgRGB[c("R", "G", "B")])) +
  labs(title = "Original Image: Ashton Memorial") +
  xlab("x") +
  ylab("y")

# K-Means
kClusters <- 3 # Number of centroids
kMeans <- kmeans(imgRGB[, c("R", "G", "B")], centers = kClusters)
kColours <- rgb(kMeans$centers[kMeans$cluster, ])

# Obtain a plot of the compressed image
plot2 <- ggplot(data = imgRGB, aes(x = x, y = y)) +
  geom_point(colour = kColours) +
  labs(title = paste("k-Means Clustering of", kClusters, "Colours")) +
  xlab("x") +
  ylab("y")

# Display the two plots side by side
grid.arrange(plot1, plot2, ncol = 2)

Piech, C. (2013). K Means. [online] Stanford.edu. Available at: http://stanford.edu/~cpiech/cs221/handouts/kmeans.html [Accessed 5 Nov. 2018].

Wu, J. & SpringerLink, (2012). Advances in K-means Clustering [Electronic Resource]: A Data Mining Thinking. Berlin, Heidelberg: Springer.