Machine learning is used all around us, from Netflix picking out the next Friday-night movie to banks looking for fraudulent activity. But what is machine learning?
Learning is turning experience into knowledge.
Therefore, machine learning is giving a computer the ability to turn experience into knowledge. Two examples of learning types in machine learning are supervised and unsupervised learning.
Supervised learning is when the machine's training data is labelled, so every input has a corresponding output (label). The opposite of supervised learning is unsupervised learning, where the training data has no labels; k-means clustering is an example of an unsupervised learning algorithm.
Cluster analysis divides the data into groups (clusters) such that the objects in a cluster are more similar to each other than to objects in other clusters.
(Wu, J. & SpringerLink, 2012)
The objective of the k-means clustering algorithm is to find the best way to divide the data into k clusters that do not overlap. These clusters are represented by their centroids. (Wu, J. & SpringerLink, 2012)
A centroid is the mean of all the data points in that particular cluster.
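For instance, a centroid can be computed in R as the column means of the points in a cluster (a minimal sketch; the three points here are made-up values for illustration):

```r
# Three 2-D points assigned to one cluster (illustrative values)
cluster <- matrix(c(1, 2,
                    3, 4,
                    5, 6), ncol = 2, byrow = TRUE)

# The centroid is the mean of the points, taken dimension by dimension
centroid <- colMeans(cluster)
centroid  # 3 4
```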
The algorithm works by first defining k initial centroids, which can be done by randomly choosing k different points in the data. Each data point is then assigned to a centroid according to the Euclidean distance between the point and each centroid. After every data point has been assigned, each centroid is re-calculated as the mean of all the data points assigned to it. The algorithm then re-assigns the data points and re-calculates the centroids, repeating until the assignments no longer change (Piech, 2013). k-means clustering is therefore an iterative algorithm; it is guaranteed to converge, although the solution it converges to is a local optimum rather than necessarily the best possible clustering.
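The loop described above can be sketched directly in R (a minimal illustration on made-up data; in practice the built-in kmeans() function would be used instead):

```r
set.seed(1)
# Made-up 2-D data: two loose groups of ten points each
X <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))

k <- 2
# Step 1: choose k random data points as the initial centroids
centroids <- X[sample(nrow(X), k), ]

repeat {
  # Step 2: assign each point to its nearest centroid (Euclidean distance)
  d <- as.matrix(dist(rbind(centroids, X)))[-(1:k), 1:k]
  cluster_id <- apply(d, 1, which.min)

  # Step 3: re-calculate each centroid as the mean of its assigned points
  new_centroids <- t(sapply(1:k, function(j)
    colMeans(X[cluster_id == j, , drop = FALSE])))

  # Step 4: stop when the centroids (and hence the assignments) stop changing
  if (all(abs(new_centroids - centroids) < 1e-9)) break
  centroids <- new_centroids
}
```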
When using this algorithm for image compression there are two aspects to consider:
The physical location of the pixel (2 dimensions)
The colour of each pixel (3 dimensions)
The colour of each pixel is in 3 dimensions because every colour is a combination of red, green and blue (RGB) values. An example of using k-means clustering for image compression, together with the R code used, can be seen below.
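As a small illustration of the colour dimension, a single pixel is just three numbers between 0 and 1 (the values below are made up), which R can convert into a displayable hexadecimal colour:

```r
# One pixel: full red, half green, no blue (illustrative values)
pixel <- c(R = 1, G = 0.5, B = 0)

# rgb() converts the RGB triple into a hexadecimal colour string
hex <- rgb(pixel["R"], pixel["G"], pixel["B"])
```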
Overall, k-means clustering groups the training data into clusters and refines those clusters with every iteration of the algorithm. The technique is easy to understand and to apply for reducing the file size of an individual image; however, it may not always be the most effective compression method.
# Load libraries
library(jpeg)
library(ggplot2)
library(grid)
library(gridExtra)

# Set working directory
setwd("")

# Load image
image.path <- paste(getwd(), "/Image.jpg", sep = "")
img <- readJPEG(image.path)  # Read the image

# Image dimensions
imgDm <- dim(img)

# Assign RGB channels to a data frame
imgRGB <- data.frame(
  x = rep(1:imgDm[2], each = imgDm[1]),
  y = rep(imgDm[1]:1, imgDm[2]),
  R = as.vector(img[, , 1]),
  G = as.vector(img[, , 2]),
  B = as.vector(img[, , 3])
)

# Obtain a plot of the original image
plot1 <- ggplot(data = imgRGB, aes(x = x, y = y)) +
  geom_point(colour = rgb(imgRGB[c("R", "G", "B")])) +
  labs(title = "Original Image: Ashton Memorial") +
  xlab("x") +
  ylab("y")

# k-Means
kClusters <- 3  # Number of centroids
kMeans <- kmeans(imgRGB[, c("R", "G", "B")], centers = kClusters)
kColours <- rgb(kMeans$centers[kMeans$cluster, ])

# Obtain a plot of the compressed image
plot2 <- ggplot(data = imgRGB, aes(x = x, y = y)) +
  geom_point(colour = kColours) +
  labs(title = paste("k-Means Clustering of", kClusters, "Colours")) +
  xlab("x") +
  ylab("y")
Piech, C. (2013). K Means. [online] Stanford.edu. Available at: http://stanford.edu/~cpiech/cs221/handouts/kmeans.html [Accessed 5 Nov. 2018].
Wu, J. & SpringerLink (2012). Advances in K-means Clustering [Electronic Resource]: A Data Mining Thinking. Berlin, Heidelberg: Springer.