By: Ian Makgill
In order to glean information from the colour study, significant data analysis was required. All of the data for the study was provided in a single file: colour values were expressed as RGB and hex values, alongside data including each user's reason or affinity for their choice. In total, 26,596 individual submissions were gathered over a four-month period between January and April 2017. To then establish the most popular colour from the data, it was necessary not only to determine the exact colour with the highest number of votes, but also to build an understanding of the mean colour. To do this we used the K-Means clustering algorithm.
The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of the K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-Means clustering algorithm gave us the centroids of the K clusters, which can be used to label new data.
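The iterative assign-and-update loop described above can be sketched in a few lines of Python. This is not the study's actual code: it is a minimal numpy implementation of Lloyd's algorithm with hypothetical sample colours, and it uses a deterministic initialisation rather than the random or k-means++ seeding a production implementation would use.

```python
import numpy as np

def kmeans(points, k, iters=100):
    """Basic Lloyd's algorithm: repeatedly assign each point to its
    nearest centroid, then move each centroid to the mean of its points."""
    # Deterministic initialisation for reproducibility; real K-Means
    # implementations usually use random or k-means++ seeding.
    centroids = points[np.linspace(0, len(points) - 1, k).astype(int)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, labels

# Hypothetical sample: RGB triples clustered around blue and red.
colours = np.array([
    [10, 20, 200], [15, 25, 210], [5, 30, 190],   # blue-ish submissions
    [200, 10, 10], [210, 20, 5], [190, 15, 20],   # red-ish submissions
], dtype=float)
centroids, labels = kmeans(colours, k=2)
```

Each returned centroid is the mean RGB value of its cluster, which is exactly the "mean colour" the study needed alongside the raw vote counts.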
Rather than defining groups before looking at the data, clustering allowed us to find and analyse the groups that formed organically. Each cluster centroid is a collection of feature values that defines the resulting group, and examining these feature values allows us to qualitatively interpret what kind of group each cluster represents.
To carry out the K-Means clustering analysis, we first created an image with a ‘DNA chart’ of colours, in which each vertical stripe represents one of the colours in the data.
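A DNA chart of this kind can be assembled as a plain pixel array: one column per submitted colour, repeated down the full height of the image. The sketch below uses numpy and hypothetical sample colours; in practice the array would then be saved out through an imaging library.

```python
import numpy as np

def dna_chart(colours, height=100):
    """Build a 'DNA chart' array: one vertical stripe (column) per
    submitted colour, repeated down the full height of the image."""
    cols = np.array(colours, dtype=np.uint8)          # shape (n, 3)
    # Tile the single row of colours down the image height.
    return np.tile(cols[None, :, :], (height, 1, 1))  # shape (height, n, 3)

# Hypothetical sample submissions as RGB triples.
chart = dna_chart([[0, 87, 183], [255, 0, 0], [0, 128, 0]], height=50)
```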
This image was then fed to the K-Means clustering algorithm, which grouped the colours according to the Euclidean distance between them, calculated the centroid point of each cluster, and output an image providing a ranked list of the grouped colours by popularity.
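The final step, ranking clusters by popularity, amounts to counting how many pixels fall into each cluster and sorting the centroids by that count. The following is a minimal sketch of that pipeline, not the study's code: it runs a basic K-Means pass over the pixels of a tiny hypothetical chart, with made-up seed centroids.

```python
import numpy as np

def rank_colour_clusters(image, centroids, iters=50):
    """Cluster the pixels of an (H, W, 3) image with a basic K-Means
    pass, then rank the final centroids by cluster size (popularity)."""
    pixels = image.reshape(-1, 3).astype(float)
    for _ in range(iters):
        # Euclidean distance from every pixel to every centroid.
        dists = np.linalg.norm(pixels[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
    sizes = np.bincount(labels, minlength=len(centroids))
    order = sizes.argsort()[::-1]  # most popular cluster first
    return centroids[order], sizes[order]

# Hypothetical 1x6 'DNA chart': four blue-ish stripes, two red-ish.
image = np.array([[[10, 20, 200], [12, 22, 205], [8, 18, 195],
                   [11, 21, 198], [200, 10, 10], [205, 12, 8]]])
init = np.array([[0.0, 0.0, 255.0], [255.0, 0.0, 0.0]])  # seeds: blue, red
centroids, counts = rank_colour_clusters(image, init)
```

Here the blue cluster, with four of the six stripes, comes out first; rendering each ranked centroid as a block of colour, scaled by its count, would reproduce the kind of popularity chart described above.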
To read the full report, contact G . F Smith to order a physical copy.
Ian Makgill is the Founder and Analyst of Spend Network, a company that specialises in combining, cleansing and linking complex and inconsistent data sets.