Clustering in Machine Learning: Hierarchical and Density-Based

Zahra Ahmad
Jul 24, 2022

In this article, I will present the most common algorithms and methods to perform clustering on data.

Photo by Pierre Bamin on Unsplash

Before we begin, let’s first explain what clustering is and why we perform cluster analysis in machine learning.

What is Cluster Analysis?

In its most intuitive definition, cluster analysis (or clustering) is the unsupervised task of finding a set of groups (or clusters) in a dataset, so that objects belonging to the same group are similar and objects belonging to different groups are dissimilar, according to some similarity measure.
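To make this definition concrete, here is a minimal sketch that uses Euclidean distance as the similarity measure. The two groups of 2-D points and the helper names (`mean_within_distance`, `mean_between_distance`) are made up for illustration. In a real clustering task the groups are not known in advance, so this only shows what "similar within, dissimilar between" looks like numerically.

```python
import math
from itertools import combinations, product

# Hypothetical toy data: two visually separated groups of 2-D points.
group_a = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4)]
group_b = [(8.0, 8.0), (8.5, 7.6), (7.9, 8.3)]

def mean_within_distance(points):
    """Average Euclidean distance over all pairs inside one group."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_between_distance(first, second):
    """Average Euclidean distance over all cross-group pairs."""
    pairs = list(product(first, second))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

within_a = mean_within_distance(group_a)
within_b = mean_within_distance(group_b)
between = mean_between_distance(group_a, group_b)

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

With a good grouping, the within-group averages should come out much smaller than the between-group average. Swapping in another distance (Manhattan, for example) would change the numbers and possibly the grouping that looks best.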

There are many ways to define this similarity measure, and just as many ways to define what constitutes a cluster, which has given rise to numerous algorithms.

Here we’ll present an overview of some that we’ve taken into consideration as a preprocessing step for our task.

Hierarchical Clustering

Hierarchical Cluster Analysis (HCA) is a greedy approach to clustering based on the idea that observations that are spatially close to each other are more likely to be related than observations that are far apart.
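As a rough sketch of this greedy idea, the snippet below implements a naive bottom-up (agglomerative) variant in plain Python. Several choices here are my own assumptions rather than something stated in this article: the use of single linkage (cluster distance = distance between the closest pair of points), Euclidean distance, stopping at a fixed number of clusters, and the function names `single_linkage` and `agglomerative`. It is meant to show the mechanics, not to be efficient.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(points, n_clusters):
    """Greedy bottom-up clustering: repeatedly merge the two closest clusters."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i; a merge is never undone (greedy).
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (8.0, 8.0), (8.3, 7.8), (7.9, 8.4)]
clusters = agglomerative(points, n_clusters=2)
for c in clusters:
    print(sorted(c))
```

Each pass recomputes all cluster-to-cluster distances, which is fine for a handful of points but slow on real data. In practice a library routine such as `scipy.cluster.hierarchy.linkage` would normally be used instead.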

A matrix of pairwise distances between all points in the dataset is computed, based on a chosen distance metric (the most common are Euclidean and…
