This interactive tutorial demonstrates Clustering Algorithms, a fundamental family of unsupervised machine learning techniques used to partition data into meaningful groups. You'll explore and compare 7 different clustering algorithms side by side, observing how they behave differently on the same dataset.
The simulation visualizes K-Means, K-Medoids, Mini-batch K-Means, K-Modes, K-Prototypes, DBSCAN, and Mean Shift—each designed for different data types and use cases. Watch how centroids move, clusters form, and algorithms converge!
The Algorithms
Each algorithm uses a different approach to clustering:
- K-Means: Classic algorithm using Euclidean distance with mean-based centroids
- K-Medoids: Uses actual data points as cluster centers (more robust to outliers)
- Mini-batch K-Means: Faster variant using random samples per iteration
- K-Modes: Designed for categorical data using Hamming distance
- K-Prototypes: Handles mixed numerical and categorical data
- DBSCAN: Density-based algorithm that discovers clusters and detects outliers (no K required)
- Mean Shift: Non-parametric algorithm that finds clusters by seeking density peaks (no K required)
The Clustering Problem
Given a set of observations, partition them into K clusters such that points within the same cluster are more similar to each other than to points in other clusters:
Minimize: J = Σᵢ₌₁ᴷ Σₓ∈Cᵢ d(x, μᵢ)²
where:
- K is the number of clusters
- Cᵢ is the set of points in cluster i
- μᵢ is the centroid (center) of cluster i
- d(x, μᵢ) is the distance from point x to centroid μᵢ
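The objective J above is exactly the WCSS statistic shown in the simulation. As a minimal sketch (the `points`, `centroids`, and `clusterIndex` names are illustrative, not the simulation's actual API), it can be computed like this:

```javascript
// Squared Euclidean distance between two 2-D points
function squaredDist(a, b) {
  const dx = a.x - b.x, dy = a.y - b.y;
  return dx * dx + dy * dy;
}

// WCSS: sum of squared distances from each point to its assigned centroid
function wcss(points, centroids) {
  return points.reduce(
    (sum, p) => sum + squaredDist(p, centroids[p.clusterIndex]),
    0
  );
}
```

Lower WCSS means tighter clusters; each iteration of the K-Means family should decrease (or at worst preserve) this value.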
Usage Guide
Follow these steps to explore the clustering algorithms:
1. Select an Algorithm: Choose from the dropdown to see how different algorithms behave:
- K-Means: Watch centroids move to cluster means
- K-Medoids: Centroids snap to actual data points
- Mini-batch: Faster but jittery convergence
- K-Modes: Points snap to categorical grid
- K-Prototypes: Combines both position and shape/color
- DBSCAN: Watch ε-neighborhoods expand and discover clusters (Auto K)
- Mean Shift: See centroids shift toward density peaks and merge (Auto K)
2. Generate Data: Choose a data pattern and click "Generate":
- Gaussian Blobs: Well-separated clusters (best for K-Means)
- Moons: Two interleaved crescents (best for DBSCAN)
- Spiral: Intertwined spirals (best for DBSCAN)
- Varied Density: Clusters with different densities (test Mean Shift)
- Overlapping: Clusters that overlap (challenging case)
3. Algorithm-Specific Parameters:
- K (Clusters): For K-Means family - how many clusters to create
- ε (Radius): For DBSCAN - neighborhood radius (larger = more connected)
- MinPts: For DBSCAN - minimum neighbors to be a core point
- Bandwidth: For Mean Shift - kernel window size
4. Control Execution:
- Step Fwd: Execute one iteration forward
- Step Bwd: Go back to previous iteration
- Run: Auto-run until convergence
- Reset: Reinitialize centroids
5. Observe Convergence: Watch the WCSS (Within-Cluster Sum of Squares) decrease and the status change to "Converged" when the algorithm stabilizes.
Key Insight: Different algorithms excel in different scenarios. K-Means is fast but requires K. DBSCAN and Mean Shift automatically discover the number of clusters and can find non-spherical shapes. K-Modes handles categorical data that K-Means cannot.
Algorithm Details
K-Means (Lloyd's Algorithm)
The most popular clustering algorithm. It works by:
- Initialize: Place K centroids (using K-means++ for better initialization)
- Assign: Assign each point to the nearest centroid (Euclidean distance)
- Update: Move each centroid to the mean of its assigned points
- Repeat: Until assignments don't change (convergence)
μᵢ = (1/|Cᵢ|) Σₓ∈Cᵢ x
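The four steps above can be sketched as a single loop. This is a minimal teaching version, not the simulation's implementation: it seeds centroids from the first K points (plain initialization rather than K-means++, for brevity), and all names are illustrative.

```javascript
// Minimal Lloyd's algorithm over 2-D points {x, y}.
// Mutates each point's clusterIndex; returns the final centroids.
function kMeans(points, k, maxIter = 100) {
  let centroids = points.slice(0, k).map(p => ({ x: p.x, y: p.y }));
  const dist2 = (a, b) => (a.x - b.x) ** 2 + (a.y - b.y) ** 2;

  for (let iter = 0; iter < maxIter; iter++) {
    let changed = false;
    // Assign: each point goes to the nearest centroid
    for (const p of points) {
      let best = 0;
      for (let j = 1; j < k; j++) {
        if (dist2(p, centroids[j]) < dist2(p, centroids[best])) best = j;
      }
      if (p.clusterIndex !== best) { p.clusterIndex = best; changed = true; }
    }
    if (!changed && iter > 0) break; // converged: no reassignments
    // Update: move each centroid to the mean of its assigned points
    centroids = centroids.map((c, j) => {
      const members = points.filter(p => p.clusterIndex === j);
      if (members.length === 0) return c; // leave empty clusters in place
      return {
        x: members.reduce((s, p) => s + p.x, 0) / members.length,
        y: members.reduce((s, p) => s + p.y, 0) / members.length,
      };
    });
  }
  return centroids;
}
```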
Data Structure: Point-Centric Assignment
Rather than centroids maintaining lists of their assigned points, each point stores its own cluster index. This is more efficient because reassignment only requires updating one integer.
Points Array

// Each point knows which cluster it belongs to:
P0 { x: 12, y: 45, clusterIndex: 0 }  →  Cluster 0
P1 { x: 89, y: 23, clusterIndex: 2 }  →  Cluster 2
P2 { x: 34, y: 67, clusterIndex: 1 }  →  Cluster 1
P3 { x: 56, y: 12, clusterIndex: 0 }  →  Cluster 0
P4 { x: 78, y: 89, clusterIndex: 1 }  →  Cluster 1
...

To get all points for Centroid 0:
points.filter(p => p.clusterIndex === 0)  →  [P0, P3, ...]

To reassign P1 from Cluster 2 to Cluster 0:
P1.clusterIndex = 0; // That's it! No list manipulation
Why This Matters: During the Update phase, the algorithm filters points by their clusterIndex to compute the new centroid position as the mean of all assigned points.
Centroid Update: How Position Changes
After all points are assigned, each centroid moves to the center of mass of its assigned points:
Centroid Update Process
// K-Means: new position = mean of assigned points
c.x = clusterPoints.reduce((sum, p) => sum + p.x, 0) / clusterPoints.length;
c.y = clusterPoints.reduce((sum, p) => sum + p.y, 0) / clusterPoints.length;
Different algorithms update centroids differently:
| Algorithm | Update Rule | Result |
|---|---|---|
| K-Means | Mean of assigned points | Virtual point (may not be a data point) |
| K-Medoids | Point that minimizes total distance | Always an actual data point |
| Mini-batch | Incremental: μ ← μ(1-η) + x·η | Gradually moves toward samples |
| K-Modes | Mode (most frequent) per attribute | Most common category combination |
Convergence: The algorithm stops when no point changes its cluster assignment AND centroids stop moving.
K-Medoids (PAM - Partitioning Around Medoids)
Instead of using the mean, uses actual data points as cluster centers:
- Medoid: The most centrally located point in a cluster
- Advantage: More robust to outliers and noise
- Disadvantage: Computationally more expensive (O(n²))
mᵢ = argminₓ∈Cᵢ Σᵧ∈Cᵢ d(x, y)
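The argmin above can be sketched directly: for each candidate member, sum its distance to every other member and keep the cheapest. This is a minimal illustration of the medoid step only, not the full PAM swap procedure; names are illustrative.

```javascript
// Medoid of one cluster: the member point minimizing total distance
// to all other members. Returned value is always an actual data point.
function medoid(clusterPoints) {
  const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);
  let best = null, bestCost = Infinity;
  for (const candidate of clusterPoints) {
    const cost = clusterPoints.reduce((s, p) => s + dist(candidate, p), 0);
    if (cost < bestCost) { bestCost = cost; best = candidate; }
  }
  return best;
}
```

The double loop over members is where the O(n²) cost mentioned above comes from.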
Mini-Batch K-Means
A faster approximation of K-Means for large datasets:
- Uses small random samples (mini-batches) per iteration
- Updates centroids using online learning
- Trades accuracy for speed (useful for millions of points)
μᵢ ← μᵢ(1 - η) + x·η where η = 1/count
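The online update above is a one-liner per coordinate. A minimal sketch (the `counts` array, which tracks how many samples each centroid has absorbed so that η = 1/count decays over time, is an illustrative name):

```javascript
// One Mini-batch K-Means update: nudge centroid j toward a sampled point.
function miniBatchUpdate(centroid, sample, counts, j) {
  counts[j] += 1;
  const eta = 1 / counts[j];
  // μ ← μ(1 - η) + x·η — a convex step toward the sample
  centroid.x = centroid.x * (1 - eta) + sample.x * eta;
  centroid.y = centroid.y * (1 - eta) + sample.y * eta;
}
```

Because η shrinks as the count grows, early samples move the centroid a lot and later samples only fine-tune it, which is why convergence looks jittery but stabilizes.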
K-Modes (Categorical Data)
Designed specifically for categorical attributes:
- Distance: Hamming distance (count of attribute mismatches)
- Centroid: Mode (most frequent value) for each attribute
- Use case: Customer segmentation, survey data
d(x, y) = Σⱼ (xⱼ ≠ yⱼ) = number of mismatches
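Both K-Modes building blocks are short in code. A minimal sketch, with rows as arrays of categorical values (the names are illustrative):

```javascript
// Hamming distance: count of attribute positions where two rows disagree
function hamming(x, y) {
  return x.reduce((d, xi, j) => d + (xi !== y[j] ? 1 : 0), 0);
}

// K-Modes "centroid": the most frequent value in each column
function modePerAttribute(rows) {
  return rows[0].map((_, j) => {
    const counts = new Map();
    for (const row of rows) {
      counts.set(row[j], (counts.get(row[j]) || 0) + 1);
    }
    let best, bestCount = -1;
    for (const [value, count] of counts) {
      if (count > bestCount) { bestCount = count; best = value; }
    }
    return best;
  });
}
```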
K-Prototypes (Mixed Data)
Combines K-Means and K-Modes for mixed numerical/categorical data:
- Numerical: Uses Euclidean distance and mean
- Categorical: Uses Hamming distance and mode
- γ parameter: Controls the weight between numerical and categorical
d = (1-γ)·d_euclidean + γ·d_hamming
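The blended distance above translates directly to code. A minimal sketch assuming each point has numerical coordinates `x`, `y` plus a `cats` array of categorical values (illustrative names, not the simulation's actual shape):

```javascript
// K-Prototypes mixed distance: gamma weights the categorical part.
function mixedDistance(a, b, gamma) {
  const euclidean = Math.hypot(a.x - b.x, a.y - b.y);
  const hammingD = a.cats.reduce(
    (d, v, j) => d + (v !== b.cats[j] ? 1 : 0), 0
  );
  return (1 - gamma) * euclidean + gamma * hammingD;
}
```

With γ = 0 this degenerates to K-Means behavior; with γ = 1 it behaves like K-Modes.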
DBSCAN (Density-Based Spatial Clustering)
A density-based algorithm that doesn't require specifying K in advance:
- ε (Epsilon): Radius of the neighborhood around each point
- MinPts: Minimum points required to form a dense region
- Core Point: Has at least MinPts neighbors within ε
- Border Point: Within ε of a core point but has fewer than MinPts neighbors
- Noise Point: Not within ε of any core point (outlier)
Nε(p) = {q ∈ D | dist(p, q) ≤ ε}
The algorithm works by:
- Visit: Pick an unvisited point and find its ε-neighborhood
- Check: If |Nε(p)| ≥ MinPts, it's a core point → start new cluster
- Expand: Add all density-reachable points to the cluster
- Noise: Points not reachable from any core point are marked as noise
Advantage: DBSCAN can find arbitrarily shaped clusters and automatically identifies outliers!
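The visit/check/expand/noise steps above fit in a compact sketch. This is a teaching version with O(n²) neighbor queries, not the simulation's implementation; it returns a cluster index per point, with −1 marking noise.

```javascript
// Compact DBSCAN: labels[i] = cluster index, or -1 for noise.
function dbscan(points, eps, minPts) {
  const labels = new Array(points.length).fill(undefined);
  const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);
  const neighborsOf = (i) =>
    points.map((_, j) => j).filter(j => dist(points[i], points[j]) <= eps);

  let cluster = -1;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== undefined) continue;             // already visited
    const seeds = neighborsOf(i);
    if (seeds.length < minPts) { labels[i] = -1; continue; } // noise, for now
    cluster += 1;                                      // i is a core point
    labels[i] = cluster;
    // Expand: grow the cluster through density-reachable points
    for (let s = 0; s < seeds.length; s++) {
      const q = seeds[s];
      if (labels[q] === -1) labels[q] = cluster;       // noise → border point
      if (labels[q] !== undefined) continue;
      labels[q] = cluster;
      const qNeighbors = neighborsOf(q);
      if (qNeighbors.length >= minPts) seeds.push(...qNeighbors); // q is core
    }
  }
  return labels;
}
```

Note how a point first labeled noise can later be promoted to a border point when a core point's neighborhood reaches it.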
DBSCAN Point Classification
- Core Point: ≥ MinPts neighbors
- Border Point: < MinPts neighbors, but near a core point
- Noise Point: isolated outlier
The ε-neighborhood (dashed circle) defines which points count as "neighbors".
Mean Shift (Mode-Seeking Algorithm)
A non-parametric algorithm that finds cluster centers by seeking density peaks:
- Bandwidth: Size of the kernel window (similar to ε in DBSCAN)
- Kernel: Gaussian kernel to weight nearby points
- Shift: Move centroid toward weighted mean of nearby points
- Merge: Combine centroids that converge to the same peak
m(x) = Σᵢ K(xᵢ - x) · xᵢ / Σᵢ K(xᵢ - x)
The algorithm works by:
- Initialize: Place seed centroids at data points
- Shift: Move each centroid toward weighted mean of points within bandwidth
- Iterate: Repeat until centroids stop moving (convergence)
- Merge: Combine centroids that are closer than bandwidth/2
- Assign: Assign each point to nearest final centroid
Advantage: Mean Shift automatically determines the number of clusters based on data density!
Mean Shift: Centroid Migration
Centroid shifts toward weighted mean of points within bandwidth
Points closer to centroid have higher weight (Gaussian kernel)
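One shift step, i.e. the weighted mean m(x) from the formula above, can be sketched as follows. This is a minimal illustration of a single centroid's move; the real algorithm iterates this for every seed and then merges nearby centroids.

```javascript
// One mean-shift step: move a centroid to the Gaussian-weighted mean
// of all points. Nearby points get weight ≈ 1; distant points ≈ 0.
function meanShiftStep(centroid, points, bandwidth) {
  let wx = 0, wy = 0, wSum = 0;
  for (const p of points) {
    const d2 = (p.x - centroid.x) ** 2 + (p.y - centroid.y) ** 2;
    const w = Math.exp(-d2 / (2 * bandwidth * bandwidth)); // Gaussian kernel
    wx += w * p.x;
    wy += w * p.y;
    wSum += w;
  }
  return { x: wx / wSum, y: wy / wSum };
}
```

Repeating this step moves the centroid uphill in the density estimate until it settles on a peak.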
Distance Metrics
| Metric | Formula | Data Type | Used By |
|---|---|---|---|
| Euclidean | √(Σ(xᵢ - yᵢ)²) | Numerical | K-Means, K-Medoids, Mini-batch, DBSCAN, Mean Shift |
| Hamming | Σ(xᵢ ≠ yᵢ) | Categorical | K-Modes |
| Mixed | (1-γ)·Euclidean + γ·Hamming | Mixed | K-Prototypes |
Visual Elements
Understanding the visualization:
- Point Shape: Represents categorical attribute 1 (circle, square, triangle, diamond)
- Point Border Color: Represents categorical attribute 2
- Point Fill Color: Shows cluster assignment
- Cross Markers: Centroids (cluster centers)
- Grid Lines: Category boundaries (shown in K-Modes)
Statistics Explained
- Iterations: Number of algorithm steps executed
- Status: "Running" during execution, "Converged" when stable
- WCSS: Within-Cluster Sum of Squares (lower is better)
- Cluster Sizes: Number of points in each cluster
Algorithm Comparison
| Algorithm | Speed | K Required? | Cluster Shape | Best For |
|---|---|---|---|---|
| K-Means | Fast | Yes | Spherical | General purpose |
| K-Medoids | Slow | Yes | Spherical | Noisy data with outliers |
| Mini-batch | Very Fast | Yes | Spherical | Large datasets |
| K-Modes | Fast | Yes | - | Categorical data |
| K-Prototypes | Medium | Yes | Mixed | Mixed numerical/categorical |
| DBSCAN | Medium | No | Arbitrary | Noisy data, outlier detection |
| Mean Shift | Slow | No | Arbitrary | Unknown cluster count |
K-Required vs Auto-K Algorithms
When should you use which type?
| Scenario | Recommended | Why |
|---|---|---|
| You know the number of clusters | K-Means | Fast, simple, deterministic cluster count |
| Clusters have arbitrary shapes | DBSCAN | Can find non-convex clusters (moons, spirals) |
| Data has outliers/noise | DBSCAN | Automatically identifies noise points |
| Unknown cluster count | Mean Shift | Discovers clusters based on density peaks |
| Clusters have varying density | Mean Shift | Handles density variations better than DBSCAN |
| Need interpretable centroids | K-Medoids | Centroids are actual data points |
| Categorical data | K-Modes | Uses Hamming distance, not Euclidean |
Limitations and Considerations
- K Selection: K-Means family requires choosing K beforehand. Use DBSCAN or Mean Shift to automatically discover the number of clusters.
- Initialization Sensitivity: K-Means results depend on initial centroid placement. K-means++ seeding reduces this sensitivity but does not eliminate it.
- Cluster Shape: K-Means assumes spherical clusters. Use DBSCAN or Mean Shift for arbitrary shapes.
- Scale Sensitivity: Features should be normalized for Euclidean distance to work properly.
- Parameter Tuning: DBSCAN requires good ε and MinPts values; Mean Shift needs appropriate bandwidth.
- Local Minima: Algorithms may converge to local optima, not global.
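The scale-sensitivity point above is easy to address before clustering. A minimal min-max normalization sketch (field names are illustrative):

```javascript
// Min-max normalization: map each feature into [0, 1] so that no single
// feature dominates the Euclidean distance.
function normalize(points) {
  const min = { x: Infinity, y: Infinity };
  const max = { x: -Infinity, y: -Infinity };
  for (const p of points) {
    min.x = Math.min(min.x, p.x); max.x = Math.max(max.x, p.x);
    min.y = Math.min(min.y, p.y); max.y = Math.max(max.y, p.y);
  }
  // Guard against constant features (zero range)
  return points.map(p => ({
    x: max.x === min.x ? 0 : (p.x - min.x) / (max.x - min.x),
    y: max.y === min.y ? 0 : (p.y - min.y) / (max.y - min.y),
  }));
}
```

Without a step like this, a feature measured in thousands (e.g. income) swamps one measured in single digits (e.g. household size) in every distance computation.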
Real-World Applications
- Customer Segmentation: Group customers by purchasing behavior
- Image Compression: Reduce colors by clustering pixels
- Anomaly Detection: Points far from any centroid may be outliers
- Document Clustering: Group similar documents/articles
- Gene Expression Analysis: Cluster genes with similar expression patterns
Clustering in the Deep Learning Era
Are these "classical" algorithms still relevant? Yes! Here's why:
| Use Case | Why Classical Clustering |
|---|---|
| Vector Quantization | K-Means used in VQ-VAE, codebook learning for image/audio |
| Embedding Preprocessing | Cluster embeddings before feeding to neural networks |
| Interpretability | Explainable results for medical, finance, legal applications |
| Edge/IoT Devices | Low compute requirements vs. neural networks |
| Quick Prototyping | Fast implementation, no training data needed |
Modern approaches often combine both:
- Deep Embedded Clustering (DEC): Autoencoders + K-Means jointly
- SCAN: Contrastive learning + clustering
- SwAV: Self-supervised learning using online K-Means
- Vector Databases (Pinecone, Milvus): K-Means/HNSW for similarity search
Bottom Line: K-Means is to clustering what linear regression is to prediction—fast, interpretable, and "good enough" for most real-world problems. Deep learning clustering is powerful but often overkill unless dealing with raw images/audio/text that need representation learning first.