When you mix features on different scales in one dataset (for example Age in years and Income in dollars), a naive Euclidean distance between two points is dominated by the axis with the larger range. Algorithms like K-Nearest Neighbors or K-Means rely on that distance, so without scaling, one feature can effectively “drown out” the other. This tool shows what that distortion looks like and how two common scaling methods fix it.

Why scale?

Suppose Feature A (Age) ranges from 0 to 100 and Feature B (Income) from 20,000 to 500,000. A difference of 1 year and a difference of 1 dollar are treated the same in the formula d = √(ΔA² + ΔB²), so a small change in income can outweigh a large change in age. Visually, the plot is stretched along the income axis and distances look skewed. After scaling, both axes contribute on an equal footing.

Min-Max normalization

Map each feature into [0, 1] using the observed min and max:

x̅i = (xi − min) / (max − min)

All points lie in a unit square; the aspect ratio is equal and distances are balanced. The drawback: a single extreme outlier (e.g. one very high income) makes max huge, so every other point gets squashed toward 0 on that axis and the rest of the variation becomes hard to see.

Standardization (Z-score)

Center each feature at 0 and scale by its standard deviation (sample formula with n − 1):

zi = (xi − μ) / σ,  σ = √(∑j (xj − μ)² / (n − 1))

Values are expressed in “number of standard deviations from the mean.” On the plot, the origin is the mean; concentric circles at 1σ, 2σ, 3σ show how spread out the data is. Outliers sit far from the center, but the bulk of the points keep a readable, centered layout, so standardization is often more robust than min-max when outliers are present.

Euclidean distance

For two points p = (p1, p2) and q = (q1, q2), the Euclidean distance is

d = √((p1 − q1)² + (p2 − q2)²)

In Original mode you use raw Age and Income, so the number is in mixed units.
In Min-Max or Standardization mode you use the transformed coordinates, so the distance is dimensionless and comparable. Click two points in the simulator to see how the same pair’s distance value changes with the chosen mode.
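The two transforms and the distance comparison can be sketched in a few lines of Python (a minimal illustration with made-up numbers, not the simulator’s actual code):

```python
import math

def min_max(values):
    """Map values into [0, 1] using the observed min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center at the mean, divide by the sample standard deviation (n - 1)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
    return [(v - mu) / sigma for v in values]

# Made-up Age/Income samples on very different scales.
ages = [25, 30, 45, 60, 70]
incomes = [30_000, 120_000, 60_000, 250_000, 90_000]

# Euclidean distance between the first two points in each coordinate system.
dists = {}
for name, xs, ys in [
    ("original", ages, incomes),
    ("min-max", min_max(ages), min_max(incomes)),
    ("z-score", z_score(ages), z_score(incomes)),
]:
    dists[name] = math.dist((xs[0], ys[0]), (xs[1], ys[1]))
    print(f"{name:>8}: d = {dists[name]:.3f}")
```

In the original coordinates the distance is dominated by the 90,000-dollar income gap (the 5-year age gap barely registers); after either transform, both features contribute on the same order of magnitude.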
Usage

Data preset: At the top of the control panel, choose a dataset, from Extreme (Age vs Income) through Strong, Moderate, and Mild, to Similar scales (0–100). Each preset uses different Age and Income ranges, so you can see how normalization behaves when the axis ranges differ a lot (Extreme) versus when they are comparable (Mild/Similar).

Transformation mode: Switch between Original (Unscaled), Min-Max [0, 1], and Standardization (Z-score). The points animate to their new positions. In Original mode the axes use the raw ranges (so the aspect ratio can be distorted); use X–Y same scale (top-left of the first canvas) to show equal pixels per unit on both axes. In Min-Max mode both features lie in [0, 1]; in Z-score mode the origin is the mean and the grid shows 1, 2, and 3 standard deviations.

Distance: Click one point, then another, to see the Euclidean distance in the current coordinate system (shown on the connecting line and in the sidebar). In Original mode the distance is in mixed units; in Min-Max or Z-score mode it is dimensionless.

Add extreme outlier: Inserts a “Millionaire Toddler” (Age 2, Income $5,000,000), drawn in red. In Min-Max mode the new maximum income squashes the other points; in Standardization the outlier sits far from the center but the rest keep a readable spread.

Reset data: Regenerates 20 random points using the current data preset and clears the outlier and the selection.

Algorithm perspective: The second canvas shows distance-based behavior in four modes; each needs one selected point.
- 3-Nearest Neighbors: shows the selected point’s neighbors (green lines).
- Distance heatmap: shows the distance field around the selected point (options: distance contrast, true scale).
- Learning path (gradient descent): shows the path toward the minimum (learning-rate slider).
- Attention weights (Softmax): shows softmax attention from the selected point as the query; the Temperature slider sharpens (low) or softens (high) the distribution. Subplots at the top-left of each canvas compare Original (saturated) vs Normalized (balanced) attention.
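Two of the second canvas’s modes, 3-Nearest Neighbors and softmax attention, can be sketched as follows (hypothetical helper names and toy data; the tool’s internals may differ):

```python
import math

def k_nearest(query, points, k=3):
    """Indices of the k points closest to the query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]

def softmax_attention(query, points, temperature=1.0):
    """Softmax over negative distances: closer points get more weight.

    A low temperature sharpens the distribution toward the nearest
    point; a high temperature flattens it toward uniform.
    """
    scores = [-math.dist(query, p) / temperature for p in points]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

pts = [(0.10, 0.20), (0.15, 0.25), (0.50, 0.50), (0.90, 0.80)]
q = (0.12, 0.22)

neighbors = k_nearest(q, pts)                       # the three closest points
sharp = softmax_attention(q, pts, temperature=0.1)  # weight piles on the nearest
soft = softmax_attention(q, pts, temperature=10.0)  # weight spreads out
```

Because both helpers are driven purely by distance, running them on unscaled data reproduces the same distortion the first canvas shows: the income axis decides everything.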
Key insight

In Min-Max mode, a single extreme value “steals” the scale, so the rest of the data loses resolution. In Standardization, the scale is set by the standard deviation of the group, so the majority of points stay well separated and the outlier is simply far from the center.
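The contrast can be reproduced numerically (illustrative numbers only, echoing the “extreme outlier” button):

```python
# Illustrative incomes: a tight group plus one extreme outlier.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 5_000_000]

# Min-max: the outlier owns the scale and the group is squashed near 0.
lo, hi = min(incomes), max(incomes)
minmax = [(x - lo) / (hi - lo) for x in incomes]
group_span = max(minmax[:5]) - min(minmax[:5])  # ~0.004, i.e. 0.4% of [0, 1]

# Z-score: the outlier lands about 2 standard deviations above the mean,
# while the group sits together a fraction of a sigma below it.
n = len(incomes)
mu = sum(incomes) / n
sigma = (sum((x - mu) ** 2 for x in incomes) / (n - 1)) ** 0.5
zscores = [(x - mu) / sigma for x in incomes]
```

Both transforms are linear; the difference is where the scale comes from — the full observed range, outlier included, for min-max, versus the standard deviation for z-score.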