Web Simulation

Word Embedding Visualization via PCA

This interactive tutorial visualizes word embeddings in 2D using Principal Component Analysis (PCA). Word embeddings are high-dimensional vectors (512D from the Universal Sentence Encoder) that capture semantic meaning; similar words sit close in vector space. PCA reduces 512D to 2D so we can see semantic clusters and explore vector analogies (e.g., King - Man + Woman ≈ Queen).

Pipeline: Enter comma-separated words; the Universal Sentence Encoder (USE) produces a 512D vector for each. You can then either run a PCA projection (2D or 3D) or project manually onto chosen axes (0–511). A grid canvas shows the points with hover tooltips. The Vector Analogy panel computes A - B + C in 512D, projects the result into the same space (PCA or manual), and highlights the nearest word (cosine similarity).

Same basis: the projection (PCA or manual axes) is stored so that the analogy result uses the same basis as the plotted words. Press Enter in the input box to run PCA; changing a Manual axis spin box re-applies the manual projection automatically.

Mathematical model

Embeddings: each word is mapped to 512D by the Universal Sentence Encoder. PCA via SVD: the embedding matrix X is centered to Xc, decomposed, and projected onto the first k right singular vectors (the same mean and V are reused for the analogy vector):

$$X_c = U S V^{\top} \;\;\rightarrow\;\; Y = X_c\, V_{[:,\,0:k]}$$
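The centering, SVD, and projection steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the simulator's actual code: the function names `fit_pca` and `project` are hypothetical, and the random matrix is only a stand-in for real Universal Sentence Encoder output.

```python
import numpy as np

def fit_pca(X, k=2):
    """Fit PCA via SVD; return the mean and the first k right singular vectors."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # center each embedding dimension
    # Economy SVD: Xc = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                         # V[:, 0:k], shape (d, k)
    return mean, V_k

def project(vecs, mean, V_k):
    """Project one or more vectors using a stored mean and basis."""
    return (np.atleast_2d(vecs) - mean) @ V_k

# Random stand-in for the embedding matrix: 6 "words", 512 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 512))

mean, V_k = fit_pca(X, k=2)
Y = project(X, mean, V_k)                  # plotted points, shape (6, 2)

# The analogy vector reuses the same mean and V, so it lands in the same plot.
analogy_vec = X[0] - X[1] + X[2]
analogy_2d = project(analogy_vec, mean, V_k)   # shape (1, 2)
```

Returning `mean` and `V_k` from the fit step is what makes it possible to project a new vector later without refitting.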

Manual projection: pick axis indices (0–511); each point is simply [vec[axis0], vec[axis1]] (or three axes for 3D). Vector analogy: compute the result in 512D, project it with the stored basis, and highlight the nearest word by cosine similarity:

result_vec = embed(A) − embed(B) + embed(C)
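As a rough sketch of the analogy step, the snippet below computes A − B + C and picks the nearest word by cosine similarity. The 3D vectors are hand-made toy values, not real embeddings, and the helper names are hypothetical. It also assumes that similarity is measured in the full embedding space and that A, B, and C remain candidates; the simulator may differ on both points.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(embeddings, a, b, c):
    """Return (result_vec, nearest_word) for embed(a) - embed(b) + embed(c)."""
    result_vec = [
        x - y + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Assumption: every word in the list is a candidate, including a, b, c.
    nearest = max(embeddings, key=lambda w: cosine(result_vec, embeddings[w]))
    return result_vec, nearest

# Toy 3D vectors chosen by hand so the analogy works; real vectors are 512D.
toy = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.9, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

result_vec, nearest = analogy(toy, "king", "man", "woman")
print(nearest)  # queen
```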

Simulation

The interactive simulator is below. Use the controls to explore the concepts described above.

If model loading hangs, open this page over http (e.g. via a local server) rather than file://.

Vector Analogy (A - B + C)

Explore semantic relationships (e.g. King - Man + Woman ≈ Queen)

The arrow from B to A represents a "direction" in meaning; adding it to C often points toward the analogous word. PCA or manual projection puts the 512D result into this 2D/3D view.

[Panel: 512D word vectors (before PCA); axis label: dim]

 

Usage Example

Follow these steps to explore word embeddings and PCA:

  1. Load: Open the page; the Universal Sentence Encoder loads (status shows "Model ready").
  2. Input words: Enter comma-separated words in the text area, or choose a Preset. Press Enter to run PCA projection immediately.
  3. Run PCA Projection: Choose PCA 2D or 3D and click "Run PCA Projection". The app embeds all words (512D), runs PCA (SVD) to 2D or 3D, and draws points. Hover for word and coordinates. In 3D, use the rotate button and drag on the canvas to rotate the view.
  4. Manual projection: Choose Manual 2D or 3D, set axis indices (0–511), and click "Run Projection". Changing a spin box value also runs the projection automatically.
  5. Vector Analogy: Set A, B, C (e.g. King, Man, Woman). Click "Calculate Path". The app computes A - B + C in 512D, projects into the current space (PCA or manual), finds the nearest word, and highlights it; the orange point shows where the result lands.
  6. Interpret: Words that are close in meaning tend to cluster. The analogy result (e.g. Queen) is highlighted in gold.
Try this: run PCA or Manual projection first with a list that includes the analogy words. Then use the Vector Analogy panel so the result lands in the same 2D/3D space and the nearest neighbor is among your points.

Parameters

  • Input words: Comma-separated words or short phrases. Each is embedded to 512D. PCA reduces to 2D/3D; Manual uses selected axes (0–511). Press Enter to run PCA.
  • Preset: Dropdown to fill the word list (e.g. All analogy words, Semantic cluster, Royalty, Fruits, Planets).
  • PCA: 2D or 3D dropdown; Run PCA Projection runs SVD and draws the plot. Status shows which original dimensions have max weight in each component.
  • Manual: 2D/3D dropdown, Run Projection button, and axis spin boxes (0–511). Changing a spin box auto-runs the projection.
  • Vector Analogy A, B, C: Three words for A - B + C. Result is computed in 512D, projected into the current space (PCA or manual), and the nearest word is shown and highlighted.

Controls and Visualizations

  • Input words textarea: Comma-separated list; Preset fills it. Enter runs PCA; Run PCA Projection or Manual Run Projection draws the canvas.
  • 2D/3D Canvas: Grid with points (one per word). Hover shows word and coordinates. Canvas controls: Home, Zoom, Pan; in 3D, rotate button and drag to rotate. After "Calculate Path", nearest word is highlighted in gold and the A-B+C point is orange.
  • Vector Analogy and Calculate Path: A, B, C fields; button computes A - B + C in 512D, projects into the current space, finds nearest word, and updates the result and highlight.

Key Concepts

  • Word embeddings: Dense vectors (here 512D from Universal Sentence Encoder) that capture meaning; similar words have similar vectors.
  • PCA (SVD): Centers the embedding matrix, runs SVD, and projects onto the first two or three right singular vectors for 2D/3D. Among linear projections of that size, this one retains the most variance of the current word set.
  • Manual projection: Uses chosen dimensions (0–511) from the 512D embedding directly as coordinates.
  • Vector analogy: A - B + C in embedding space often points toward the "analogous" word (e.g. King - Man + Woman ≈ Queen). The direction from B to A can be interpreted as a relation; adding it to C often lands near the analogous concept.
Consistency: the same projection (PCA or manual axes) is used for the main word list and the analogy result, so A−B+C appears in the correct position relative to the other points in 2D/3D.
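One way to picture the "same projection" idea is to store the projection as a function and apply it to both the word vectors and the analogy result. The sketch below is an illustration under assumptions: `make_manual_projection` and `make_linear_projection` are hypothetical names, and the 4D vectors are toy values standing in for 512D embeddings.

```python
def make_manual_projection(axes):
    """Return a function that uses the chosen embedding dimensions as coordinates."""
    def project(vec):
        return [vec[i] for i in axes]
    return project

def make_linear_projection(mean, components):
    """Return a function that centers with a stored mean, then projects.

    components: list of k basis vectors (e.g. the first k right singular
    vectors from PCA), each as long as the embedding.
    """
    def project(vec):
        centered = [x - m for x, m in zip(vec, mean)]
        return [sum(c * x for c, x in zip(comp, centered)) for comp in components]
    return project

# Toy 4D vectors (hand-made, not real embeddings).
words = {
    "king":  [0.9, 0.8, 0.1, 0.3],
    "man":   [0.1, 0.9, 0.1, 0.3],
    "woman": [0.1, 0.1, 0.9, 0.3],
}

# Store the projection once...
project = make_manual_projection([0, 2])
points = {w: project(v) for w, v in words.items()}

# ...and reuse it for the analogy result so it shares the plot's coordinates.
result_vec = [
    a - b + c
    for a, b, c in zip(words["king"], words["man"], words["woman"])
]
result_point = project(result_vec)
```

Swapping `make_manual_projection` for `make_linear_projection` changes the basis without touching the code that plots the words or the analogy point.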

Limitations

  • Drastic dimensionality reduction. PCA squeezes 512D embeddings into 2D/3D and typically discards much of the variance. Clusters and analogies that are clean in 512D can look distorted or overlap in the projection, so the absence of a visible cluster does not mean the absence of structure.
  • Projection depends on the word list. PCA axes are computed from the current set of words, so adding/removing words changes the plot and the analogy result. There is no fixed, global 2D map of language.
  • Analogies are approximate. Treating A−B+C as pointing to the analogous word is a well-known heuristic that often fails. The reported result is simply the nearest word in your list, so it breaks easily if the target word is missing or the list is poorly chosen.
  • Manual axes are arbitrary. Individual embedding dimensions (0–511) are not independently interpretable, so manual projection onto raw axes rarely yields meaningful geometry.
  • Model and corpus bias. The Universal Sentence Encoder reflects its training data, including social biases; "semantic" proximity here is the model's, not ground truth.
  • Small, hand-typed vocabulary. Designed for a handful of words/phrases for visualization; it is not a tool for large-scale embedding analysis or quantitative similarity benchmarking.
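To get a feel for how much a 2D view leaves out, the share of variance captured by each component can be read off the singular values (squared and normalized). The sketch below uses a random matrix as a stand-in, so the printed figure says nothing about real embeddings; `explained_variance_ratio` is a hypothetical helper, not something the simulator exposes.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = S ** 2
    return var / var.sum()

# Random stand-in for 10 word vectors; not real embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 512))

ratio = explained_variance_ratio(X)
kept_2d = ratio[:2].sum()
print(f"variance kept by the first two components: {kept_2d:.1%}")

# Removing a word changes the fit, which is why the plot depends on the list.
ratio_without_last = explained_variance_ratio(X[:-1])
```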