Web Simulation

Word Embedding Visualization via PCA

This interactive tutorial visualizes word embeddings in 2D or 3D using Principal Component Analysis (PCA). Word embeddings are high-dimensional vectors (here 512D, from the Universal Sentence Encoder) that capture semantic meaning, so words with similar meanings sit close together in vector space. PCA reduces the 512 dimensions to 2 or 3 so that semantic clusters become visible and vector analogies (e.g., King - Man + Woman ≈ Queen) can be explored.

Pipeline: Enter words (comma-separated); the Universal Sentence Encoder (USE) produces a 512D vector for each. Then run either a PCA projection (2D or 3D) or a manual projection onto chosen axes (indices 0–511). A grid canvas shows the points with hover tooltips. The Vector Analogy panel computes A - B + C in 512D, projects the result into the same space (PCA or manual), and highlights the nearest word by cosine similarity.

NOTE: The projection (PCA or manual axes) is stored so the analogy result uses the same basis. Press Enter in the input box to run PCA; changing a Manual axis spin box applies the manual projection automatically.

Mathematical model

Embeddings: Each word is mapped to 512D by the Universal Sentence Encoder.

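PCA via SVD: Let X be the matrix with one 512D embedding per row. Center X by subtracting the column means to obtain X_c, then compute the SVD X_c = U S V^T. Project to k = 2 or 3 dimensions: Y = X_c × V[:, 0:k]. The same mean and V are used to project the analogy vector.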
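
The centering and projection steps can be sketched in plain Python. This is a minimal illustration, not the page's actual code: the function names (center, top_components, project) and the toy 4D data are hypothetical. To stay dependency-free it does not run a full SVD. The first k right singular vectors of X_c are the top k eigenvectors of X_c^T X_c, so the sketch approximates them by power iteration with deflation.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean; return (centered rows, mean)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows], mean


def top_components(xc, k, iters=200, seed=0):
    """Approximate the first k right singular vectors of xc (columns of V).

    Uses power iteration with deflation as a stand-in for a full SVD.
    """
    rng = random.Random(seed)
    d = len(xc[0])
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # v <- X_c^T (X_c v), without forming the d x d matrix.
            scores = [sum(a * b for a, b in zip(row, v)) for row in xc]
            v = [sum(s * row[j] for s, row in zip(scores, xc)) for j in range(d)]
            # Deflation: remove the directions that were already found.
            for c in comps:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        comps.append(v)
    return comps


def project(vec, mean, comps):
    """Center vec with the stored mean, then take its dot product with each component."""
    return [sum((x - m) * c for x, m, c in zip(vec, mean, comp)) for comp in comps]


# Toy 4D vectors standing in for the 512D embeddings (made-up values).
X = [[2.0, 0.0, 0.0, 0.0],
     [-2.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, -1.0, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.0]]
Xc, mean = center(X)
V = top_components(Xc, k=2)
Y = [project(row, mean, V) for row in X]  # one 2D point per word
```

Keeping mean and V around after fitting is what lets a new vector, such as the analogy result, be placed with project(vec, mean, V) in the same coordinates as the word points. The sign of each component is arbitrary, so a plot may appear mirrored between runs.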

Manual projection: Pick axis indices (0–511); each point is [vec[axis0], vec[axis1]] or [vec[axis0], vec[axis1], vec[axis2]].
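
Manual projection amounts to indexing into the embedding. A minimal sketch, assuming a hypothetical helper name manual_project and a made-up 512-element vector:

```python
def manual_project(vec, axes):
    """Return the embedding components at the chosen axis indices.

    Two indices give a 2D point, three give a 3D point.
    """
    for a in axes:
        if not 0 <= a < len(vec):
            raise IndexError(f"axis {a} is outside 0..{len(vec) - 1}")
    return [vec[a] for a in axes]


# Made-up 512D vector whose value at each position equals its index.
vec = [float(i) for i in range(512)]
point_2d = manual_project(vec, [0, 511])
point_3d = manual_project(vec, [3, 7, 42])
```

Unlike PCA, no centering or fitting is involved, so the only state to store for the analogy result is the list of axis indices.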

Vector analogy: result_vec = embed(A) - embed(B) + embed(C); project with stored PCA or manual axes; nearest word in list (cosine similarity) is shown and highlighted.
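
The analogy arithmetic and the nearest-word lookup can be sketched as follows. The names (cosine, analogy, emb) are hypothetical and the 3D vectors are hand-made so the arithmetic is easy to follow; real USE embeddings are 512D. This sketch searches every word in the list, including A, B, and C. Whether the page excludes the input words from the search is not stated in the text above.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def analogy(emb, a, b, c):
    """Compute emb[a] - emb[b] + emb[c] and the word in emb closest to it."""
    result = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    nearest = max(emb, key=lambda w: cosine(emb[w], result))
    return result, nearest


# Hand-made toy vectors (not real embeddings).
emb = {
    "king": [1.0, 1.0, 0.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "queen": [1.0, 0.0, 1.0],
    "apple": [0.0, 0.2, -0.5],
}
result_vec, nearest = analogy(emb, "king", "man", "woman")
```

The similarity search runs in the full embedding space; only the drawing of result_vec uses the stored PCA or manual projection.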

NOTE: If model loading hangs, open this page over http (e.g. from a local server), not file://.

Vector Analogy (A - B + C)

Explore semantic relationships (e.g. King - Man + Woman ≈ Queen)

The arrow from B to A represents a "direction" in meaning; adding it to C often points toward the analogous word. PCA or manual projection puts the 512D result into this 2D/3D view.

Usage Example

Follow these steps to explore word embeddings and PCA:

  1. Load: Open the page; the Universal Sentence Encoder loads (status shows "Model ready").
  2. Input words: Enter comma-separated words in the text area, or choose a Preset. Press Enter to run PCA projection immediately.
  3. Run PCA Projection: Choose PCA 2D or 3D and click "Run PCA Projection". The app embeds all words (512D), runs PCA (SVD) to 2D or 3D, and draws points. Hover for word and coordinates. In 3D, use the rotate button and drag on the canvas to rotate the view.
  4. Manual projection: Choose Manual 2D or 3D, set axis indices (0–511), and click "Run Projection". Changing a spin box value also runs the projection automatically.
  5. Vector Analogy: Set A, B, C (e.g. King, Man, Woman). Click "Calculate Path". The app computes A - B + C in 512D, projects into the current space (PCA or manual), finds the nearest word, and highlights it; the orange point shows where the result lands.
  6. Interpret: Words that are close in meaning tend to cluster. The analogy result (e.g. Queen) is highlighted in gold.

Tip: Run PCA or Manual projection first with a list that includes the analogy words. Then use the Vector Analogy panel so the result is in the same 2D/3D space and the nearest neighbor is among your points.
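
The tip can be checked before running a projection. A small sketch of parsing the comma-separated input and finding analogy words that are missing from the list; the function names are hypothetical, and the case-insensitive matching and duplicate removal are assumptions, not documented behavior of the page:

```python
def parse_words(text):
    """Split on commas, trim whitespace, drop empty entries and repeats."""
    seen = set()
    words = []
    for raw in text.split(","):
        word = raw.strip()
        key = word.lower()  # assumption: compare case-insensitively
        if word and key not in seen:
            seen.add(key)
            words.append(word)
    return words


def missing_analogy_words(words, a, b, c):
    """Return the analogy words (A, B, C) that are not in the word list."""
    have = {w.lower() for w in words}
    return [w for w in (a, b, c) if w.lower() not in have]


words = parse_words("King, Man,, woman , king")
missing = missing_analogy_words(words, "King", "Man", "Queen")
```

If the expected answer (for example Queen) is absent from the list, the nearest word can only be one of the remaining points.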

Parameters

  • Input words: Comma-separated words or short phrases. Each is embedded to 512D. PCA reduces to 2D/3D; Manual uses selected axes (0–511). Press Enter to run PCA.
  • Preset: Dropdown to fill the word list (e.g. All analogy words, Semantic cluster, Royalty, Fruits, Planets).
  • PCA: 2D or 3D dropdown; Run PCA Projection runs SVD and draws the plot. Status shows which original dimensions have max weight in each component.
  • Manual: 2D/3D dropdown, Run Projection button, and axis spin boxes (0–511). Changing a spin box auto-runs the projection.
  • Vector Analogy A, B, C: Three words for A - B + C. Result is computed in 512D, projected into the current space (PCA or manual), and the nearest word is shown and highlighted.
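
The PCA status line reports which original dimensions carry the most weight in each component. One plausible reading, sketched below, is the index of the largest absolute loading in each component vector. Both the helper name dominant_dimensions and the use of absolute value are assumptions, and the short vectors are made up for illustration:

```python
def dominant_dimensions(components):
    """For each component, return the index of its largest absolute loading."""
    return [max(range(len(c)), key=lambda j: abs(c[j])) for c in components]


# Two made-up component vectors over 4 original dimensions.
components = [
    [0.1, -0.9, 0.2, 0.0],
    [0.7, 0.1, -0.6, 0.3],
]
dims = dominant_dimensions(components)
```

Entering these indices in the Manual axis spin boxes gives a rough point of comparison between the manual view and the PCA view.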

Controls and Visualizations

  • Input words textarea: Comma-separated list; Preset fills it. Enter runs PCA; Run PCA Projection or Manual Run Projection draws the canvas.
  • 2D/3D Canvas: Grid with points (one per word). Hover shows word and coordinates. Canvas controls: Home, Zoom, Pan; in 3D, rotate button and drag to rotate. After "Calculate Path", nearest word is highlighted in gold and the A-B+C point is orange.
  • Vector Analogy and Calculate Path: A, B, C fields; button computes A - B + C in 512D, projects into the current space, finds nearest word, and updates the result and highlight.

Key Concepts

  • Word embeddings: Dense vectors (here 512D from Universal Sentence Encoder) that capture meaning; similar words have similar vectors.
  • PCA (SVD): Centers the embedding matrix, runs SVD, and projects onto the first two or three right singular vectors for 2D/3D. Among all linear projections to that many dimensions, this one retains the most variance.
  • Manual projection: Uses chosen dimensions (0–511) from the 512D embedding directly as coordinates.
  • Vector analogy: A - B + C in embedding space often points toward the "analogous" word (e.g. King - Man + Woman ≈ Queen). The direction from B to A can be interpreted as a relation; adding it to C tends to land near the analogous concept, though this is not guaranteed.

NOTE: The same projection (PCA or manual axes) is used for the main word list and for the analogy result so that A-B+C appears in the correct position relative to the other points in 2D/3D.