This interactive simulator visualizes Scaled Dot-Product Attention using real 512-dimensional word embeddings from the Universal Sentence Encoder (USE). Words are arranged in a circle; clicking a word sets it as the Query (Q). The table then shows Q·K/√dk, the Softmax %, and the magnitude of each word's weighted Value contribution (‖w·V‖). Formula: Attention(Q, K, V) = softmax((Q·Kᵀ / √dk) / τ) V. W_Q and W_K are identity-style projections (the first 64 dims are preserved), so Q·K reflects USE semantic similarity: for example, "cat" attends more to "sat" and "mat". τ is a temperature: low (slider left) sharpens the peaks; high (slider right) makes the weights more uniform. dk = 64.
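The attention-with-temperature formula above can be sketched in NumPy. This is an illustrative sketch, not the simulator's actual code; the function name and toy data are mine. It shows how a low τ concentrates the softmax weights on the best-matching key and a high τ flattens them:

```python
import numpy as np

def attention_with_temperature(Q, K, V, tau=1.0):
    """Scaled dot-product attention for one query, with a temperature knob.

    Q: (d_k,) query vector; K, V: (n, d_k) key/value rows.
    Returns the softmax weights w and the attended output w @ V.
    """
    d_k = K.shape[1]
    scores = (K @ Q) / np.sqrt(d_k) / tau   # (n,) scaled, tempered scores
    scores -= scores.max()                  # subtract max for numerical stability
    w = np.exp(scores)
    w /= w.sum()
    return w, w @ V

# Toy check: make the query identical to key 2, so key 2 should dominate.
rng = np.random.default_rng(0)
K = rng.standard_normal((4, 64))
V = rng.standard_normal((4, 64))
Q = K[2]
w_sharp, _ = attention_with_temperature(Q, K, V, tau=0.1)    # peaked on word 2
w_flat, _ = attention_with_temperature(Q, K, V, tau=100.0)   # near-uniform
```

With τ = 0.1 nearly all of the weight lands on word 2; with τ = 100 the four weights are close to 0.25 each, which is exactly the slider behavior described above.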
Input words (space or comma separated):
Head type:
Temperature (τ):
Click a word to set it as the Query word
Serve the page over http:// (e.g. with a local server), not file://. Scaled dot-product: softmax(Q·Kᵀ / √dk). Query (Q) is shown in gold, Keys (K) in cyan.
Weight matrices W_Q, W_K, W_V (512×64): Q = X·W_Q, K = X·W_K, V = X·W_V, where X = E + PE (embedding plus positional encoding).
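The projections Q = X·W_Q, K = X·W_K, V = X·W_V can be sketched in NumPy. This is an assumed reconstruction of the shapes described in the text (identity-style 512×64 W_Q/W_K that preserve the first 64 dims, and a random W_V), not the simulator's own weights:

```python
import numpy as np

D_MODEL, D_K = 512, 64

# Identity-style projection: rows 0..63 carry an identity block, the rest are zero,
# so X @ W_q simply keeps the first 64 of the 512 embedding dimensions.
W_q = np.zeros((D_MODEL, D_K))
W_q[:D_K, :] = np.eye(D_K)
W_k = W_q.copy()            # same projection for keys, per the technical note

# W_V: a fixed random matrix (seed chosen arbitrarily for this sketch).
rng = np.random.default_rng(42)
W_v = rng.standard_normal((D_MODEL, D_K)) / np.sqrt(D_MODEL)

X = rng.standard_normal((5, D_MODEL))   # stand-in for 5 USE word embeddings
Q, K, V = X @ W_q, X @ W_k, X @ W_v     # each (5, 64)
```

Because W_Q and W_K are the same identity-style projection, Q·Kᵀ is just the dot product of the first 64 embedding dimensions, which is why the scores track USE semantic similarity.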
Z_head = w · V (64 dims)
How to use
What you’re seeing
Technical note: Embeddings X come from USE (512-d). W_Q and W_K are the same projection onto the first 64 dimensions, so Q and K reflect semantic similarity in USE space. W_V is a fixed random matrix. This simulator does not use trained transformer weights; it illustrates the attention equation with real embeddings and synthetic W_Q, W_K, W_V.