This simulation visualizes CELP (Code-Excited Linear Prediction) speech coding: how a source (adaptive + fixed codebook) and a filter (LPC vocal tract model) combine in an Analysis-by-Synthesis loop to represent speech with only a few parameters per frame (LPC coefficients, pitch delay, codebook index, and gains).

Math behind the simulation

1. Linear Predictive Coding (LPC)

Speech is modeled as an all-pole filter driven by an excitation:

    s(n) = e(n) + Σ i=1..P  ai · s(n−i)

where e(n) is the excitation, ai are the LPC coefficients, and P is the prediction order. The coefficients are found from the signal using the autocorrelation method and the Levinson–Durbin recursion.

2. Levinson–Durbin

From a frame of samples, compute the autocorrelation r[k], then solve for the ai recursively. The recursion produces a reflection coefficient at each order and updates the predictor and the error energy. A low order (e.g. P = 2) yields a broad spectral envelope; a higher order (e.g. P = 12) tracks formants (vocal resonances) more closely.

Example: Step-by-step Levinson–Durbin (Order 2)

To see how the recursion estimates coefficients, take a small order, P = 2. We have a short signal s(n) and want a1, a2 such that the current sample is predicted as s(n) ≈ a1·s(n−1) + a2·s(n−2).

Step 1 — Autocorrelation: Compute R(k) for the signal. Example values: R(0) = 1.0, R(1) = 0.8, R(2) = 0.3.
Step 2 — Iteration 1 (Order 1): First reflection coefficient:

    k1 = R(1) / R(0) = 0.8 / 1.0 = 0.8

so a1 = k1 = 0.8, and the prediction-error energy drops to E1 = R(0)·(1 − k1²) = 0.36.
At this stage the best 1st-order prediction is s(n) ≈ 0.8·s(n−1).

Step 3 — Iteration 2 (Order 2): Second reflection coefficient:

    k2 = (R(2) − a1·R(1)) / E1 = (0.3 − 0.8·0.8) / 0.36 ≈ −0.944

The predictor is then updated: a2 = k2 ≈ −0.944 and a1 = 0.8 − k2·0.8 ≈ 1.56.
Step 4 — Final result: The 2nd-order model coefficients are a1 ≈ 1.56 and a2 ≈ −0.944. Prediction equation:

    s(n) ≈ 1.56·s(n−1) − 0.944·s(n−2)

Step 5 — Interpretation: These coefficients place a complex-conjugate pole pair close to the unit circle (|z| ≈ 0.97), so the model describes a single strong resonance, the 2nd-order analogue of one formant.
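The recursion in Steps 1–4 can be sketched in a few lines of Python. The autocorrelation values below (R(0) = 1.0, R(1) = 0.8, R(2) = 0.3) are illustrative numbers consistent with the worked example's final coefficients, not values from a real recording:

```python
# Levinson-Durbin recursion: solve for LPC coefficients a_1..a_P from
# autocorrelation values r[0..P].

def levinson_durbin(r, order):
    """Return (coefficients a_1..a_order, final error energy E_order)."""
    a = [0.0] * (order + 1)   # a[i] is the weight on s(n-i); a[0] unused
    err = r[0]                # E_0: error energy before any prediction
    for i in range(1, order + 1):
        # Reflection coefficient k_i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update the predictor: a_j <- a_j - k * a_{i-j}, and set a_i = k
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)  # E_i = E_{i-1} * (1 - k_i^2)
    return a[1:], err

coeffs, e = levinson_durbin([1.0, 0.8, 0.3], 2)
print([round(c, 4) for c in coeffs])   # → [1.5556, -0.9444]
```

Running it reproduces the worked example: a1 ≈ 1.56, a2 ≈ −0.944, with the error energy shrinking from 1.0 to about 0.039.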
The simulation’s Autocorrelation R(k) bar chart shows the “raw” R(0)…R(P); the LPC Coefficients bar chart shows the resulting ai from this recursion.

3. Spectral envelope

The all-pole frequency response is

    H(z) = G / (1 − Σ i=1..P  ai · z^(−i))

Evaluated on the unit circle z = e^(jω), this gives the LPC spectral envelope (a smooth curve in dB). The simulation plots the raw FFT (gray bars) and this envelope (green line) so you can see how the envelope “hugs” the spectrum.

4. Adaptive and fixed codebook

The excitation is the sum of an adaptive part (pitch-periodic, taken from past excitation) and a fixed part (noise/innovation). Pitch Delay (L) sets the period; optimal gains for both parts are found by projection. Uncheck Apply Noise to hear the deterministic pitch-only result.

5. Analysis-by-Synthesis and optimal gain

For each fixed-codebook candidate, the simulation computes the optimal gain (target · filtered / energy) and picks the candidate that minimizes the MSE. Only the winning index, the gains, and the LPC parameters are sent, giving high compression (e.g. ~12 kbps vs. 64 kbps PCM).

6. Vocal tract shape

The vocal tract is modeled as a series of P lossless acoustic tubes. From the LPC coefficients we recover the reflection coefficients ki using the step-down (inverse Levinson) recursion: for i = P−1 down to 0, set ki = ai(i), then update the lower-order predictor by

    aj(i−1) = (aj(i) + ki · a(i−1−j)(i)) / (1 − ki²),   j = 0, …, i−1

where the parenthesized superscript denotes the predictor order. The area function (cross-sectional area of each tube) is then

    A0 = 1 (glottis, normalized),   Ai+1 = Ai · (1 − ki) / (1 + ki),   i = 0, …, P−1

The panel draws the tract width proportional to √Ai (radius from area). The shape updates as you move the Start Point across the recording.

Usage
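The step-down recursion and area function from section 6 can be sketched as follows; the coefficient values are the Order-2 worked example from above, not data from the simulation:

```python
# Step-down (inverse Levinson) recursion: recover reflection coefficients
# k_1..k_P from LPC coefficients a_1..a_P, then compute the tube areas.

def step_down(a):
    """a[0..P-1] hold a_1..a_P; returns the reflection coefficients k_1..k_P."""
    a = list(a)
    P = len(a)
    ks = [0.0] * P
    for i in range(P, 0, -1):
        k = a[i - 1]          # k_i is the highest-order coefficient
        ks[i - 1] = k
        if i > 1:
            denom = 1.0 - k * k
            # Lower-order predictor: a_j <- (a_j + k * a_{i-1-j}) / (1 - k^2)
            a = [(a[j] + k * a[i - 2 - j]) / denom for j in range(i - 1)]
    return ks

def areas(ks, a0=1.0):
    """Tube area function: A_0 = 1, A_{i+1} = A_i * (1 - k_i) / (1 + k_i)."""
    out = [a0]
    for k in ks:
        out.append(out[-1] * (1.0 - k) / (1.0 + k))
    return out

ks = step_down([1.5556, -0.9444])
A = areas(ks)
print([round(k, 3) for k in ks])   # → [0.8, -0.944]
```

The step-down exactly inverts the Levinson–Durbin recursion, so it recovers k1 = 0.8 and k2 ≈ −0.944 from the example coefficients.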
Visualizations
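The spectral-envelope curve described in section 3 can be reproduced with a short sketch, assuming NumPy; the gain G = 1 and the Order-2 example coefficients are stand-ins for values the simulation would estimate per frame:

```python
# Evaluate the all-pole response H(e^{jw}) = G / (1 - sum_i a_i e^{-jwi})
# on a frequency grid and convert to dB, as in the green envelope curve.

import numpy as np

def lpc_envelope_db(a, gain=1.0, n_freqs=256):
    """dB magnitude of the LPC envelope for coefficients a_1..a_P."""
    w = np.linspace(0, np.pi, n_freqs)   # 0 .. Nyquist (normalized)
    P = len(a)
    # Denominator A(e^{jw}) = 1 - sum_{i=1..P} a_i e^{-jwi}
    exps = np.exp(-1j * np.outer(w, np.arange(1, P + 1)))
    denom = 1.0 - exps @ np.asarray(a)
    h = gain / denom
    return 20.0 * np.log10(np.abs(h) + 1e-12)

env = lpc_envelope_db([1.5556, -0.9444])
print(env.argmax())   # grid index of the single resonance peak
```

The single peak in the result is the resonance implied by the pole pair of the Order-2 example; with P = 12 the same code would show several formant peaks.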
Codebook selection (Analysis-by-Synthesis)

In CELP, choosing the best codebook entry is a trial-and-error process called Analysis-by-Synthesis. The encoder does not just match waveforms; it tests each candidate by passing it through a model of the vocal tract and comparing the result to the original.

1. Preparation of the target signal

Before searching the codebook, the encoder works from the original speech and (in full systems) removes the effect of the vocal tract (LPC) to get a residual signal. The aim is to find a codebook entry that best matches this leftover. When an adaptive codebook is used, the pitch component is removed first; the fixed codebook then searches for the best innovation (noise-like) match.

2. The loop: filter every candidate

The encoder steps through the codebook indices (e.g. 0 to 255). Each candidate excitation is passed through the LPC synthesis filter to produce a candidate synthetic frame, which is then compared against the target.
3. Calculate optimal gain (G)

For each candidate, the encoder computes a scaling factor (gain). Codebook entries are normalized, so they must be scaled to match the level of the original. The optimal gain is found by projecting the target speech onto the filtered candidate (e.g. G = target · filtered / (filtered · filtered)).

4. Compute Mean Squared Error (MSE)

The encoder compares the original speech to the candidate synthetic speech: it computes the sample-by-sample difference (error), then sums the squared errors and normalizes to get the MSE for that index.

5. Select the winner

After testing all candidates (or a subset, as in algebraic CELP), the encoder picks the index with the lowest MSE. The winning index and the gain are the only codebook-side parameters sent over the 4G/5G link.

Summary of selection
Analysis-by-Synthesis ensures the chosen excitation is the one that, after the vocal tract filter, sounds closest to the original, as in the 5G VoNR and LTE voice codecs.

Controls
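The selection loop in steps 2–5 above can be sketched as follows, assuming NumPy. The codebook (16 random normalized entries), frame length (40 samples), and random target are illustrative stand-ins for the real speech data and 256-entry codebook:

```python
# Analysis-by-Synthesis search: filter each codebook candidate through the
# LPC synthesis filter, scale it by the optimal gain, keep the lowest-MSE index.

import numpy as np

rng = np.random.default_rng(0)
FRAME = 40
a = np.array([1.5556, -0.9444])               # example LPC coefficients
codebook = rng.standard_normal((16, FRAME))   # 16 candidate excitations
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)  # normalized
target = rng.standard_normal(FRAME)           # stand-in for the target speech

def synthesize(exc, a):
    """All-pole synthesis filter: s(n) = e(n) + sum_i a_i * s(n-i)."""
    s = np.zeros(len(exc))
    for n in range(len(exc)):
        s[n] = exc[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                s[n] += ai * s[n - i]
    return s

best_idx, best_gain, best_mse = -1, 0.0, np.inf
for idx, cand in enumerate(codebook):
    filt = synthesize(cand, a)                      # step 2: filter candidate
    gain = target @ filt / (filt @ filt)            # step 3: optimal gain
    err = target - gain * filt                      # step 4: error signal
    mse = err @ err / FRAME                         #         mean squared error
    if mse < best_mse:                              # step 5: keep the winner
        best_idx, best_gain, best_mse = idx, gain, mse

# Only best_idx and best_gain would be transmitted for the fixed codebook.
```

Because the gain is chosen by projection, each candidate's MSE is never worse than the target energy itself; the search simply keeps the index where the projection leaves the smallest residual.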