Web Simulator | ShareTechnote

Web Simulation

CELP — Code-Excited Linear Prediction (Speech Coding)

This simulation visualizes CELP (Code-Excited Linear Prediction) speech coding: how a source (adaptive + fixed codebook) and a filter (LPC vocal tract model) combine in an Analysis-by-Synthesis loop to represent speech with few parameters (LPC coefficients + pitch delay + codebook index + gains per frame).

Sections

Math behind the simulation
Simulation
Usage
Visualizations
Codebook selection (Analysis-by-Synthesis)
Controls
Limitations

Math behind the simulation

1. Linear Predictive Coding (LPC)

Speech is modeled as an all-pole filter driven by an excitation:

s(n) = e(n) + Σ_i=1..P a_i · s(n−i)

where e(n) is the excitation, a_i are the LPC coefficients, and P is the prediction order. The coefficients are found from the signal using the autocorrelation method and Levinson–Durbin recursion.

2. Levinson–Durbin

From a frame of samples, compute the autocorrelation R(k) for k = 0…P, then solve for a_i recursively. The recursion gives reflection coefficients and updates the predictor and error energy at each order. Lower order (e.g. P=2) yields a broad spectral envelope; higher order (e.g. P=12) tracks formants (vocal resonances) more closely.
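The recursion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the simulator's own code: the function names `autocorrelation` and `levinson_durbin` are hypothetical, coefficients are stored zero-based (`a[0]` holds a₁), and the sign convention follows s(n) ≈ Σ a_i·s(n−i).

```python
import math


def autocorrelation(frame, order):
    """R(k) = sum over n of frame[n] * frame[n - k], for k = 0..order."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return (a, k, err): predictor a_1..a_P, reflection k_1..k_P, final error."""
    a = []          # predictor of the previous order, a[0] is a_1
    k = []
    err = r[0]      # E_0 = R(0)
    for m in range(1, order + 1):
        # Correlation at lag m not yet explained by the order-(m-1) predictor.
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        km = acc / err
        # Step-up: a_j(m) = a_j(m-1) - k_m * a_{m-j}(m-1), then a_m(m) = k_m.
        a = [a[j] - km * a[m - 2 - j] for j in range(m - 1)] + [km]
        k.append(km)
        err *= 1.0 - km * km
    return a, k, err


# Assumed test signal: a decaying sinusoid, 160 samples long.
frame = [math.sin(0.3 * n) * 0.99 ** n for n in range(160)]
r = autocorrelation(frame, 2)
a, k, err = levinson_durbin(r, 2)
print(a, k, err)
```

With the autocorrelation method every reflection coefficient stays within ±1, which is what keeps the resulting all-pole filter stable.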

Example: Step-by-step Levinson–Durbin (Order 2)

To see how the recursion estimates coefficients, take a small order, P = 2. Given a short signal s(n), we want a₁ and a₂ such that the current sample is predicted by s(n) ≈ a₁·s(n−1) + a₂·s(n−2).

Step 1 — Autocorrelation: Compute R(k) for the signal. Example values:

R(0) = 1.0 (total energy of the signal)
R(1) = 0.8 (high correlation with the immediate past sample)
R(2) = 0.3 (lower correlation with the sample two steps back)

Step 2 — Iteration 1 (Order 1): First reflection coefficient k₁:

k₁ = R(1) / E₀ = 0.8 / 1.0 = 0.8, where E₀ = R(0) = 1.0 is the error energy with no prediction
Coefficient update: a₁⁽¹⁾ = k₁ = 0.8
Error update: E₁ = E₀(1 − k₁²) = 1.0 × (1 − 0.64) = 0.36

At this stage the best 1st-order prediction is s(n) ≈ 0.8·s(n−1).

Step 3 — Iteration 2 (Order 2): Second reflection coefficient k₂:

Numerator: R(2) − a₁⁽¹⁾R(1) = 0.3 − 0.8×0.8 = −0.34
k₂ = −0.34 / E₁ = −0.34 / 0.36 ≈ −0.944
a₁⁽²⁾ = a₁⁽¹⁾ − k₂·a₁⁽¹⁾ = 0.8 − (−0.944)×0.8 ≈ 1.56; a₂⁽²⁾ = k₂ ≈ −0.944
E₂ = E₁(1 − k₂²) ≈ 0.36 × (1 − 0.892) ≈ 0.039 (error reduced further)

Step 4 — Final result: The 2nd-order model coefficients are a₁ ≈ 1.56 and a₂ ≈ −0.944. Prediction equation:

s(n) ≈ 1.56·s(n−1) − 0.944·s(n−2)

Step 5 — Interpretation:

a₁ > 0: Strong positive dependence on the previous sample (momentum).
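a₂ < 0: Acts as a damping/correction term. Together with a₁ it gives a complex pole pair of radius √0.944 ≈ 0.97, which is inside the unit circle, so the filter is stable but strongly resonant.
Error reduction: Error goes from E₀ = 1.0 (no prediction) → E₁ = 0.36 (Order 1) → E₂ ≈ 0.039 (Order 2). Wideband speech codecs such as AMR-WB and EVS (the codec used for 5G VoNR) use Order 16 so the envelope can follow the formants across the wider band.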

The simulation’s Autocorrelation R(k) bar chart shows the “raw” R(0)…R(P); the LPC Coefficients bar chart shows the resulting a_i from this recursion.
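As a cross-check of the worked example, the same coefficients can be obtained by solving the 2×2 normal equations directly instead of running the recursion. The sketch below uses the example values R(0) = 1.0, R(1) = 0.8, R(2) = 0.3; the variable names are chosen here for illustration.

```python
import cmath

R0, R1, R2 = 1.0, 0.8, 0.3

# Normal equations for P = 2:
#   R0*a1 + R1*a2 = R1
#   R1*a1 + R0*a2 = R2
det = R0 * R0 - R1 * R1
a1 = (R1 * R0 - R1 * R2) / det
a2 = (R0 * R2 - R1 * R1) / det

# Error energy left after the order-2 prediction.
E2 = R0 - a1 * R1 - a2 * R2

# Poles of 1 / (1 - a1 z^-1 - a2 z^-2) are the roots of z^2 - a1 z - a2 = 0.
root = cmath.sqrt(a1 * a1 + 4.0 * a2)
pole_radius = max(abs((a1 + root) / 2), abs((a1 - root) / 2))

print(round(a1, 4), round(a2, 4))   # 1.5556 -0.9444
print(round(E2, 4))                 # 0.0389
print(round(pole_radius, 4))        # 0.9718
```

The direct solve agrees with the recursion, and the pole radius below 1 confirms the order-2 filter is stable.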

3. Spectral envelope

The all-pole frequency response, evaluated on the unit circle z = e^jω, gives the LPC spectral envelope (smooth curve in dB):

H(z) = G / (1 − Σ_i a_i z⁻ⁱ)

The simulation plots the raw FFT (gray bars) and this envelope (green line) so you see how the envelope “hugs” the spectrum.
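A minimal sketch of how such an envelope can be evaluated: sample the denominator 1 − Σ a_i·e^(−jωi) on the upper half of the unit circle and convert G/|denominator| to dB. The function name `lpc_envelope_db`, the default of 256 frequency points, and G = 1 are assumptions made for this illustration.

```python
import cmath
import math


def lpc_envelope_db(a, gain=1.0, n_points=256):
    """|H(e^jw)| in dB at n_points frequencies from 0 up to (not including) pi."""
    env = []
    for m in range(n_points):
        w = math.pi * m / n_points
        # Denominator 1 - sum_i a_i e^{-j w i}; a[0] holds a_1.
        denom = 1.0 - sum(ai * cmath.exp(-1j * w * (i + 1))
                          for i, ai in enumerate(a))
        env.append(20.0 * math.log10(gain / abs(denom)))
    return env


# Coefficients from the order-2 worked example (rounded).
env = lpc_envelope_db([1.5556, -0.9444])
peak = max(range(len(env)), key=env.__getitem__)
print(peak, round(env[peak], 1))
```

For the order-2 example the envelope has a single resonance peak, located near the angle of the pole pair.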

4. Adaptive and fixed codebook

The excitation is the sum of an adaptive part (pitch-periodic, from past excitation) and a fixed part (noise/innovation). Pitch Delay (L) sets the period; optimal gains for both parts are found by projection. Uncheck Apply Noise to hear the deterministic pitch-only result.
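One way to picture the two excitation parts is the sketch below. It is an illustration under stated assumptions, not the simulator's code: the helper names `adaptive_vector` and `total_excitation` are hypothetical, the frame is taken as 160 samples (20 ms at an assumed 8 kHz), the gains are fixed by hand rather than solved by projection, and the fixed part is plain Gaussian noise.

```python
import random


def adaptive_vector(past_excitation, delay, length):
    """Repeat the past excitation with period `delay` to fill one frame."""
    v = []
    for n in range(length):
        # The first `delay` samples come from history (negative index counts
        # from the end of the buffer); later samples reuse the vector itself.
        v.append(past_excitation[n - delay] if n < delay else v[n - delay])
    return v


def total_excitation(adaptive, fixed, g_pitch, g_code, apply_noise=True):
    """Sum of the pitch part and, optionally, the noise part."""
    if not apply_noise:
        return [g_pitch * v for v in adaptive]
    return [g_pitch * v + g_code * c for v, c in zip(adaptive, fixed)]


rng = random.Random(0)
past = [0.0] * 39 + [1.0]                 # one pulse at the end of the history
pitch = adaptive_vector(past, delay=40, length=160)
noise = [rng.gauss(0.0, 1.0) for _ in range(160)]

buzzy = total_excitation(pitch, noise, 1.0, 0.3, apply_noise=False)
natural = total_excitation(pitch, noise, 1.0, 0.3, apply_noise=True)
print([n for n, x in enumerate(pitch) if x != 0.0])   # [39, 79, 119, 159]
```

With the noise switched off the excitation is a pure pulse train at the pitch period, which is why the result sounds buzzy.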

5. Analysis-by-Synthesis and optimal gain

For each fixed-codebook candidate, the simulation computes the optimal gain, G = (target · filtered) / (filtered · filtered), and picks the candidate that minimizes the MSE. Only the winning index, the gains, the pitch delay, and the LPC parameters are sent, giving high compression (e.g. ~12 kbps vs. 64 kbps PCM).
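The compression figure is simple arithmetic: bits per frame divided by the frame length. The sketch below works it through for 20 ms frames. The PCM figure (8 kHz, 8 bits per sample) and the index widths follow from the numbers in the text; the LPC and gain bit counts are hypothetical values picked only for illustration and do not describe the simulator or any standard codec.

```python
import math


def bitrate_kbps(bits_per_frame, frame_ms=20.0):
    """Bits per frame divided by frame length in ms gives kbit/s."""
    return bits_per_frame / frame_ms


# 8 kHz, 8-bit PCM: 160 samples per 20 ms frame.
pcm_kbps = bitrate_kbps(160 * 8)

# A codebook index needs log2(size) bits.
index_bits = {size: int(math.log2(size)) for size in (16, 64, 256, 512)}

# Pitch delay range 20..140 has 121 values.
pitch_bits = math.ceil(math.log2(140 - 20 + 1))

# HYPOTHETICAL allocation for the remaining parameters (assumed values).
lpc_bits = 40
gain_bits = 10
total_bits = lpc_bits + pitch_bits + index_bits[256] + gain_bits

print(pcm_kbps)                    # 64.0
print(index_bits)                  # {16: 4, 64: 6, 256: 8, 512: 9}
print(bitrate_kbps(total_bits))    # 3.25
```

Doubling the codebook size costs only one extra bit per frame, while the search effort doubles.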

6. Vocal tract shape

The vocal tract is modeled as a series of P lossless acoustic tubes. From the LPC coefficients we recover the reflection coefficients k_i using the step-down (inverse Levinson) recursion. Indices in this section are zero-based, so a_0,…, a_{P−1} correspond to a₁,…, a_P in Section 1. For i = P−1 down to 0, set k_i = a_i⁽ⁱ⁾, then update the lower-order predictor by

a_j⁽ⁱ⁻¹⁾ = (a_j⁽ⁱ⁾ + k_i · a_{i−1−j}⁽ⁱ⁾) / (1 − k_i²), j = 0,…, i−1

The area function (cross-sectional area of each tube) is then

A₀ = 1 (glottis, normalized)
A_{i+1} = A_i · (1 − k_i) / (1 + k_i), i = 0,…, P−1

The panel draws the tract width proportional to √A_i (radius from area). The shape updates as you move the Start Point across the recording.
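The step-down recursion and the area function can be sketched as follows. The function names `reflection_from_lpc` and `tube_areas` are hypothetical, arrays are zero-based as in the formulas above, and the sketch assumes every |k_i| is strictly below 1 (otherwise the division fails).

```python
import math


def reflection_from_lpc(a):
    """Step-down recursion: predictor a[0..P-1] -> reflection k[0..P-1]."""
    cur = list(a)
    k = [0.0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        ki = cur[i]
        k[i] = ki
        denom = 1.0 - ki * ki
        # Lower-order predictor: a_j <- (a_j + k_i * a_{i-1-j}) / (1 - k_i^2).
        cur = [(cur[j] + ki * cur[i - 1 - j]) / denom for j in range(i)]
    return k


def tube_areas(k):
    """A_0 = 1, then A_{i+1} = A_i * (1 - k_i) / (1 + k_i)."""
    areas = [1.0]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return areas


# Coefficients from the order-2 worked example (rounded).
k = reflection_from_lpc([1.5556, -0.9444])
areas = tube_areas(k)
widths = [math.sqrt(area) for area in areas]   # drawn width ~ sqrt(area)
print([round(x, 3) for x in k])                # [0.8, -0.944]
```

Running the step-down on the example coefficients returns the same k₁ = 0.8 and k₂ ≈ −0.944 that the forward recursion produced.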

Simulation

The interactive simulator is below. Use the controls to explore the concepts described above.

Usage

Load: The simulation loads a default 20 ms frame from the WAV. All plots show the entire 20 ms frame.
Start Point (top of sidebar): Move the slider (or drag the green frame on the Original waveform) to select a different 20 ms segment. LPC, codebook search, and all plots update.
Order P: Change the LPC order (2–16). Lower order gives a broad spectral envelope; higher order tracks formants better. Watch the green envelope and the LPC bar chart.
Pitch Delay (L): Sets the period of the adaptive (pitch) excitation (20–140 samples). When it matches the voice pitch, the synthetic waveform tracks the original more closely.
Apply Noise: When checked, synthesis uses adaptive + fixed codebook (natural but noisy). When unchecked, only the pitch component is used (smooth, buzzy).
Animation Delay: Slider (ms) controls the delay between steps during the codebook search animation. Lower = faster.
Codebook Index: Slider to pick a codebook entry directly (0 to codebook size − 1). Changing it stops any running search and shows that candidate (Winner, MSE, synthesis, scope, residual).
Play / Stop / Step Bwd / Step Fwd: Run or pause the Analysis-by-Synthesis search animation; step backward or forward one codebook index. Search Status shows Idle, Searching…, Paused, or Locked.
Excitation: Switch between White Noise and Glottal Pulse for the fixed codebook. Same LPC filter, different quality.
Play Original / Play Synthetic: Play the current 20 ms frame via the Web Audio API. Search Status, MSE, and Bitrate are shown in the sidebar.

Visualizations

Original waveform: Full source with a draggable green frame marking the 20 ms segment used for analysis.
Source (Codebook): Eight excitation candidates centered on the winner; the winner is green with a check. Index numbers appear under each strip.
MSE per codebook index: Bar chart below the codebook showing MSE for every codebook entry. Winner bar is green; during search animation the current index bar is yellow.
Synthesis: Combined excitation (adaptive + optional noise) through the LPC filter (yellow).
Scope: Original (blue) vs synthetic (yellow, dashed) overlay.
Residual: Error signal (original − synthetic) in red. Smaller when Apply Noise is on.
Spectrum: Gray bars = FFT of the frame; green line = LPC spectral envelope (H(e^jω) in dB).
LPC Coefficients: Bar chart of the predictor coefficients a[0]..a[P−1].
Vocal Tract Cross-Section: Area function derived from LPC (reflection coefficients); throat shape from glottis to lips.

Codebook selection (Analysis-by-Synthesis)

In CELP, choosing the best codebook entry is a trial-and-error process called Analysis-by-Synthesis. The encoder does not just match waveforms; it tests each candidate by passing it through a model of the vocal tract and comparing the result to the original.

1. Preparation of the target signal

Before searching the codebook, the encoder works from the original speech and (in full systems) removes the effect of the vocal tract (LPC) to get a residual signal. The aim is to find a codebook entry that best matches this leftover. When an adaptive codebook is used, the pitch component is removed first; the fixed codebook then searches for the best innovation (noise-like) match.
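Removing the vocal-tract effect amounts to running the frame through the inverse (analysis) filter. The sketch below shows only that LPC step, with zero history before the frame; the function name `lpc_residual` is hypothetical, and the text does not show how the simulator itself prepares its target.

```python
def lpc_residual(frame, a):
    """e(n) = s(n) - sum_i a_i * s(n - i); a[0] holds a_1, zero history."""
    res = []
    for n, s in enumerate(frame):
        pred = sum(ai * frame[n - 1 - i]
                   for i, ai in enumerate(a) if n - 1 - i >= 0)
        res.append(s - pred)
    return res


# A first-order example: this frame is an impulse passed through 1/(1 - 0.8 z^-1),
# so the inverse filter recovers the impulse (up to rounding).
frame = [1.0, 0.8, 0.64, 0.512]
residual = lpc_residual(frame, [0.8])
print([round(e, 6) for e in residual])
```

When the predictor fits the frame well, the residual carries far less energy than the speech itself, which is what makes it cheap to approximate with a codebook entry.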

2. The loop: filter every candidate

The encoder steps through codebook indices (e.g. 0 to 255). For each candidate:

Filtering: The codebook vector is passed through the current LPC synthesis filter.
Synthesis: This produces a “candidate synthetic speech” segment.
Weighting: In many codecs, a perceptual weighting filter is then applied to shape the error so it is less audible (this simulation uses the plain error).
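The filtering step is the all-pole recursion s(n) = e(n) + Σ a_i·s(n−i) from Section 1 applied to a codebook vector. A minimal sketch, assuming zero filter state at the start of the frame and using the hypothetical name `lpc_synthesize`:

```python
def lpc_synthesize(excitation, a):
    """s(n) = e(n) + sum_i a_i * s(n - i); a[0] holds a_1, zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i, ai in enumerate(a):
            if n - 1 - i >= 0:
                acc += ai * out[n - 1 - i]
        out.append(acc)
    return out


# Impulse response of the first-order filter 1 / (1 - 0.8 z^-1).
impulse = [1.0, 0.0, 0.0, 0.0]
response = lpc_synthesize(impulse, [0.8])
print([round(s, 3) for s in response])   # [1.0, 0.8, 0.64, 0.512]
```

A real encoder would carry the filter state over from the previous frame; resetting it to zero keeps the sketch short.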

3. Calculate optimal gain (G)

For each candidate, the encoder computes a scaling factor (gain). Codebook entries are normalized, so they must be scaled to match the level of the original. The optimal gain is found by projecting the target speech onto the filtered candidate:

G = (target · filtered) / (filtered · filtered)
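The projection is a ratio of two dot products. In the sketch below the function name `optimal_gain` is hypothetical, and returning 0 for an all-zero candidate is a choice made here to avoid dividing by zero.

```python
def optimal_gain(target, filtered):
    """Least-squares gain: (target . filtered) / (filtered . filtered)."""
    num = sum(t * f for t, f in zip(target, filtered))
    den = sum(f * f for f in filtered)
    return num / den if den > 0.0 else 0.0


target = [2.0, 4.0, 7.0]
filtered = [1.0, 2.0, 3.0]
g = optimal_gain(target, filtered)

# With the optimal gain, the remaining error is orthogonal to the candidate.
error = [t - g * f for t, f in zip(target, filtered)]
print(g, sum(e * f for e, f in zip(error, filtered)))
```

Any other gain would leave some of the candidate's shape in the error, so the squared error could still be reduced.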

4. Compute Mean Squared Error (MSE)

The encoder compares original speech to candidate synthetic speech: it computes the sample-by-sample difference (error), then sums the squared errors and normalizes to get the MSE for that index.

5. Select the winner

After testing all candidates (or a subset in algebraic CELP), the encoder picks the index with the lowest MSE. The winner index and the gain are the only codebook-side parameters sent over the 4G/5G link.
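Steps 2 to 5 combine into one exhaustive loop. The sketch below is an illustration under assumptions, not the simulator's implementation: the names `synthesize` and `search_codebook` are hypothetical, the codebook is seeded Gaussian noise, the error is plain (unweighted) MSE, and the target is built from a known entry so the expected winner is obvious.

```python
import random


def synthesize(excitation, a):
    """All-pole LPC synthesis with zero initial state; a[0] holds a_1."""
    out = []
    for n, e in enumerate(excitation):
        out.append(e + sum(ai * out[n - 1 - i]
                           for i, ai in enumerate(a) if n - 1 - i >= 0))
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, mse) of the candidate with the lowest MSE."""
    best = None
    for index, vec in enumerate(codebook):
        filtered = synthesize(vec, a)
        energy = sum(f * f for f in filtered)
        if energy == 0.0:
            continue                      # a silent candidate cannot match
        gain = sum(t * f for t, f in zip(target, filtered)) / energy
        mse = sum((t - gain * f) ** 2
                  for t, f in zip(target, filtered)) / len(target)
        if best is None or mse < best[2]:
            best = (index, gain, mse)
    return best


rng = random.Random(1)
a = [1.5556, -0.9444]                     # order-2 example coefficients
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]

# Target = entry 5 through the filter, scaled by 3.
target = [3.0 * s for s in synthesize(codebook[5], a)]
index, gain, mse = search_codebook(target, codebook, a)
print(index, round(gain, 6))              # 5 3.0
```

The cost grows linearly with codebook size because every candidate is filtered once per frame, which is the motivation for structured (algebraic) codebooks that allow a partial search.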

Summary of selection

Step	Action	Outcome
Search	Loop through index i.	Consider 256 (or N) candidate “sounds.”
Synthesis	Pass candidate through LPC filter.	Obtain N “synthetic” voices.
Comparison	Subtract synthetic from original.	Error score (e.g. MSE) per index.
Selection	Find minimum error.	Single Winner Index (and gain) sent.

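Analysis-by-Synthesis ensures the chosen excitation is the one that, after the vocal tract filter, is closest to the original under the encoder's error measure (plain MSE here, perceptually weighted error in deployed codecs). The same principle is used in the CELP-based voice codecs of LTE and 5G VoNR.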

Controls

Start Point (slider, top): Starting sample of the 20 ms frame. Synced with dragging the green frame on the Original plot.
Order P (slider): LPC prediction order. Value shown next to slider.
Codebook (dropdown): Codebook size (16, 64, 256, 512) for bitrate and search.
Excitation (dropdown): White Noise or Glottal Pulse for the fixed codebook.
Pitch Delay (L) (slider): Period of the adaptive excitation (20–140).
Apply Noise (checkbox): Include fixed codebook in synthesis when checked.
Animation Delay (slider, ms): Delay between steps in the search animation.
Codebook Index (slider): Select a codebook entry directly; stops the search and shows that candidate.
Play / Stop / Step Bwd / Step Fwd: Run, pause, or step the Analysis-by-Synthesis search.
Play Original / Play Synthetic: Playback buttons. Search Status, MSE, and Bitrate are shown below.

Limitations

Teaching codec, not a standard: the search, codebooks, and bitrate figures illustrate the CELP principle; they are not bit-exact to AMR, EVS, or any deployed 4G/5G voice codec.
Single frame, no inter-frame coding: LPC and codebook parameters are solved per 20 ms frame; real codecs interpolate parameters and quantize them (LSP/LSF) for transmission — quantization is not modelled here.
No perceptual weighting: the search objective is plain squared error; production codecs apply a perceptual weighting filter so the audible error, not the raw error, is minimized.
Manual pitch: the adaptive-codebook pitch delay is set by hand; real encoders estimate it open-loop and then refine it with a closed-loop adaptive-codebook search before the fixed-codebook (innovation) search.