Web Simulation

CELP — Code-Excited Linear Prediction (Speech Coding)

This simulation visualizes CELP (Code-Excited Linear Prediction) speech coding: how a source (adaptive + fixed codebook) and a filter (LPC vocal tract model) combine in an Analysis-by-Synthesis loop to represent speech with few parameters (LPC coefficients + pitch delay + codebook index + gains per frame).

Math behind the simulation

1. Linear Predictive Coding (LPC)

Speech is modeled as an all-pole filter driven by an excitation:

s(n) = e(n) + Σ_{i=1..P} ai · s(n−i)

where e(n) is the excitation, ai are the LPC coefficients, and P is the prediction order. The coefficients are found from the signal using the autocorrelation method and Levinson–Durbin recursion.

2. Levinson–Durbin

From a frame of samples, compute autocorrelation r[k], then solve for ai recursively. The recursion gives reflection coefficients and updates the predictor and error energy at each order. Lower order (e.g. P=2) yields a broad spectral envelope; higher order (e.g. P=12) tracks formants (vocal resonances) more closely.

Example: Step-by-step Levinson–Durbin (Order 2)

To see how the recursion estimates coefficients, take a small order, P = 2. We have a short signal s(n) and want coefficients a1, a2 such that the current sample is predicted as s(n) ≈ a1·s(n−1) + a2·s(n−2).

Step 1 — Autocorrelation: Compute R(k) for the signal. Example values:

  • R(0) = 1.0 (total energy of the signal)
  • R(1) = 0.8 (high correlation with the immediate past sample)
  • R(2) = 0.3 (lower correlation with the sample two steps back)

Step 2 — Iteration 1 (Order 1): First reflection coefficient k1:

  • k1 = R(1) / E0 = 0.8 / 1.0 = 0.8
  • Coefficient update: a1(1) = k1 = 0.8
  • Error update: E1 = E0(1 − k1²) = 1.0 × (1 − 0.64) = 0.36

At this stage the best 1st-order prediction is s(n) ≈ 0.8·s(n−1).

Step 3 — Iteration 2 (Order 2): Second reflection coefficient k2:

  • Numerator: R(2) − a1(1)·R(1) = 0.3 − 0.8×0.8 = −0.34
  • k2 = −0.34 / E1 = −0.34 / 0.36 ≈ −0.944
  • a1(2) = a1(1) − k2·a1(1) = 0.8 − (−0.944)×0.8 ≈ 1.56; a2(2) = k2 ≈ −0.944
  • E2 = E1(1 − k2²) ≈ 0.36 × 0.108 ≈ 0.039 (error reduced further)

Step 4 — Final result: The 2nd-order model coefficients are a1 ≈ 1.56 and a2 ≈ −0.944. Prediction equation:

s(n) ≈ 1.56·s(n−1) − 0.944·s(n−2)

Step 5 — Interpretation:

  • a1 > 0: Strong positive dependence on the previous sample (momentum).
  • a2 < 0: Acts as a damping/correction so the filter stays stable (stable vocal tract).
  • Error reduction: Error goes from E0 (no prediction) → E1 (Order 1) → E2 (Order 2). In 5G voice codecs (e.g. 3GPP EVS), order 16 is used to drive this error as low as possible.

The simulation’s Autocorrelation R(k) bar chart shows the “raw” R(0)…R(P); the LPC Coefficients bar chart shows the resulting ai from this recursion.
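The recursion above can be sketched in a few lines. The function below is a generic-order version (assuming the sign convention s(n) ≈ Σ ai·s(n−i) used in this text; the function name is illustrative) and reproduces the worked numbers:

```python
def levinson_durbin(R, order):
    """Solve for LPC coefficients a1..aP from autocorrelation R[0..P],
    with the convention s(n) ~= sum_i a_i * s(n-i)."""
    a = [0.0] * (order + 1)   # a[i] holds coefficient a_i; a[0] is unused
    E = R[0]                  # prediction-error energy E0
    for i in range(1, order + 1):
        # Reflection coefficient k_i = (R(i) - sum_j a_j R(i-j)) / E_{i-1}
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E
        a_new = a[:]
        a_new[i] = k                          # a_i^{(i)} = k_i
        for j in range(1, i):
            a_new[j] = a[j] - k * a[i - j]    # predictor update
        a = a_new
        E *= (1.0 - k * k)                    # E_i = E_{i-1}(1 - k_i^2)
    return a[1:], E

# Values from the worked example: R(0)=1.0, R(1)=0.8, R(2)=0.3
coeffs, E2 = levinson_durbin([1.0, 0.8, 0.3], 2)
print(coeffs, E2)   # a1 ~ 1.556, a2 ~ -0.944, E2 ~ 0.039
```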

3. Spectral envelope

The all-pole frequency response is H(z) = G / (1 − Σ_{i=1..P} ai·z^{−i}). Evaluated on the unit circle z = e^{jω}, this gives the LPC spectral envelope (smooth curve in dB). The simulation plots the raw FFT (gray bars) and this envelope (green line) so you see how the envelope “hugs” the spectrum.

4. Adaptive and fixed codebook

The excitation is the sum of an adaptive part (pitch-periodic, from past excitation) and a fixed part (noise/innovation). Pitch Delay (L) sets the period; optimal gains for both parts are found by projection. Uncheck Apply Noise to hear the deterministic pitch-only result.

5. Analysis-by-Synthesis and optimal gain

For each fixed-codebook candidate, the simulation computes the optimal gain (target · filtered / energy) and picks the candidate that minimizes MSE. Only the winning index, gains, and LPC parameters are sent, giving high compression (e.g. ~12 kbps vs. 64 kbps PCM).

6. Vocal tract shape

The vocal tract is modeled as a series of P lossless acoustic tubes. From the LPC coefficients we recover the reflection coefficients ki using the step-down (inverse Levinson) recursion: for i = P−1 down to 0, set ki = ai(i), then update the lower-order predictor by

aj(i−1) = (aj(i) + ki · a(i−1−j)(i)) / (1 − ki²),   j = 0,…, i−1.

The area function (cross-sectional area of each tube) is then

A0 = 1   (glottis, normalized),    Ai+1 = Ai · (1 − ki) / (1 + ki),   i = 0,…, P−1.

The panel draws the tract width proportional to √Ai (radius from area). The shape updates as you move the Start Point across the recording.
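The step-down recursion and area function translate almost line-for-line into code. A sketch following the text's 0-based indexing; the order-2 coefficients reused below come from the earlier worked example:

```python
def reflection_coeffs(a):
    """Step-down (inverse Levinson) recursion from the text.
    a[j] holds predictor coefficient a_{j+1} (0-based)."""
    a = list(a)
    k = [0.0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        k[i] = a[i]                            # k_i = a_i^{(i)}
        denom = 1.0 - k[i] * k[i]
        # Lower-order predictor: a_j^{(i-1)} = (a_j + k_i a_{i-1-j}) / (1 - k_i^2)
        a = [(a[j] + k[i] * a[i - 1 - j]) / denom for j in range(i)]
    return k

def area_function(k):
    """A0 = 1 at the glottis; A_{i+1} = A_i * (1 - k_i) / (1 + k_i)."""
    A = [1.0]
    for ki in k:
        A.append(A[-1] * (1.0 - ki) / (1.0 + ki))
    return A

# The order-2 coefficients from the worked example recover k1 ~ 0.8, k2 ~ -0.944
k = reflection_coeffs([1.556, -0.944])
A = area_function(k)
print(k, A)
```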

Usage

  1. Load: The simulation loads a default 20 ms frame from the WAV. All plots show the entire 20 ms frame.
  2. Start Point (top of sidebar): Move the slider (or drag the green frame on the Original waveform) to select a different 20 ms segment. LPC, codebook search, and all plots update.
  3. Order P: Change the LPC order (2–16). Lower order gives a broad spectral envelope; higher order tracks formants better. Watch the green envelope and the LPC bar chart.
  4. Pitch Delay (L): Sets the period of the adaptive (pitch) excitation (20–140 samples). When it matches the voice pitch, the synthetic waveform tracks the original more closely.
  5. Apply Noise: When checked, synthesis uses adaptive + fixed codebook (natural but noisy). When unchecked, only the pitch component is used (smooth, buzzy).
  6. Animation Delay: Slider (ms) controls the delay between steps during the codebook search animation. Lower = faster.
  7. Codebook Index: Slider to pick a codebook entry directly (0 to codebook size − 1). Changing it stops any running search and shows that candidate (Winner, MSE, synthesis, scope, residual).
  8. Play / Stop / Step Bwd / Step Fwd: Run or pause the Analysis-by-Synthesis search animation; step backward or forward one codebook index. Search Status shows Idle, Searching…, Paused, or Locked.
  9. Excitation: Switch between White Noise and Glottal Pulse for the fixed codebook. Same LPC filter, different quality.
  10. Play Original / Play Synthetic: Play the current 20 ms frame via the Web Audio API. Search Status, MSE, and Bitrate are shown in the sidebar.

Visualizations

  • Original waveform: Full source with a draggable green frame marking the 20 ms segment used for analysis.
  • Source (Codebook): Eight excitation candidates centered on the winner; the winner is green with a check. Index numbers appear under each strip.
  • MSE per codebook index: Bar chart below the codebook showing MSE for every codebook entry. Winner bar is green; during search animation the current index bar is yellow.
  • Synthesis: Combined excitation (adaptive + optional noise) through the LPC filter (yellow).
  • Scope: Original (blue) vs synthetic (yellow, dashed) overlay.
  • Residual: Error signal (original − synthetic) in red. Smaller when Apply Noise is on.
  • Spectrum: Gray bars = FFT of the frame; green line = LPC spectral envelope (H(e^{jω}) in dB).
  • LPC Coefficients: Bar chart of the predictor coefficients a[0]..a[P−1].
  • Vocal Tract Cross-Section: Area function derived from LPC (reflection coefficients); throat shape from glottis to lips.

Codebook selection (Analysis-by-Synthesis)

In CELP, choosing the best codebook entry is a trial-and-error process called Analysis-by-Synthesis. The encoder does not just match waveforms; it tests each candidate by passing it through a model of the vocal tract and comparing the result to the original.

1. Preparation of the target signal

Before searching the codebook, the encoder works from the original speech and (in full systems) removes the effect of the vocal tract (LPC) to get a residual signal. The aim is to find a codebook entry that best matches this leftover. When an adaptive codebook is used, the pitch component is removed first; the fixed codebook then searches for the best innovation (noise-like) match.

2. The loop: filter every candidate

The encoder steps through codebook indices (e.g. 0 to 255). For each candidate:

  • Filtering: The codebook vector is passed through the current LPC synthesis filter.
  • Synthesis: This produces a “candidate synthetic speech” segment.
  • Weighting: In many codecs, a perceptual weighting filter is applied to shape the error so it is less audible.

3. Calculate optimal gain (G)

For each candidate, the encoder computes a scaling factor (gain). Codebook entries are normalized, so they must be scaled to match the level of the original. The optimal gain is found by projecting the target speech onto the filtered candidate (e.g. G = target · filtered / (filtered · filtered)).

4. Compute Mean Squared Error (MSE)

The encoder compares original speech to candidate synthetic speech: it computes the sample-by-sample difference (error), then sums the squared errors and normalizes to get the MSE for that index.

5. Select the winner

After testing all candidates (or a subset in algebraic CELP), the encoder picks the index with the lowest MSE. The winner index and the gain are the only codebook-side parameters sent over the 4G/5G link.
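Steps 2–5 above can be condensed into a single search loop. A sketch, where `synth_filter` stands in for the LPC synthesis filter and the identity filter in the toy example is purely illustrative:

```python
def abs_search(target, codebook, synth_filter):
    """Analysis-by-Synthesis: filter every candidate, compute the optimal
    gain by projection, and keep the index with the lowest MSE."""
    best = (float("inf"), -1, 0.0)   # (mse, winner index, gain)
    for idx, cand in enumerate(codebook):
        f = synth_filter(cand)                    # candidate through 1/A(z)
        energy = sum(x * x for x in f)
        if energy == 0.0:
            continue
        g = sum(t * x for t, x in zip(target, f)) / energy   # optimal gain
        mse = sum((t - g * x) ** 2 for t, x in zip(target, f)) / len(target)
        if mse < best[0]:
            best = (mse, idx, g)
    return best

# Toy example: identity "filter", 3 candidates; the target is a scaled
# copy of entry 1, so that entry wins with gain 2 and zero error.
mse, winner, gain = abs_search([2.0, 4.0, 6.0],
                               [[1, 0, 0], [1, 2, 3], [0, 1, 0]],
                               lambda c: c)
print(winner, gain)   # winner index 1, gain 2.0
```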

Summary of selection

Step         Action                                Outcome
Search       Loop through index i.                 Consider 256 (or N) candidate “sounds.”
Synthesis    Pass candidate through LPC filter.    Obtain N “synthetic” voices.
Comparison   Subtract synthetic from original.     Error score (e.g. MSE) per index.
Selection    Find minimum error.                   Single Winner Index (and gain) sent.

Analysis-by-Synthesis ensures the chosen excitation is the one that, after the vocal tract filter, sounds closest to the original—as in 5G VoNR and LTE voice codecs.

Controls

  • Start Point (slider, top): Starting sample of the 20 ms frame. Synced with dragging the green frame on the Original plot.
  • Order P (slider): LPC prediction order. Value shown next to slider.
  • Codebook (dropdown): Codebook size (16, 64, 256, 512) for bitrate and search.
  • Excitation (dropdown): White Noise or Glottal Pulse for the fixed codebook.
  • Pitch Delay (L) (slider): Period of the adaptive excitation (20–140).
  • Apply Noise (checkbox): Include fixed codebook in synthesis when checked.
  • Animation Delay (slider, ms): Delay between steps in the search animation.
  • Codebook Index (slider): Select a codebook entry directly; stops the search and shows that candidate.
  • Play / Stop / Step Bwd / Step Fwd: Run, pause, or step the Analysis-by-Synthesis search.
  • Play Original / Play Synthetic: Playback buttons. Search Status, MSE, and Bitrate are shown below.