Audio Signal Processing Lab - DSP Engineering Tutorial
This note provides an interactive Digital Signal Processing (DSP) laboratory for audio analysis. It demonstrates how DSP algorithms (filters) modify sound in real-time, and how these modifications appear in both the Time Domain (waveform) and Frequency Domain (spectrum).
The simulation allows you to load audio files or use your live microphone, apply real-time filters (Low-Pass, High-Pass, Band-Pass), and observe the effects through three synchronized visualizations: oscilloscope (time domain), FFT spectrum (frequency domain), and spectrogram (frequency vs. time waterfall). This teaches the engineering principles behind audio processing without hiding the math in a black box.
Math behind the Simulation
1. Web Audio API - AnalyserNode
The simulation uses the Web Audio API's AnalyserNode to extract audio data in real-time:
- AudioContext: Creates the audio processing graph. The audio source is connected to an AnalyserNode, which is then connected to the audio destination (speakers).
- AnalyserNode: Performs FFT analysis on the audio stream. The FFT size is set to 2048 samples, providing 1024 frequency bins.
- getByteTimeDomainData(): Returns time-domain data as a Uint8Array (0-255), representing the waveform amplitude at each sample point.
- getByteFrequencyData(): Returns frequency-domain data as a Uint8Array (0-255), representing the magnitude at each frequency bin.
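The node graph described above can be sketched as follows. This is a minimal illustration, not the simulation's actual source; the function name setUpAnalyser and the idea of passing the context and source in as parameters are assumptions for the sketch. In a browser, audioCtx would be a real AudioContext.

```javascript
// Sketch: wiring Source → Analyser → Destination with a 2048-sample FFT.
// `audioCtx` and `sourceNode` are assumed to come from the Web Audio API
// (e.g. new AudioContext() and ctx.createMediaElementSource(...)).
function setUpAnalyser(audioCtx, sourceNode) {
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 2048;                 // 2048-sample FFT → 1024 frequency bins
  sourceNode.connect(analyser);            // source → analyser
  analyser.connect(audioCtx.destination);  // analyser → speakers
  return analyser;
}

// Reading the analyser each animation frame (browser only):
// const timeData = new Uint8Array(analyser.fftSize);           // 2048 bytes, 0-255
// const freqData = new Uint8Array(analyser.frequencyBinCount); // 1024 bytes, 0-255
// analyser.getByteTimeDomainData(timeData);
// analyser.getByteFrequencyData(freqData);
```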
2. Fast Fourier Transform (FFT)
The AnalyserNode internally performs FFT to convert time-domain signals to frequency-domain:
-
FFT Size: 2048 samples. This determines the frequency resolution: Δf = fs / 2048, where fs is the sample rate (typically 16 kHz for speech).
-
Frequency Bins: The FFT produces 1024 frequency bins (half of FFT size). Each bin represents a frequency range: bin k represents frequency k × (fs / 2048) Hz.
-
Nyquist Frequency: The maximum frequency that can be represented is fNyquist = fs / 2. For 16 kHz sample rate, this is 8 kHz.
-
Magnitude Calculation: For each frequency bin, the magnitude is calculated as |X[k]| = √(Re(X[k])² + Im(X[k])²), where X[k] is the complex FFT output.
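The bin arithmetic above reduces to a few one-liners. A small sketch, assuming the 16 kHz sample rate and 2048-point FFT stated in the text:

```javascript
// Frequency-bin arithmetic for fs = 16 kHz, FFT size = 2048.
const fs = 16000;
const fftSize = 2048;

const binWidth = fs / fftSize;        // Δf = 7.8125 Hz per bin
const nyquist = fs / 2;               // 8000 Hz, the highest representable frequency
const binToHz = (k) => k * binWidth;  // frequency represented by bin k

// Magnitude of one complex FFT output X[k] = re + j·im:
const magnitude = (re, im) => Math.sqrt(re * re + im * im);
```

For example, bin 128 corresponds to 128 × 7.8125 = 1000 Hz, and the last bin (k = 1023) sits just below the 8 kHz Nyquist limit.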
3. Time Domain Visualization (Oscilloscope)
The time domain chart displays the waveform amplitude over time:
- Data Source: Uses getByteTimeDomainData() to get 2048 samples of amplitude data during real-time playback. After playback ends, the entire file is processed and displayed.
- Normalization: Values are normalized from 0-255 to -1.0 to +1.0: v = (data[i] / 128.0) - 1.0
- Rendering: Draws a continuous line connecting sample points, creating the classic oscilloscope waveform display.
- Time Resolution: Each sample represents 1 / fs seconds. For 2048 samples at 16 kHz, this represents ~128 ms of audio.
- Interactive Zoom and Pan: When the full file is loaded (after playback ends), you can:
  - Y-axis Zoom: Scroll the mouse wheel over the time domain plot to zoom in/out vertically (zoom range: 0.1x to 10.0x). This allows you to examine small amplitude variations in detail.
  - X-axis Pan: Click and drag left/right over the time domain plot to pan horizontally through the audio file. This lets you navigate through long recordings.
  - The current zoom level and pan offset are displayed in the top-left corner of the plot when active.
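The normalization and time-span formulas above can be sketched directly. This is an illustration, not the simulation's code; `data` stands in for the Uint8Array returned by getByteTimeDomainData():

```javascript
// Map analyser bytes (0-255) to the -1.0..+1.0 oscilloscope range: v = data[i]/128 - 1.
function normalizeSamples(data) {
  const out = new Float32Array(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] / 128.0 - 1.0;  // 128 → 0.0 (silence), 0 → -1.0, 255 → +0.9921875
  }
  return out;
}

// Duration covered by one analyser frame: fftSize / fs seconds.
const frameSeconds = (fftSize, fs) => fftSize / fs;  // 2048 / 16000 = 0.128 s (~128 ms)
```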
4. Frequency Domain Visualization (Spectrum)
The frequency domain chart displays the spectral signature (FFT magnitude):
- Data Source: Uses getByteFrequencyData() to get 1024 frequency bins with magnitude values (0-255) during real-time playback. After playback ends, the full file's spectrum is calculated using Welch's method (overlapping windows with Hanning windowing) for accurate frequency analysis.
- Bar Height: Each bar's height represents the magnitude at that frequency: height = (data[i] / 255) × canvas_height.
- Color Coding: Bars are colored by frequency range to highlight different parts of the spectrum:
  - Bass (0-300 Hz): Blue - Low-frequency components (fundamental frequencies, vowels)
  - Mids (300-3000 Hz): Green - Mid-frequency components (formants, consonants)
  - Treble (3000+ Hz): Red/Magenta - High-frequency components (fricatives, sibilants)
- Educational Value: Different words have distinct spectral patterns. For example, "stop" has strong high-frequency content (red bars) due to the "s" and "t" sounds, while "left" has more mid-frequency energy (green bars).
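The bar-height formula and band thresholds above can be expressed as two small helpers. A sketch for illustration; the function names are assumptions, but the 300 Hz and 3000 Hz edges are the ones stated in the text:

```javascript
// Bar height from one magnitude byte: height = (byte / 255) × canvasHeight.
function barHeight(byteValue, canvasHeight) {
  return (byteValue / 255) * canvasHeight;
}

// Color band for a bin's center frequency, per the bass/mids/treble split above.
function bandColor(freqHz) {
  if (freqHz < 300) return 'blue';    // bass: fundamentals, vowels
  if (freqHz < 3000) return 'green';  // mids: formants, consonants
  return 'red';                       // treble: fricatives, sibilants
}
```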
5. BiquadFilterNode - Digital Filter Implementation
The Web Audio API uses BiquadFilterNode to implement digital filters. The filter is a second-order IIR (Infinite Impulse Response) filter defined by the transfer function H(z) = (b0 + b1·z⁻¹ + b2·z⁻²) / (1 + a1·z⁻¹ + a2·z⁻²), where the coefficients are derived from the selected filter type, frequency, and Q parameters:
- Filter Types:
  - Low-Pass: Passes frequencies below the cutoff frequency. The frequency parameter is the cutoff frequency (3 dB point).
  - High-Pass: Passes frequencies above the cutoff frequency. The frequency parameter is the cutoff frequency (3 dB point).
  - Band-Pass: Passes frequencies within a narrow band. The frequency parameter is the center frequency (not a cutoff). The bandwidth is determined by the Q-factor: bandwidth = center frequency / Q. For example, with center frequency = 1000 Hz and Q = 1.0, the passband is approximately 500-1500 Hz.
- Q-Factor (Quality Factor): Controls the filter's sharpness and resonance. Higher Q values create steeper roll-off and more pronounced resonance. For bandpass filters, Q directly determines bandwidth: Q = center frequency / bandwidth.
- Real-Time Processing: Filters are applied in real-time using the audio node graph: Source → Filter → Analyser → Destination. All filter parameter changes take effect immediately without interrupting playback.
- Full File Processing: After playback ends, the entire audio file is processed through the filter using OfflineAudioContext, allowing you to see the complete filtered result and apply different filter settings to the full recording.
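The bandwidth/Q relation above is easy to check numerically. A sketch using the linear ±bandwidth/2 approximation stated in the text (the true constant-Q edges are geometric around the center, so this is only approximate):

```javascript
// Band-pass edges from center frequency and Q: bandwidth = center / Q,
// passband ≈ [center − bandwidth/2, center + bandwidth/2].
function bandEdges(centerHz, q) {
  const bandwidth = centerHz / q;
  return { low: centerHz - bandwidth / 2, high: centerHz + bandwidth / 2 };
}
```

With center frequency = 1000 Hz and Q = 1.0 this gives the 500-1500 Hz passband quoted above; raising Q to 4.0 narrows it to 875-1125 Hz, which is why higher Q sounds more resonant.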
Usage Example
Follow these steps to explore the Voiceprint Visualizer and understand the difference between time and frequency domain representations:
1. Load Audio Data: The simulation automatically loads the manifest.json file on startup, which lists available audio files organized by word (left, right, stop).
2. Select a Word: Use the "Word" dropdown to select a word category (Left, Right, or Stop). The "File" dropdown will automatically populate with available audio files for that word.
3. Select a File: Choose a specific audio file from the "File" dropdown. Each word has multiple recordings, allowing you to compare variations.
4. Play Audio: Click the "Play" button to start playback. The audio will play through your speakers, and all three visualizations will update in real-time (60 frames per second). When playback ends, the entire file is automatically processed and displayed, allowing you to zoom and pan through the complete waveform.
5. Observe Time Domain (Top Chart):
   - Watch the oscilloscope-style waveform as the audio plays.
   - Notice how the waveform represents amplitude changes over time.
   - Different words may look similar in the time domain, making them hard to distinguish.
   - After playback ends, the full file is displayed. Use mouse scroll to zoom vertically and drag to pan horizontally through the complete waveform.
6. Observe Frequency Domain (Bottom Chart):
   - Watch the spectral signature (FFT bars) update in real-time.
   - Notice the color coding: Blue bars (bass), Green bars (mids), Red bars (treble).
   - Compare different words: "stop" typically has strong red bars (high frequencies from "s" and "t" sounds), while "left" has more green bars (mid frequencies).
   - This is the "voiceprint" - each word has a unique spectral signature that makes it identifiable even when time-domain waveforms look similar.
7. Compare Words: Play different words and observe how their spectral signatures differ. This demonstrates why frequency-domain analysis is crucial for speech recognition and audio processing.
Tips:
- Filter Demonstration: Start with "Bypass" to hear the original sound, then switch to "Low-Pass" and drag the cutoff slider down. You'll hear muffling while seeing red bars disappear - this visually proves what the filter does! After playback ends, you can adjust filters and see the effect on the entire file.
- Q-Factor Effect: With Band-Pass filter, try increasing the Q-factor. You'll hear a more "ringing" or resonant sound, and see a narrower band of frequencies in the FFT chart. Note that for Band-Pass, the frequency parameter is the center frequency, not a cutoff - the label changes automatically when you select Band-Pass.
- Zoom and Pan: After playback ends, try scrolling over the time domain plot to zoom in on small details, or drag to pan through the entire recording. This is especially useful for examining long audio files or finding specific events in the waveform.
- Spectrogram Patterns: In the waterfall display, notice how "stop" creates bright vertical streaks (high-frequency "s" and "t" sounds), while "left" has more horizontal bands (vowel sounds). This explains why words sound different.
- Live Microphone: Try the microphone input and speak different words. You'll see your own voiceprint in real-time, and can apply filters to hear how they modify your voice.
- Real-Time Processing: All filter adjustments happen instantly while audio plays, demonstrating true real-time DSP processing. This is how audio effects processors work.
Visualizations
The simulation provides three synchronized real-time visualizations that update as audio plays, demonstrating different aspects of signal analysis:
1. Time Domain - Amplitude (Oscilloscope): Displays the waveform amplitude over time with a retro oscilloscope aesthetic (green glow on dark grid). This shows the "raw" signal - amplitude variations over time. While you can see waveform shape changes when filters are applied, the frequency domain view makes filter effects much clearer. This view answers: "How loud is the signal over time?" After playback ends, the full file is displayed and you can use mouse scroll to zoom vertically (Y-axis) and drag to pan horizontally (X-axis) through the complete waveform. The plot height is 160 pixels to fit all visualizations on screen without scrolling.
2. Frequency Domain - Spectral Signature (FFT): Displays the frequency spectrum using vertical bars, where each bar represents the magnitude at a specific frequency. The bars are color-coded by frequency range: Blue (0-300 Hz, bass), Green (300-3000 Hz, mids), Red/Magenta (3000+ Hz, treble). This is where filter effects are most visible: when you apply a Low-Pass filter, the red bars (high frequencies) disappear. This view answers: "What frequencies are present in the signal?" After playback ends, the full file's average spectrum is calculated using Welch's method (overlapping windows) for accurate frequency analysis. The plot height is 160 pixels.
3. Spectrogram - Frequency vs Time (Waterfall): A scrolling heatmap showing frequency content over time. The horizontal axis is frequency (low to high), the vertical axis is time (top = recent, bottom = older). Color intensity represents magnitude (blue = low, red = high). This creates a "voiceprint" signature where different sounds create distinct patterns. For example, "s" sounds create bright vertical streaks (high frequencies), while vowels create horizontal bands (low-mid frequencies). This view answers: "How does the frequency content change over time?" After playback ends, the complete spectrogram from the entire file is displayed. The plot height is 160 pixels.
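Welch's method, mentioned above for the full-file spectrum, can be sketched in a few lines: split the signal into overlapping Hann-windowed segments, take each segment's magnitude spectrum, and average. This is an illustration only, not the simulation's implementation - a naive DFT is used for clarity, and the 256-sample segment with 50% overlap is an assumed choice:

```javascript
// Hann window value at sample n of an N-point window.
function hann(n, N) {
  return 0.5 * (1 - Math.cos((2 * Math.PI * n) / (N - 1)));
}

// Average magnitude spectrum over overlapping Hann-windowed segments.
function welchSpectrum(signal, segSize = 256) {
  const hop = segSize / 2;        // 50% overlap between segments
  const bins = segSize / 2;       // keep bins up to the Nyquist frequency
  const avg = new Float64Array(bins);
  let segments = 0;
  for (let start = 0; start + segSize <= signal.length; start += hop) {
    for (let k = 0; k < bins; k++) {
      let re = 0, im = 0;
      for (let n = 0; n < segSize; n++) {
        const x = signal[start + n] * hann(n, segSize);  // window, then transform
        const phi = (2 * Math.PI * k * n) / segSize;
        re += x * Math.cos(phi);
        im -= x * Math.sin(phi);
      }
      avg[k] += Math.sqrt(re * re + im * im);            // magnitude of bin k
    }
    segments++;
  }
  for (let k = 0; k < bins; k++) avg[k] /= segments;     // average across segments
  return avg;
}
```

Averaging over segments trades frequency resolution for a much less noisy spectrum estimate, which is why it suits a whole-file summary better than a single long FFT.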
Controls
- Input Source Dropdown: Select "Audio File" to load from the dataset, or "Live Microphone" to analyze your own voice in real-time. When microphone is selected, the file selection controls are hidden.
- Word Dropdown: (File mode only) Select a word category (Left, Right, or Stop). The available words are loaded from the manifest.json file. When you select a word, the File dropdown is automatically populated. The first word is selected by default on page load.
- File Dropdown: (File mode only) Select a specific audio file from the selected word category. Each word has multiple recordings for comparison. The first file is selected by default when a word is chosen.
- Filter Type Dropdown: Select the DSP filter to apply. The first option (Bypass) is selected by default:
  - Bypass: No filtering, original signal passes through unchanged.
  - Low-Pass (Muffle): Removes high frequencies, creating a muffled sound. Watch red bars disappear in the FFT chart.
  - High-Pass (Tinny): Removes low frequencies, creating a tinny sound. Watch blue bars disappear in the FFT chart.
  - Band-Pass (Telephone): Isolates a narrow frequency band, creating a telephone effect. When selected, the frequency parameter label changes to "Center Frequency" (see below).
- Cutoff/Center Frequency Slider: Adjusts the filter frequency parameter from 20 Hz to 20,000 Hz (full human hearing range). The label dynamically changes based on filter type:
  - For Low-Pass and High-Pass: Label shows "Cutoff Frequency". Lower values for Low-Pass create more muffling. Higher values for High-Pass create a more tinny sound.
  - For Band-Pass: Label shows "Center Frequency". This sets the center frequency of the passband. The bandwidth is determined by the Q-factor (bandwidth = center frequency / Q). For example, with center frequency = 1000 Hz and Q = 1.0, the passband is approximately 500-1500 Hz.
  The effect is visible in real-time in the FFT chart. When the full file is loaded (after playback), adjusting filters re-processes the entire file so you can see the complete filtered result.
- Q-Factor Slider: Controls filter sharpness/resonance from 0.1 to 10.0. Higher values create steeper roll-off and more pronounced resonance. Most noticeable with the Band-Pass filter.
- Play Button: Starts audio playback (file mode) or microphone capture (mic mode) and begins real-time visualization. The button is disabled while active. All three charts update 60 times per second. When file playback ends, the entire file is automatically processed and displayed, allowing you to zoom and pan through the complete waveform and apply filters to the full recording.
- Stop Button: Stops audio playback/microphone and clears the visualizations. The button is only enabled while active. If a full file was loaded, stopping clears it and resets zoom/pan.
- Status Display: Shows the current state (Ready, Playing..., Listening..., or Stopped) of the audio processor.
Data Format
The audio data is organized in the audio_data/ directory with the following structure:
audio_data/
├── manifest.json
├── left/
│ ├── file1.wav
│ ├── file2.wav
│ └── ...
├── right/
│ ├── file1.wav
│ └── ...
└── stop/
├── file1.wav
└── ...
The manifest.json file lists available audio files in JSON format:
{
"left": [
"left/file1.wav",
"left/file2.wav"
],
"right": [
"right/file1.wav"
],
"stop": [
"stop/file1.wav"
]
}
Audio files are in WAV format (16-bit PCM, typically 16 kHz sample rate). The Web Audio API automatically decodes the audio files when loaded. To add your own audio files, place them in the appropriate subdirectory and update the manifest.json file.
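Consuming the manifest shape shown above takes only a couple of helpers. A sketch under the assumption that the manifest is a plain { word: [paths...] } object as in the example; the function names and the fetch path are illustrative, and fetch itself only works when the page is served over HTTP (see the CORS note below):

```javascript
// List the word categories present in a manifest object.
function wordList(manifest) {
  return Object.keys(manifest);
}

// List the audio file paths for one word; empty array for unknown words.
function filesForWord(manifest, word) {
  return manifest[word] ?? [];
}

// In the browser, the manifest would be loaded with something like:
// const manifest = await (await fetch('audio_data/manifest.json')).json();
```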
Note: Due to browser security restrictions (CORS policy), this application must be served from a web server (not opened directly as a file). Use a local development server like VS Code's Live Server extension or Python's http.server module.