EE Seminar: GDiffuSE: Diffusion-Based Speech Enhancement With Noise Model Guidance
Registration for the seminar is done by scanning the Moodle barcode (please log in to Moodle beforehand, not through the app). Registration ends at 15:10.
Electrical Engineering Systems Seminar
Speaker: Efrayim Yanir
M.Sc. student under the supervision of Prof. David Burshtein and Prof. Sharon Gannot
Sunday, 9th November 2025, at 15:00
Room 011, Kitot Building, Faculty of Engineering
GDiffuSE: Diffusion-Based Speech Enhancement With Noise Model Guidance
Abstract
This work introduces Guided Diffusion for Speech Enhancement (GDiffuSE), a diffusion-based approach to speech enhancement that combines a fixed, pretrained time-domain denoising diffusion probabilistic model (DDPM) with a lightweight helper model that learns noise statistics and guides sampling. Unlike enhancement systems that retrain a large diffusion backbone per noise type, GDiffuSE adapts by fitting a small causal convolutional neural network to short noise-only segments and injecting its score as guidance during reverse diffusion. A signal-to-noise ratio (SNR)-aware schedule applies weak guidance in noisy states and stronger guidance as denoising progresses, improving stability without modifying sampler variance.
We instantiate two variants. CGDiffuSE learns a conditional likelihood of the mixture given the current diffusion state and uses its log-likelihood gradients as guidance. NGDiffuSE models the effective residual noise at each diffusion step and guides the sampler with its score. Both helpers predict per-sample Gaussian parameters and are trained by maximum likelihood, requiring orders of magnitude fewer parameters and iterations than retraining the backbone.
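As a rough illustration of the ideas in the paragraph above, the following minimal Python sketch shows a per-sample Gaussian negative log-likelihood (the kind of objective a maximum-likelihood helper could minimize), the corresponding score with respect to the signal, and a mean-only guidance shift. The function names, the `weight` parameter, and the classifier-guidance-style update are assumptions made for illustration; they are not taken from the talk and should not be read as the authors' implementation.

```python
import math


def gaussian_nll(x, mean, log_var):
    """Average negative log-likelihood of x under per-sample Gaussians.

    mean and log_var stand in for the per-sample parameters a helper
    network would predict (hypothetical inputs in this sketch).
    """
    total = 0.0
    for xi, mi, lvi in zip(x, mean, log_var):
        total += 0.5 * (math.log(2 * math.pi) + lvi + (xi - mi) ** 2 / math.exp(lvi))
    return total / len(x)


def gaussian_score(x, mean, log_var):
    """Gradient of the Gaussian log-likelihood with respect to each x_i."""
    return [-(xi - mi) / math.exp(lvi) for xi, mi, lvi in zip(x, mean, log_var)]


def guided_mean(mu, score, weight, sigma2):
    """Shift a reverse-step mean along the score; the variance is untouched.

    This mirrors generic classifier-guidance-style updates (an assumption),
    with weight playing the role of a step-dependent guidance strength.
    """
    return [m + weight * sigma2 * s for m, s in zip(mu, score)]
```

In such a sketch, an SNR-aware schedule would amount to choosing a small `weight` at early, noisy reverse steps and a larger one as denoising progresses.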
Evaluations on LibriSpeech mixed with BBC sound effects show consistent gains over strong generative baselines (e.g., SGMSE) under mismatched noise, particularly in PESQ and SI-SDR, with competitive STOI and DNSMOS. Benefits include rapid adaptation and low compute on edge devices. Limitations include the need for representative noise-only segments and sensitivity to highly non-stationary conditions.

