Seminar of the Iby and Aladar Fleischman Faculty of Engineering

EE Seminar: GDiffuSE: Diffusion-Based Speech Enhancement With Noise Model Guidance

9 November 2025, 15:00

Room 011, Kitot Building (Electrical Engineering)

Registration for the seminar is done by scanning the Moodle barcode (please log in to Moodle beforehand, not through the app). Registration closes at 15:10.

Electrical Engineering Systems Seminar

Speaker: Efrayim Yanir

M.Sc. student under the supervision of Prof. David Burshtein and Prof. Sharon Gannot

Sunday, 9th November 2025, at 15:00

Room 011, Kitot Building, Faculty of Engineering

GDiffuSE: Diffusion-Based Speech Enhancement With Noise Model Guidance

Abstract

This work introduces Guided Diffusion for Speech Enhancement (GDiffuSE), a diffusion-based approach to speech enhancement that combines a fixed, pretrained time-domain denoising diffusion probabilistic model (DDPM) with a lightweight helper model that learns noise statistics and guides sampling. Unlike enhancement systems that retrain a large diffusion backbone per noise type, GDiffuSE adapts by fitting a small causal convolutional neural network to short noise-only segments and injecting its score as guidance during reverse diffusion. A signal-to-noise ratio (SNR)-aware schedule applies weak guidance in noisy states and stronger guidance as denoising progresses, improving stability without modifying the sampler variance.
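As a rough illustration of the guidance idea described above, the following Python sketch shows one way an SNR-aware guidance weight could be computed and applied to the mean of a reverse diffusion step. It is not taken from the talk: the linear beta schedule, the weight formula w = w_max * snr / (1 + snr), and all function names are assumptions chosen only to make the behaviour concrete (weak guidance in noisy states, stronger guidance later, variance left unchanged).

```python
def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumption; the abstract does not specify one)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def guidance_weight(alpha_bar_t, w_max=1.0):
    """Hypothetical SNR-aware weight: w = w_max * snr / (1 + snr).

    With snr = alpha_bar / (1 - alpha_bar) this equals w_max * alpha_bar,
    so the weight is small in noisy states and approaches w_max near t = 0.
    """
    snr = alpha_bar_t / (1.0 - alpha_bar_t)
    return w_max * snr / (1.0 + snr)


def guided_mean(mean, score, sigma2_t, weight):
    """Shift the reverse-step mean by weight * sigma_t^2 * score.

    Only the mean is changed; the sampler variance sigma2_t is left as-is.
    """
    return [m + weight * sigma2_t * g for m, g in zip(mean, score)]
```

For example, `guidance_weight(a)` evaluated over `cumulative_alphas(linear_beta_schedule(1000))` decreases from close to `w_max` at the first (cleanest) step to nearly zero at the last (noisiest) step, which matches the qualitative behaviour the abstract describes.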

We instantiate two variants. CGDiffuSE learns a conditional likelihood of the mixture given the current diffusion state and uses its log-likelihood gradients as guidance. NGDiffuSE models the effective residual noise at each diffusion step and guides the sampler with its score. Both helpers predict per-sample Gaussian parameters and are trained by maximum likelihood, requiring orders of magnitude fewer parameters and iterations than retraining the backbone.
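To make the phrase "per-sample Gaussian parameters trained by maximum likelihood" concrete, here is a minimal Python sketch of a Gaussian negative log-likelihood and the corresponding score (the gradient of the log-density with respect to the signal). In the talk the means and variances come from a small causal CNN; here they are passed in directly, and the log-variance parameterization and function names are assumptions made for illustration.

```python
import math


def gaussian_nll(x, mean, log_var):
    """Average negative log-likelihood of x under per-sample N(mean_i, exp(log_var_i)).

    Minimising this over the helper's parameters is maximum-likelihood training.
    """
    total = 0.0
    for xi, mi, lvi in zip(x, mean, log_var):
        total += 0.5 * (math.log(2.0 * math.pi) + lvi + (xi - mi) ** 2 / math.exp(lvi))
    return total / len(x)


def gaussian_score(x, mean, log_var):
    """Gradient of the Gaussian log-density with respect to x: -(x_i - mean_i) / var_i."""
    return [-(xi - mi) / math.exp(lvi) for xi, mi, lvi in zip(x, mean, log_var)]
```

A score of this form is the kind of quantity that could be fed to the sampler as guidance: it points from the current sample toward the predicted mean, scaled by the inverse of the predicted variance.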

Evaluations on LibriSpeech mixed with BBC sound effects show consistent gains over strong generative baselines (e.g., SGMSE) under mismatched noise, particularly in PESQ and SI-SDR, with competitive STOI and DNSMOS. Benefits include rapid adaptation and low compute on edge devices. Limitations include the need for representative noise-only segments and sensitivity to highly non-stationary conditions.
