EE Seminar: Efficient Training, Fast Inference: Reducing Memory Requirements and Inference Time of Foundation Models

May 18, 2026, 13:00
Hall 011, Kitot Electrical Engineering Building

 

Registration for the seminar will take place at the start of the seminar by scanning the barcode into Moodle (please log in to Moodle beforehand, and not via the mobile app).

 

(The talk will be given in English)

 

Speaker: Dr. Ofir Lindenbaum

Faculty of Engineering at Bar-Ilan University

 

Hall 011, Kitot Electrical Engineering Building

Monday, May 18th, 2026

13:00 - 14:00

 

Efficient Training, Fast Inference: Reducing Memory Requirements and Inference Time of Foundation Models

 

Abstract

Foundation models represent a paradigm shift in machine learning, where it often appears that “scale is all you need” to achieve strong performance across tasks. However, the exponential growth of these models demands substantial memory and computational resources, limiting their usability in smaller research environments. In this talk, I will discuss our ongoing efforts to make the training and adaptation of large foundation models more efficient and accessible.

I will first introduce AdaRankGrad, which exploits the low-rank structure of gradients to enable memory-efficient full-parameter fine-tuning. I will then present SUMO, a subspace-aware optimizer that performs exact moment orthogonalization to accelerate convergence and enhance generalization while further reducing memory requirements. Finally, I will discuss a complementary approach that compresses models during fine-tuning through stochastic gating, yielding compact networks that maintain accuracy while reducing inference time.
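To make the memory argument concrete, below is a minimal, hypothetical NumPy sketch of Adam-style updates whose moment statistics live in a low-rank gradient subspace, in the spirit of the first approach above. This is not the authors' implementation: the fixed rank r, the function name, and the one-shot choice of projection basis are illustrative assumptions (AdaRankGrad additionally adapts the rank and refreshes the subspace during training).

    import numpy as np

    def lowrank_adam_step(W, G, state, r=8, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Illustrative optimizer step: the gradient is projected onto the
        # top-r left singular subspace of G, Adam moments are tracked in
        # that subspace, and the update is lifted back to the full space.
        if "P" not in state:
            U, _, _ = np.linalg.svd(G, full_matrices=False)  # G is m x n
            state["P"] = U[:, :r]                            # m x r basis
            state["m"] = np.zeros((r, G.shape[1]))           # r x n, not m x n
            state["v"] = np.zeros((r, G.shape[1]))
            state["t"] = 0
        P = state["P"]
        g = P.T @ G                                          # r x n projected gradient
        state["t"] += 1
        state["m"] = b1 * state["m"] + (1 - b1) * g
        state["v"] = b2 * state["v"] + (1 - b2) * g ** 2
        m_hat = state["m"] / (1 - b1 ** state["t"])          # bias correction
        v_hat = state["v"] / (1 - b2 ** state["t"])
        return W - lr * (P @ (m_hat / (np.sqrt(v_hat) + eps)))  # lift back to m x n

    # Toy usage: ten steps on the gradient of 0.5 * ||W||^2, i.e., G = W.
    rng = np.random.default_rng(0)
    W, state = rng.standard_normal((64, 32)), {}
    for _ in range(10):
        W = lowrank_adam_step(W, W.copy(), state, r=4)

For an m-by-n weight matrix, the optimizer state shrinks from two m-by-n moment arrays to two r-by-n arrays plus an m-by-r basis, which is where the memory saving comes from.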

The talk is based on:

[1] Yehonathan Refael, Jonathan Svirsky, Boris Shustin, Wasim Huleihel, and Ofir Lindenbaum. "AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning." ICLR, 2025.

[2] Yehonathan Refael, Guy Smorodinsky, Tom Tirer, and Ofir Lindenbaum. "SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training." NeurIPS, 2025.

[3] Jonathan Svirsky, Yehonathan Refael, and Ofir Lindenbaum. "Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity." AISTATS, 2026.

Short Bio

Ofir Lindenbaum is a senior lecturer (assistant professor) in the Faculty of Engineering at Bar-Ilan University. He completed a postdoctoral fellowship at Yale University in the Applied Mathematics Program, working with Prof. Ronald Coifman and Prof. Yuval Kluger, and earned his Ph.D. in Electrical Engineering at Tel Aviv University under the supervision of Prof. Arie Yeredor and Prof. Amir Averbuch.

His research focuses on developing machine learning methods to advance scientific discovery. He works on interpretable and efficient models for high-dimensional tabular data and multimodal learning, and addresses questions in sparsification, optimization, and representation learning. His work aims to build principled, reliable, and data-efficient algorithms that can extract meaningful structure from real-world scientific measurements.

 

This seminar is considered a hearing seminar for MSc/PhD students.


 
