EE Seminar: Multimodal Learning for High-Level Semantic Understanding: From Interpretability to Real-World Applications
https://tau-ac-il.zoom.us/j/86408043561?pwd=klH217NzKKIvt587lUerjOEGkQQuFF.1
Meeting ID: 864 0804 3561
Passcode: 724924
Electrical Engineering Systems ZOOM Seminar
Speaker: Morris Alper
Ph.D. student under the supervision of Dr. Hadar Elor and Prof. Raja Giryes
Sunday, 20th July 2025, at 15:00
Abstract
This research focuses on building a more comprehensive understanding of multimodal models that combine language, vision, and three-dimensional geometry. While these models enable complex tasks that were previously infeasible, their inner workings remain opaque, making it unclear how they parse and synthesize different modalities or how to leverage them for high-level tasks beyond their training objectives.
We begin from an interpretability perspective, investigating emergent knowledge in models trained on paired images and text. We identify several emergent properties: visual reasoning in text models learned through visual supervision; sound-symbolic associations paralleling human cognition; and emergent visual-semantic hierarchical knowledge. These insights reveal how such models operate and suggest new applications.
We then discuss applications harnessing these multimodal reasoning capabilities for complex tasks. We use multimodal models for unconventional document understanding, tackling ancient writing and modern engineering diagrams. We also explore learning 3D understanding from unstructured image collections depicting large-scale scenes, leveraging weak supervision from textual metadata and adapting generative models for appearance variations.
This work provides important insights into multimodal models, encompassing both interpretability and novel applications requiring high-level semantic understanding of weakly-structured data.
Attending the seminar grants attendance credit, based on registering your full name + ID number in the chat.