EE Seminar: Noise Agnostic Outlier Detection on Galaxy Spectra
This seminar counts as an attendance seminar for M.Sc. and Ph.D. students.
Electrical Engineering Systems Seminar
Speaker: Almog Hershko
M.Sc. student under the supervision of Prof. Dovi Poznanski
Wednesday, 31st July 2024, at 15:30
Room 011, Kitot Building, Faculty of Engineering
Abstract
Astronomy, like many other scientific fields, has firmly entered the age of big data. Advances in sensors and computing have produced numerous large astronomical datasets, some containing billions of observations. One such dataset is the Sloan Digital Sky Survey (SDSS), a sky survey that has been running for more than 20 years and that includes, among other data, several million galaxy spectra.
Naturally, data-driven algorithms play a key role in helping scientists extract new insights from these datasets. Modern learning algorithms can handle vast amounts of data, extract trends, and guide scientists toward the underlying physics that drives them.
Alternatively, a learning algorithm can detect outliers: objects that stand out from the rest of the data because of some unique or rare physical phenomenon. Outlier detectors are unsupervised learning algorithms. They require only data, with no labels, and produce a ranking of the data according to a learned outlier score. Such detectors can direct researchers' attention to unique objects that may hold the key to new insights.
A distinctive feature of astronomical datasets is that the vast majority of the data is noisy. First, many sources are intrinsically faint, and there is nothing we can do to change that. Second, most of the volume of the Universe is far from Earth, and objects appear fainter the farther away they are. Consequently, a survey down to a given sensitivity limit will detect most of its sources near the largest distance it can reach, where their signal is weakest. Machine learning tools for astronomical datasets therefore need to be more robust to noise than those used in many other domains.
This thesis builds on an existing outlier detector for galaxy spectra that is based on an unsupervised random forest (URF). A URF trained on SDSS spectra has previously been shown to produce meaningful outliers, but it is prone to false alarms caused by low signal-to-noise ratio (SNR). The algorithm proposed in this work aims to preserve the good performance on high-SNR data while also making the detector noise agnostic, by combining RF distillation with denoising in the training process.
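For readers unfamiliar with the URF approach, the following is a minimal, simplified sketch of the general idea, assuming scikit-learn and NumPy. It does not implement the RF distillation or denoising proposed in the thesis. A synthetic dataset is built by sampling each feature (e.g. each wavelength bin) independently from its empirical distribution, which destroys the correlations present in real spectra. A random forest is then trained to separate real from synthetic objects. Two real objects are treated as similar when they often land in the same leaf, and an object's outlier score is its mean distance to all other objects. The function names `make_synthetic` and `urf_outlier_scores` and the parameter defaults are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_synthetic(X, rng):
    """Sample each feature independently from its empirical marginal.

    The result keeps the per-feature value distributions of X but breaks
    the correlations between features (e.g. between wavelength bins).
    """
    return np.column_stack(
        [rng.choice(X[:, j], size=X.shape[0], replace=True) for j in range(X.shape[1])]
    )


def urf_outlier_scores(X, n_trees=100, seed=0):
    """Simplified URF-style outlier score: higher means more unusual."""
    rng = np.random.default_rng(seed)
    X_syn = make_synthetic(X, rng)

    # Label real objects 1 and synthetic objects 0, then train a classifier.
    data = np.vstack([X, X_syn])
    labels = np.concatenate([np.ones(len(X)), np.zeros(len(X_syn))])
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(data, labels)

    # Leaf index of every real object in every tree: shape (n_samples, n_trees).
    leaves = rf.apply(X)

    # Similarity = fraction of trees in which two real objects share a leaf.
    # Note: this builds an (n, n, n_trees) array, so it only suits small n.
    sim = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    dist = 1.0 - sim

    # Mean distance to all *other* objects (self-distance is always 0).
    n = len(X)
    return dist.sum(axis=1) / (n - 1)
```

Ranking the objects, e.g. with `np.argsort(scores)[::-1]`, gives the ordered list of candidate outliers described above. In this simplified form every object's score depends on its raw pixel values, so noisy low-SNR spectra can look "unusual" to the forest. That is the failure mode the thesis sets out to address.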
Attendance credit for the seminar will be granted based on signing the attendance sheet, which will be passed around the hall during the seminar, with your full name and ID number.