A review of unsupervised learning in astronomy

Sotiria Fotopoulou
DOI: https://doi.org/10.1016/j.ascom.2024.100851
2024-06-25
Abstract: This review summarizes popular unsupervised learning methods, and gives an overview of their past, current, and future uses in astronomy. Unsupervised learning aims to organise the information content of a dataset, in such a way that knowledge can be extracted. Traditionally this has been achieved through dimensionality reduction techniques that aid the ranking of a dataset, for example through principal component analysis or by using auto-encoders, or simpler visualisation of a high dimensional space, for example through the use of a self organising map. Other desirable properties of unsupervised learning include the identification of clusters, i.e. groups of similar objects, which has traditionally been achieved by the k-means algorithm and more recently through density-based clustering such as HDBSCAN. More recently, complex frameworks have emerged, that chain together dimensionality reduction and clustering methods. However, no dataset is fully unknown. Thus, nowadays a lot of research has been directed towards self-supervised and semi-supervised methods that stand to gain from both supervised and unsupervised learning.
Instrumentation and Methods for Astrophysics, Machine Learning
What problem does this paper attempt to address?
This paper reviews unsupervised learning methods and their application in astronomy. It summarizes the techniques used in the field over the past 30 years and examines how they help extract knowledge from astronomical data and address the field's big data challenges. The paper first defines the goal of unsupervised learning: organizing the information content of a dataset, without explicit labels, so that knowledge can be extracted. It then outlines traditional dimensionality reduction methods, such as Principal Component Analysis (PCA) and autoencoders (AE), and visualization techniques for high-dimensional data, such as Self-Organizing Maps (SOM). It also discusses clustering algorithms, in particular k-means and density-based methods such as HDBSCAN. The paper notes the recent emergence of complex frameworks that chain dimensionality reduction and clustering, as well as growing interest in self-supervised and semi-supervised methods, which draw on the strengths of both supervised and unsupervised learning. It also traces the history of machine learning in astronomy, from early digital astronomy through the multi-wavelength era, the period of computational mainstreaming, and the machine learning revolution, to the current meta-algorithm stage, emphasizing how advances in hardware, software, and data availability have shaped astronomical research. Finally, the paper offers suggestions for future applications, based on the author's reading of the existing literature and personal experience.
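To make the idea of chaining dimensionality reduction and clustering concrete, here is a minimal sketch, not taken from the paper. It assumes NumPy only, uses PCA followed by k-means as stand-ins for the two stages, and runs on synthetic data rather than a real catalogue. The function names `pca_reduce` and `kmeans` and all parameter values are illustrative assumptions; a density-based method such as HDBSCAN could replace the clustering stage.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its leading principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point start."""
    # Start from the first sample, then repeatedly add the sample that is
    # farthest from all centroids chosen so far (an illustrative choice).
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic stand-in for a catalogue: two groups of objects in 10 features.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=0.5, size=(100, 10))
group_b = rng.normal(loc=5.0, scale=0.5, size=(100, 10))
X = np.vstack([group_a, group_b])

X_low = pca_reduce(X, n_components=2)   # step 1: dimensionality reduction
labels, centroids = kmeans(X_low, k=2)  # step 2: clustering in reduced space
```

Clustering in the reduced space rather than the full feature space is the design choice being illustrated: the first stage compresses the data, and the second looks for groups of similar objects in that compressed representation.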