Participatory Science and Machine Learning Applied to Millions of Sources in the Hobby-Eberly Telescope Dark Energy Experiment

Lindsay R. House, Karl Gebhardt, Keely Finkelstein, Erin Mentuch Cooper, Dustin Davis, Daniel J. Farrow, Donald P. Schneider
DOI: https://doi.org/10.3847/1538-4357/ad782c
2024-09-13
Abstract: We are merging a large participatory science effort with machine learning to enhance the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX). Our overall goal is to remove false positives, allowing us to use lower signal-to-noise data and sources with low goodness-of-fit. With six million classifications through Dark Energy Explorers, we can confidently determine if a source is not real at over 94% confidence level when classified by at least ten individuals; this confidence level increases for higher signal-to-noise sources. To date, we have only been able to apply this direct analysis to 190,000 sources. The full sample of HETDEX will contain around 2-3M sources, including nearby galaxies ([O II] emitters), distant galaxies (Lyman-alpha emitters or LAEs), false positives, and contamination from instrument issues. We can accommodate this tenfold increase by using machine learning with visually-vetted samples from Dark Energy Explorers. We have already increased by over ten-fold in number of sources that have been visually vetted from our previous pilot study where we only had 14,000 visually vetted LAE candidates. This paper expands on the previous work increasing the visually-vetted sample from 14,000 to 190,000. In addition, using our currently visually-vetted sample, we generate a real or false positive classification for the full candidate sample of 1.2 million LAEs. We currently have approximately 17,000 volunteers from 159 countries around the world. Thus, we are applying participatory or citizen scientist analysis to our full HETDEX dataset, creating a free educational opportunity that requires no prior technical knowledge.
Instrumentation and Methods for Astrophysics, Cosmology and Nongalactic Astrophysics, Astrophysics of Galaxies, Physics Education
What problem does this paper attempt to address?
This paper aims to improve the efficiency and accuracy of data processing in the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) by combining large-scale participatory science with machine learning. Specifically, it addresses the following problems:

1. **Reduce False Positives**: HETDEX generates a large volume of spectral data that includes many false positives and detections contaminated by instrument issues. These degrade the precision of the final cosmological parameter measurements. The paper identifies and removes them using the participatory science project Dark Energy Explorers (DEE) together with machine-learning techniques.
2. **Handle Low Signal-to-Noise Ratio (SNR) Data**: Existing analysis methods handle low-SNR data poorly, so some real signals are missed. Removing false positives through participatory science and machine learning allows HETDEX to use lower-SNR data and sources with low goodness-of-fit, which improves the detection of targets such as Lyman-α emitters (LAEs).
3. **Expand the Classification Scale of the Dataset**: The full HETDEX sample will contain about 2-3 million sources, far too many for manual classification alone. The paper expands the visually vetted sample from the initial 14,000 to 190,000 sources and uses that sample to generate a real-or-false-positive classification for the full candidate sample of 1.2 million LAEs.
4. **Improve the Precision of Cosmological Parameter Measurements**: Accurately identifying and classifying LAEs is crucial for measuring the expansion rate of the universe. By reducing false positives and increasing the identification rate of true positives, the paper aims to make that measurement more accurate and strengthen the cosmological results of HETDEX.
5. **Use Participatory Science for Education and Public Participation**: Beyond the research itself, the paper emphasizes the educational value of participatory science. Dark Energy Explorers has attracted about 17,000 volunteers from 159 countries, providing a free educational opportunity while increasing public interest in and understanding of astronomy.

### Key Technologies and Methods

- **Participatory Science Platform (Zooniverse)**: Collects a large number of human classifications. For large-scale data, human visual vetting can significantly improve classification accuracy.
- **Machine-Learning Algorithm (t-SNE)**: Reduces high-dimensional data to a low-dimensional space for visualization, grouping sources with similar spectral features so that false positives and artifacts can be identified and removed.
- **Nearest Neighbors Method**: Transfers the volunteers' classifications to the larger, unvetted dataset so that sources are classified consistently.

### Conclusion

By combining participatory science and machine learning, the paper shows how to process large-scale astronomical datasets effectively, improve classification accuracy, and provide more reliable data for future cosmological research.
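
The vote aggregation and nearest-neighbors steps summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's pipeline: the function names `aggregate_votes` and `knn_label`, the 0.5 vote threshold, the value of `k`, and the toy 2-D coordinates (standing in for a t-SNE embedding computed elsewhere) are all hypothetical. Only the requirement of at least ten classifications per source comes from the abstract.

```python
from collections import Counter
from math import dist

MIN_VOTES = 10  # from the abstract: classified by at least ten individuals


def aggregate_votes(votes, min_votes=MIN_VOTES, threshold=0.5):
    """Turn per-source volunteer votes into labels.

    votes maps a source id to a list of booleans (True = marked as real).
    Sources with fewer than min_votes ballots are left unlabeled.
    The threshold value is an illustrative assumption.
    """
    labels = {}
    for source_id, ballots in votes.items():
        if len(ballots) < min_votes:
            continue
        real_fraction = sum(ballots) / len(ballots)
        labels[source_id] = "real" if real_fraction > threshold else "false_positive"
    return labels


def knn_label(point, labeled_points, k=3):
    """Label one embedded point by majority vote of its k nearest vetted neighbors.

    labeled_points is a list of (coordinates, label) pairs.
    """
    nearest = sorted(labeled_points, key=lambda item: dist(point, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return counts.most_common(1)[0][0]


# Toy data (hypothetical): volunteer ballots for three sources.
votes = {
    "a": [True] * 9 + [False],   # 10 votes, 90% real
    "b": [False] * 11 + [True],  # 12 votes, mostly not real
    "c": [True] * 4,             # too few votes, stays unlabeled
}
labels = aggregate_votes(votes)

# Hypothetical 2-D coordinates, standing in for a t-SNE embedding.
embedding = {"a": (0.0, 0.0), "b": (5.0, 5.0), "c": (0.2, 0.1), "d": (4.8, 5.1)}

# Vetted sources become the reference set for the unvetted ones.
vetted = [(embedding[source_id], label) for source_id, label in labels.items()]
for source_id in ("c", "d"):
    print(source_id, knn_label(embedding[source_id], vetted, k=1))
```

In this toy setup, sources `c` and `d` never reach the ten-vote minimum, so each takes the label of the closest vetted source in the embedding. A real application would embed all sources together before the neighbor search and would tune `k` and the vote threshold against held-out vetted sources.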