A data-driven active learning approach to reusing ML solutions in scientific applications

Hamideh Hajiabadi, Christopher Gerking, Lennart Hilbert, Anne Koziolek
DOI: https://doi.org/10.1016/j.jss.2024.111986
IF: 3.5
2024-05-01
Journal of Systems and Software
Abstract: Artificial intelligence can revolutionize scientific projects, but scientists face challenges in reusing, integrating, and deploying cost-effective and high-quality machine learning solutions. Determining suitable algorithms and parameters is difficult, especially for non-programmer scientists. Some algorithms, like deep learning-based methods, offer flexibility but require extensive training on annotated data. This poses a hurdle in labor-intensive tasks like biological image segmentation, which relies on expert annotations. In this paper, we present a data-driven framework designed to assist scientists in selecting, reusing, and training machine learning solutions for microscopy image segmentation. The framework is based on establishing a mapping between object morphology features and the optimal segmentation algorithms and settings for individual objects. This mapping is iteratively refined through a combination of unsupervised learning and active learning iterations. To expedite convergence, objects are initially clustered based on their morphology. In each active learning iteration, the most informative and uncertain samples are selected and queried within a specific cluster. Through a biological case study, we demonstrate that our method enables the selection and training of segmentation algorithms specific to object types. Additionally, the selective requests for user input significantly reduce the number of user interactions required for this task.
computer science, theory & methods, software engineering
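
The loop outlined in the abstract (cluster objects by morphology, then query the most uncertain objects within one cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the plain k-means routine, the entropy-based uncertainty score, and the names `kmeans`, `entropy`, and `select_queries` are all assumptions chosen for illustration. The `probs` matrix is assumed to come from some model that predicts, per object, which segmentation algorithm or setting is best.

```python
import numpy as np


def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means on morphology features (hypothetical stand-in for the
    unsupervised clustering step). Returns one cluster label per object."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct objects chosen at random.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every object to every center, shape (n_objects, k).
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def entropy(probs):
    """Shannon entropy of each row of class probabilities (higher means
    the model is less certain about that object)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_queries(probs, cluster_labels, cluster_id, labeled_mask, n_queries):
    """Pick the n_queries most uncertain, not-yet-labeled objects that
    belong to the given cluster. Returns their indices."""
    candidates = np.flatnonzero((cluster_labels == cluster_id) & ~labeled_mask)
    scores = entropy(probs[candidates])
    order = np.argsort(-scores)  # most uncertain first
    return candidates[order[:n_queries]]
```

In use, the returned indices would be shown to the expert, the answers added to the labeled set, and the model refit before the next iteration. Restricting each query round to one cluster is what the abstract describes; the choice of entropy as the uncertainty measure is only one common option.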