FeatureForest: the power of foundation models, the usability of random forests
Mehdi Seifi, Damian Dalle Nogare, Juan Battagliotti, Vera Galinova, Ananya Kediga Rao, AI4Life Horizon Europe Programme Consortium, Johan Decelle, Florian Jug, Joran Deschamps
DOI: https://doi.org/10.1101/2024.12.12.628025
2024-12-16
Abstract: Once the work at the microscope is done, biological discoveries rely heavily on proper downstream analysis. This often amounts to first segmenting the biological objects of interest in the image before performing a quantitative analysis. Deep learning (DL) is nowadays ubiquitous in such segmentation tasks. However, DL can be cumbersome to apply, as it often requires a large amount of manual labeling to produce ground-truth data, and expert knowledge to train the models from scratch. Meanwhile, the performance of large foundation models, although they are trained on natural images, is improving on scientific images with every new model released. They, however, require either manual prompting or tedious post-processing to selectively segment the biological objects of interest. Classical machine learning algorithms, such as random forest classifiers, on the other hand, are well-established, easy to train, and often yield results of sufficient quality for downstream processing tasks, hence their continued popularity. Unfortunately, they are limited to objects with distinct, well-defined textures compared to their environment. This generally limits their usefulness to structures that are easy to recognize. Here, we present FeatureForest, an open-source tool that leverages the feature embeddings of large foundation models to train a random forest classifier, thereby providing users with a rapid way of semantically segmenting complex images using only a few labeling strokes. We demonstrate the improvement in performance over a variety of datasets, including large and complex volumetric electron microscopy stacks. Our implementation is available in napari, currently integrates four foundation models, and can easily be extended to any new model once it becomes available.
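The idea summarized in the abstract, training a random forest on per-pixel features using only a few labeled strokes, can be illustrated with a minimal sketch. This is not FeatureForest's actual API: the function `extract_features` is a hypothetical stand-in that computes two simple hand-crafted features, where the tool would instead use the embeddings of a foundation model. The synthetic image, the stroke positions, and all variable names are likewise assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def extract_features(image):
    """Hypothetical stand-in for a foundation-model encoder.

    Returns an (H, W, 2) array holding, for each pixel, its intensity
    and its 3x3 local mean. A real pipeline would return model embeddings.
    """
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    local_mean = np.zeros_like(image)
    for dy in range(3):
        for dx in range(3):
            local_mean += padded[dy:dy + h, dx:dx + w]
    local_mean /= 9.0
    return np.stack([image, local_mean], axis=-1)


# Synthetic image: dark background on the left, bright object on the right.
rng = np.random.default_rng(0)
truth = np.ones((32, 32), dtype=int)   # class 1 = background
truth[:, 16:] = 2                      # class 2 = object
image = np.where(truth == 2, 0.8, 0.2) + rng.normal(0.0, 0.05, truth.shape)

# Sparse "strokes": 0 means unlabeled, 1 and 2 are the user-provided classes.
strokes = np.zeros_like(truth)
strokes[5:10, 2:6] = 1
strokes[20:25, 26:30] = 2

# Train only on the labeled pixels, then predict a class for every pixel.
features = extract_features(image)
labeled = strokes > 0
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(features[labeled], strokes[labeled])

segmentation = forest.predict(features.reshape(-1, features.shape[-1]))
segmentation = segmentation.reshape(truth.shape)
accuracy = float((segmentation == truth).mean())
```

In this sketch only 40 of the 1024 pixels carry a label, yet the classifier assigns a class to every pixel. Swapping the hand-crafted features for richer embeddings changes only `extract_features`, which is the design point the abstract emphasizes.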
Biology