JetTrain: IDE-Native Machine Learning Experiments

Artem Trofimov, Mikhail Kostyukov, Sergei Ugdyzhekov, Natalia Ponomareva, Igor Naumov, Maksim Melekhovets
DOI: https://doi.org/10.1145/3643796.3648455
2024-02-17
Abstract: Integrated development environments (IDEs) are prevalent code-writing and debugging tools. However, they have yet to be widely adopted for launching machine learning (ML) experiments. This work aims to fill this gap by introducing JetTrain, an IDE-integrated tool that delegates specific tasks from an IDE to remote computational resources. A user can write and debug code locally and then seamlessly run it remotely using on-demand hardware. We argue that this approach can lower the entry barrier for ML training problems and increase experiment throughput.
Software Engineering, Machine Learning
What problem does this paper attempt to address?
The paper addresses the difficulty of running machine learning (ML) experiments from within integrated development environments (IDEs). ML engineers currently have to switch between separate tools and hardware resources, which adds complexity and reduces efficiency. To address this, the paper introduces JetTrain, a tool that integrates ML experiment functionality directly into the IDE: users write and debug code locally and then run it on remote hardware, using resources allocated on demand. The main goals are to lower the entry barrier to ML training for users already familiar with IDEs and to mitigate the negative impact of context switching.

The authors analyze existing interfaces for launching ML experiments, including SSH, Jupyter Notebook, pipeline tools, and task scheduling tools, and highlight the advantages and limitations of each. They identify a gap between complex tools and a simple interface. JetTrain aims to fill this gap by offering a user-friendly experience while relying on mature scheduling tools in the background to achieve efficient hardware utilization and reproducibility.

The JetTrain workflow consists of opening a project in the IDE, running and debugging code locally, writing the experiment command, selecting appropriate hardware settings, mounting remote data if needed, launching the experiment on remote hardware, and optionally debugging the remote run or connecting to it through a terminal. The paper also discusses several challenges, such as code and data synchronization, experiment reproducibility, and asynchronous debugging, and proposes corresponding solutions. In summary, the paper aims to simplify ML experimentation through JetTrain, making development more convenient and efficient while staying tightly integrated with the IDE.
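
To make the reproducibility concern concrete, here is a minimal Python sketch of how one remote run could be described and fingerprinted. It is an illustration under stated assumptions, not JetTrain's actual schema or API: the `ExperimentSpec` class, its field names, and the `snapshot_digest` and `experiment_fingerprint` helpers are all hypothetical. The idea is only that a run is determined by the experiment command, the selected hardware setting, the mounted data, and the exact code that was synchronized, so hashing those together gives an identifier that changes whenever any of them changes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """Hypothetical description of one remote run (not JetTrain's real schema)."""

    command: str             # experiment command written in the IDE
    hardware: str            # label for the selected hardware setting
    data_mounts: tuple = ()  # optional remote data locations to mount


def snapshot_digest(files: dict) -> str:
    """Hash a {relative_path: bytes} code snapshot, independent of dict order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator between a path and its content
        digest.update(files[path])
        digest.update(b"\0")  # separator between consecutive files
    return digest.hexdigest()


def experiment_fingerprint(spec: ExperimentSpec, files: dict) -> str:
    """Combine the run settings and the code snapshot into a single identifier."""
    settings = json.dumps(asdict(spec), sort_keys=True)
    combined = settings + "\n" + snapshot_digest(files)
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # All values below are made up for illustration.
    spec = ExperimentSpec(
        command="python train.py --epochs 3",
        hardware="single-gpu",
        data_mounts=("datasets/example",),
    )
    code = {"train.py": b"print('training')\n"}
    print(experiment_fingerprint(spec, code))
```

Two launches with the same command, hardware setting, data mounts, and code produce the same fingerprint, while editing a single source file produces a different one. A real tool would also need to pin the software environment and data versions, which this sketch leaves out.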