RoboChop: Autonomous Framework for Fruit and Vegetable Chopping Leveraging Foundational Models

Atharva Dikshit, Alison Bartsch, Abraham George, Amir Barati Farimani
2023-07-25
Abstract: With the goal of developing fully autonomous cooking robots, developing robust systems that can chop a wide variety of objects is important. Existing approaches focus primarily on the low-level dynamics of the cutting action, which overlooks some of the practical real-world challenges of implementing autonomous cutting systems. In this work we propose an autonomous framework to sequence together action primitives for the purpose of chopping fruits and vegetables on a cluttered cutting board. We present a novel technique to leverage vision foundational models SAM and YOLO to accurately detect, segment, and track fruits and vegetables as they visually change through the sequences of chops, finetuning YOLO on a novel dataset of whole and chopped fruits and vegetables. In our experiments, we demonstrate that our simple pipeline is able to reliably chop a variety of fruits and vegetables ranging in size, appearance, and texture, meeting a variety of chopping specifications, including fruit type, number of slices, and types of slices.
Robotics
What problem does this paper attempt to address?
The paper addresses the practical challenges an autonomous cooking robot faces when chopping fruits and vegetables. Existing methods focus mainly on the low-level dynamics of the cutting action and neglect the problems that arise when a complete autonomous cutting system is deployed. The paper proposes a framework that sequences action primitives to chop fruits and vegetables on a cluttered cutting board. The framework combines two vision foundation models, SAM (Segment Anything Model) and YOLO (You Only Look Once), to detect, segment, and track fruits and vegetables as their appearance changes over a sequence of chops. The researchers fine-tuned YOLO on a novel dataset of whole and chopped fruits and vegetables. The experiments show that this simple pipeline reliably chops fruits and vegetables of various sizes, appearances, and textures, and meets different chopping specifications, including fruit type, number of slices, and slice type. The paper also explores how YOLO and SAM can be used together to obtain high-quality object masks in real-world robotic tasks, an approach that is not limited to cooking and can be applied to other robotic tasks that require precise object segmentation.

Despite the high success rate of both visual recognition and cutting actions, the system has limitations: it assumes that every cutting action succeeds, and it relies on the boundaries of the work surface to limit the system's operational range. Future research should focus on adding a success detector for cutting actions, introducing tool-changing capabilities so the robot can grasp and place objects, and reducing SAM's inference time so the system can handle a large number of objects.
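To make the flow from a chopping specification (fruit type, number of slices) to concrete cut locations more tangible, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the detector output is replaced by a plain `Detection` record and the segmentation output by a binary grid, where the paper would obtain these from YOLO and SAM. The names `Detection`, `pick_target`, `mask_extent`, and `plan_cuts` are hypothetical, and the evenly spaced cuts along the longer image axis are an assumed strategy that the summary above does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    """Hypothetical stand-in for one detector output (e.g. from YOLO)."""
    label: str
    confidence: float
    box: Box


def pick_target(detections: List[Detection], wanted_label: str,
                min_confidence: float = 0.5) -> Optional[Detection]:
    """Return the most confident detection of the requested class, or None."""
    candidates = [d for d in detections
                  if d.label == wanted_label and d.confidence >= min_confidence]
    return max(candidates, key=lambda d: d.confidence, default=None)


def mask_extent(mask: Sequence[Sequence[int]]) -> Tuple[int, int, int, int]:
    """Tight bounds (x0, y0, x1, y1) of the nonzero pixels; x1 and y1 are exclusive."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        raise ValueError("mask is empty")
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def plan_cuts(extent: Tuple[int, int, int, int],
              num_slices: int) -> Tuple[str, List[float]]:
    """Assumed strategy: evenly spaced cuts along the longer image axis.

    Producing num_slices pieces requires num_slices - 1 cuts.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    x0, y0, x1, y1 = extent
    width, height = x1 - x0, y1 - y0
    if width >= height:
        axis, start, length = "x", x0, width
    else:
        axis, start, length = "y", y0, height
    cuts = [start + length * k / num_slices for k in range(1, num_slices)]
    return axis, cuts


# Toy scene: two detections and a 4x10 mask standing in for a segmented cucumber.
detections = [
    Detection("cucumber", 0.91, (2, 1, 10, 3)),
    Detection("apple", 0.88, (12, 0, 16, 4)),
    Detection("cucumber", 0.40, (20, 5, 24, 6)),
]
cucumber_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

target = pick_target(detections, "cucumber")
axis, cuts = plan_cuts(mask_extent(cucumber_mask), num_slices=4)
```

In a real system, the selected detection's bounding box would typically serve as the prompt for the segmentation model, and the pixel positions in `cuts` would still need to be converted to robot coordinates through camera calibration. Both steps are outside the scope of this sketch.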