Abstract: In recent years, artificial intelligence (AI) and robotics technology has spread rapidly to countries around the world. A growing number of scholars and industry experts have proposed deep learning models and methods to solve everyday problems and improve work efficiency. Modern life is busy, which led us to investigate whether demand for Bento buffet cafeterias in Taiwan has gradually increased. Customers eating at a cafeteria buffet, however, often encounter two problems. First, after selecting their dishes from the buffet, they must queue to check out, and the wait is often long, especially at lunch and dinner time. Second, customers sometimes dispute the charges calculated by cafeteria staff at the checkout counter, claiming they are too high. It is therefore necessary to develop an AI-enabled checkout system. Such a self-checkout system will help Bento buffet cafeterias shorten long queues without hiring additional workers. In this paper, we use computer vision and deep learning to design and implement an AI-enabled checkout system for Bento buffet cafeterias. The prototype consists of an angle steel shelf, a Kinect camera, a light source, and a desktop computer. Six baseline convolutional neural networks were compared on food recognition. Our experiments used 22 food categories from a Bento buffet cafeteria. Experimental results show that the inception_v4 model achieves the highest average validation accuracy on food recognition, 99.11%, but requires the most training and recognition time. The AlexNet model achieves 94.5% accuracy and requires the least training and recognition time. To perform well in both recognition accuracy and training and recognition time, we propose a hierarchical two-stage approach: the first stage performs an initial identification, and the second stage distinguishes between similar-looking food images. Experimental results show that the proposed approach achieves 96.3% accuracy on our test dataset and requires very little recognition time per input image. In addition, food volumes can be estimated from the depth images captured by the Kinect camera, and a framework for a visual checkout system was successfully built.
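
The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the stages are wired together, so everything here is an assumption: a coarse classifier returns either a final label or the name of a group of similar-looking foods, and a second classifier runs only for such groups. The function names, the `leafy_greens` group, and the food labels are hypothetical, and simple rule-based functions stand in for trained CNNs.

```python
from typing import Callable, Dict

Image = dict  # stand-in type; a real system would pass an image array
Classifier = Callable[[Image], str]


def two_stage_predict(image: Image, coarse: Classifier,
                      fine_by_group: Dict[str, Classifier]) -> str:
    """Run the coarse classifier, then a fine classifier only when the
    coarse label names a group of visually similar foods."""
    group = coarse(image)
    fine = fine_by_group.get(group)
    # Labels without a registered fine classifier are treated as final.
    return fine(image) if fine is not None else group


# Hypothetical stand-ins for trained networks (assumptions, not the paper's models).
def coarse_model(image: Image) -> str:
    return "leafy_greens" if image["color"] == "green" else "rice"


def leafy_greens_model(image: Image) -> str:
    return "spinach" if image["texture"] == "smooth" else "cabbage"


FINE_MODELS = {"leafy_greens": leafy_greens_model}

print(two_stage_predict({"color": "green", "texture": "smooth"},
                        coarse_model, FINE_MODELS))  # spinach
print(two_stage_predict({"color": "white", "texture": "grainy"},
                        coarse_model, FINE_MODELS))  # rice
```

Under this assumed wiring, the more expensive second classifier runs only for foods that are easily confused, which is one way a staged design could keep average recognition time low.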

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses two main issues in the checkout process at Taiwanese bento buffet restaurants:

1. **Long Queue Times**: After selecting their food, customers must wait in line to check out, and the wait is longest during the lunch and dinner peaks.
2. **Pricing Disputes**: Customers sometimes question the prices calculated by restaurant staff, believing them to be too high.

To solve these problems, the research team proposed a visual checkout system based on convolutional neural networks (CNNs). The system automatically recognizes the different foods on a buffet plate and estimates their volume, enabling a fast and accurate checkout. This reduces customer queue times and helps avoid pricing disputes.

Specifically, the research team built a prototype consisting of an angle steel shelf, a Kinect camera, a light source, and a desktop computer. Six baseline CNN models were compared in the experiments, including AlexNet, VGG, ResNet, Inception, and DenseNet. The results show that the proposed two-stage hierarchical approach keeps recognition accuracy high while reducing the time required for training and recognition. In addition, the system uses depth images captured by the Kinect camera to estimate food volume, further streamlining the checkout process.
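
The summary does not describe how volume is computed from depth images, so the following is only a rough sketch under stated assumptions: the camera looks straight down, depth values are in millimetres, the depth of the empty plate surface is known, and every pixel covers the same area on the plate. The function name and all parameters are hypothetical.

```python
from typing import Optional, Sequence


def estimate_volume_ml(depth_mm: Sequence[Sequence[float]],
                       plate_depth_mm: float,
                       pixel_area_mm2: float,
                       mask: Optional[Sequence[Sequence[bool]]] = None) -> float:
    """Sum the food height above the plate over all (masked) pixels.

    Food is closer to an overhead camera than the plate, so the height at
    a pixel is plate_depth_mm minus the measured depth.
    """
    total_mm3 = 0.0
    for r, row in enumerate(depth_mm):
        for c, depth in enumerate(row):
            if mask is not None and not mask[r][c]:
                continue  # pixel belongs to another food item or background
            height = plate_depth_mm - depth
            if height > 0:
                total_mm3 += height * pixel_area_mm2
    return total_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Toy 2x2 depth map: two pixels carry food 10 mm and 20 mm high.
depth = [[800.0, 790.0],
         [780.0, 800.0]]
volume = estimate_volume_ml(depth, plate_depth_mm=800.0, pixel_area_mm2=4.0)
print(round(volume, 3))  # 0.12
```

A per-food mask, such as one derived from the recognition step, would let the same function estimate the volume of each dish separately. Lens distortion and the variation of pixel footprint with distance are ignored in this sketch.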

A Framework of Visual Checkout System Using Convolutional Neural Networks for Bento Buffet
