A Framework of Visual Checkout System Using Convolutional Neural Networks for Bento Buffet

Mei-Yi Wu, Jia-Hong Lee, Chuan-Ying Hsueh
DOI: https://doi.org/10.3390/s21082627
IF: 3.9
2021-04-08
Sensors
Abstract: In recent years, artificial intelligence (AI) and robotics have spread rapidly to countries around the world. A growing number of scholars and industry experts have proposed deep learning models and methods to solve everyday problems and improve work efficiency. Modern life is busy, which led us to investigate whether the demand for Bento buffet cafeterias has gradually increased in Taiwan. When eating at a buffet cafeteria, however, people often encounter two problems. First, customers must queue to check out after selecting and filling their dishes from the buffet, and the wait is often too long, especially at lunch or dinner time. Second, customers sometimes question the charges calculated by cafeteria staff at the checkout counter, claiming they are too expensive. An AI-enabled checkout system is therefore needed. An AI-enabled self-checkout system will help Bento buffet cafeterias reduce long lineups without adding workers. In this paper, we use computer vision and deep learning to design and implement an AI-enabled checkout system for Bento buffet cafeterias. The prototype consists of an angle steel shelf, a Kinect camera, a light source, and a desktop computer. Six baseline convolutional neural networks were compared on food recognition. Our experiments used 22 food categories from a Bento buffet cafeteria. Experimental results show that the inception_v4 model achieves the highest average validation accuracy on food recognition, 99.11%, but requires the most training and recognition time. The AlexNet model achieves 94.5% accuracy and requires the least training and recognition time. We propose a two-stage hierarchical approach that performs well in both recognition accuracy and the required training and recognition time.
The first stage performs an initial identification, and the second stage distinguishes between similar food images. Experimental results show that the proposed approach achieves a 96.3% accuracy rate on our test dataset and requires very little recognition time per input image. In addition, food volumes can be estimated from the depth images captured by the Kinect camera, and a framework for a visual checkout system was successfully built.
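The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the stages are connected, so the routing rule (a coarse label optionally handed to a specialised second-stage classifier) is an assumption. The names `make_two_stage_classifier`, `toy_coarse`, `toy_meat_refiner`, the food labels, and the two-element feature vectors are all hypothetical stand-ins for trained CNNs and real images.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def make_two_stage_classifier(coarse: Classifier,
                              refiners: Dict[str, Classifier]) -> Classifier:
    """Route each input through a coarse stage, then an optional refining stage.

    Assumption: stage 1 emits either a final food label or the name of a
    group of visually similar foods; stage 2 resolves labels within a group.
    """
    def predict(features: Features) -> str:
        coarse_label = coarse(features)
        refine = refiners.get(coarse_label)
        # Labels that have no refiner are treated as final answers.
        return refine(features) if refine is not None else coarse_label
    return predict


# Toy stand-ins for trained CNNs (hypothetical, for illustration only).
def toy_coarse(features: Features) -> str:
    return "dark_meat_group" if features[0] > 0.5 else "white_rice"


def toy_meat_refiner(features: Features) -> str:
    return "braised_pork" if features[1] > 0.5 else "roast_duck"


predict = make_two_stage_classifier(
    toy_coarse, {"dark_meat_group": toy_meat_refiner}
)
```

Under this routing assumption, only inputs that fall into a group of similar foods pay for the second classifier, which is one plausible way a two-stage design could keep recognition time low.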
engineering, electrical & electronic; chemistry, analytical; instruments & instrumentation
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses two main issues in the checkout process at Taiwanese Bento buffet cafeterias:

1. **Long Queue Times**: After selecting their food, customers must wait in line to check out, and the wait is especially long during lunch and dinner peak hours.
2. **Pricing Disputes**: Customers sometimes question the prices calculated by cafeteria staff, believing them to be too high.

To solve these problems, the research team proposed a visual checkout system based on convolutional neural networks (CNNs). The system automatically recognizes the different types of food on a buffet plate and estimates their volume, enabling a fast and accurate checkout. This can reduce customer queue times and help avoid pricing disputes.

Specifically, the team built a prototype consisting of an angle steel shelf, a Kinect camera, a light source, and a desktop computer. Six baseline CNN models were compared in the experiments, including AlexNet, VGG, ResNet, Inception, and DenseNet. The results show that the proposed two-stage hierarchical approach requires very little recognition time while maintaining high recognition accuracy (96.3% on the test dataset). In addition, the system uses depth images captured by the Kinect camera to estimate food volume, which supports the checkout calculation.
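As a rough illustration of how food volume might be estimated from a depth image, the sketch below sums the height of the food surface above an empty-tray reference over all pixels. It is an assumption-laden sketch, not the paper's method: the function name `estimate_volume_ml`, the overhead-camera geometry, the fixed per-pixel footprint area, and the noise threshold are all hypothetical.

```python
from typing import Sequence


def estimate_volume_ml(depth_mm: Sequence[Sequence[float]],
                       tray_depth_mm: float,
                       pixel_area_mm2: float,
                       min_height_mm: float = 2.0) -> float:
    """Approximate food volume from an overhead depth map.

    depth_mm:       per-pixel distance from camera to surface, in millimetres
    tray_depth_mm:  distance from camera to the empty tray (assumed flat)
    pixel_area_mm2: tray area covered by one pixel (assumed constant)
    min_height_mm:  heights below this are treated as sensor noise
    """
    volume_mm3 = 0.0
    for row in depth_mm:
        for depth in row:
            height = tray_depth_mm - depth  # food height above the tray
            if height >= min_height_mm:
                volume_mm3 += height * pixel_area_mm2
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Tiny 2x2 example: two pixels show food 20 mm and 10 mm high.
example_depth = [[800.0, 780.0],
                 [790.0, 800.0]]
example_volume = estimate_volume_ml(example_depth, tray_depth_mm=800.0,
                                    pixel_area_mm2=4.0)
```

In a real setup, the area covered by a pixel varies with distance from the camera, so a constant `pixel_area_mm2` is only a first approximation.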