Imitation Learning Inputting Image Feature to Each Layer of Neural Network

Koki Yamane, Sho Sakaino, Toshiaki Tsuji
2024-01-19
Abstract: Imitation learning enables robots to learn and replicate human behavior from training data. Recent advances in machine learning enable end-to-end learning approaches that directly process high-dimensional observation data, such as images. However, these approaches face a critical challenge when processing data from multiple modalities, inadvertently ignoring data with a lower correlation to the desired output, especially when using short sampling periods. This paper presents a useful method to address this challenge, which amplifies the influence of data with a relatively low correlation to the output by inputting the data into each neural network layer. The proposed approach effectively incorporates diverse data sources into the learning process. Through experiments using a simple pick-and-place operation with raw images and joint information as input, significant improvements in success rates are demonstrated even when dealing with data from short sampling periods.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses a problem that arises when imitation learning uses data from multiple modalities: inputs that are only weakly correlated with the output tend to be ignored. When several modalities are fed to the model at once, it concentrates on the data most strongly correlated with the output and may neglect the rest, and this effect becomes more pronounced as the sampling period gets shorter. For example, suppose the model predicts the robot's next-step joint angle from the current-step joint angle and an image. Because the current-step joint angle is strongly correlated with the next-step joint angle, and increasingly so as the sampling period shrinks, simply outputting the current-step joint angle already yields a small prediction error. The other inputs, such as the image, therefore end up with very little influence on the output.

To solve this problem, the authors propose inputting the weakly correlated data into each layer of the neural network, which increases the influence of that data on the output. Experiments on a simple pick-and-place task that uses raw images and joint information as inputs show that the method significantly improves the task success rate, even when the data are sampled with a short period.
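Both ideas in the paragraph above can be illustrated with a minimal pure-Python sketch. It rests on assumptions that are not taken from the paper: a single joint following a toy sine trajectory, tanh activations, arbitrary layer sizes, and concatenation as the way the image feature is "input to each layer". The function identity_baseline_mse shows why copying the current joint angle already gives a small error when the sampling period is short. The hypothetical LayerwiseImageMLP class concatenates an image feature vector to the input of every layer, including the output head, so the image reaches the output through every layer rather than only through the first one.

```python
import math
import random


def identity_baseline_mse(dt, freq_hz=1.0, duration_s=2.0):
    """MSE of predicting q(t + dt) by simply copying q(t).

    Assumption: one joint follows the toy trajectory q(t) = sin(2*pi*f*t);
    this is illustrative data, not data from the paper.
    """
    steps = round(duration_s / dt)
    q = [math.sin(2 * math.pi * freq_hz * k * dt) for k in range(steps + 1)]
    errors = [(q[k + 1] - q[k]) ** 2 for k in range(steps)]
    return sum(errors) / len(errors)


def dense(weights, bias, x):
    """Fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random Gaussian weights scaled by 1/sqrt(n_in), zero bias."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class LayerwiseImageMLP:
    """Hypothetical MLP that re-injects the image feature at every layer.

    Each layer receives concat(previous activation, image feature).
    """

    def __init__(self, joint_dim, image_dim, hidden_dim, num_layers, seed=0):
        rng = random.Random(seed)
        self.layers = []
        n_in = joint_dim
        for _ in range(num_layers):
            # Every hidden layer gets image_dim extra inputs.
            self.layers.append(init_layer(n_in + image_dim, hidden_dim, rng))
            n_in = hidden_dim
        # The output head also sees the image feature.
        self.head = init_layer(hidden_dim + image_dim, joint_dim, rng)

    def forward(self, joints, image_feature):
        h = list(joints)
        for weights, bias in self.layers:
            h = [math.tanh(v) for v in dense(weights, bias, h + list(image_feature))]
        weights, bias = self.head
        return dense(weights, bias, h + list(image_feature))


if __name__ == "__main__":
    # A shorter sampling period makes the copy-current baseline much more accurate.
    print(f"copy-current MSE, 2 ms:   {identity_baseline_mse(0.002):.2e}")
    print(f"copy-current MSE, 100 ms: {identity_baseline_mse(0.1):.2e}")

    model = LayerwiseImageMLP(joint_dim=3, image_dim=4, hidden_dim=8, num_layers=3)
    joints = [0.1, -0.2, 0.3]
    print(model.forward(joints, [0.0, 0.0, 0.0, 0.0]))
    print(model.forward(joints, [1.0, -1.0, 0.5, 0.2]))
```

Concatenation is only one plausible reading of "inputting the data into each layer"; in a real system the image feature would come from a learned encoder and the network would be trained on demonstration data, both of which are omitted here.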