Decision Making of Mobile Robot based on Multimodal Fusion

Ya Hou, ZhiQuan Feng, Tao Xu
DOI: https://doi.org/10.1145/3379247.3379255
2020-01-04
Abstract: To address the problem of multimodal information fusion, this paper proposes a method based on filling in the main components of a scene, and evaluates each channel's information against a knowledge base. The chosen modal channels are vision and hearing, which are better suited to how information is conveyed in real communication. First, each single-modality input is recognized by a neural network. The recognized image and audio content is then converted into a text representation, and the component values that describe the scene are filled in through text analysis. When the channels provide conflicting information, an evaluation model built on the knowledge base computes a confidence for each channel. Once the scene has been determined, the prior knowledge base is queried again and the corresponding action instructions are sent to the robot. Experimental results show that the method can correct modal fusion results under the guidance of the prior knowledge base and achieves effective human-computer cooperation in specific scenes. It reduces dependence on single-channel information during interaction, adds a fault-tolerance mechanism, and improves user-experience evaluations.
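The pipeline outlined in the abstract has four stages. It fills scene components from text, resolves channel conflicts with a knowledge-base-guided confidence, and queries the knowledge base again for a robot command. It can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration and not the paper's method. That includes the three hypothetical scene components (`action`, `object`, `location`), the keyword-matching stand-in for text analysis, and the toy `KNOWLEDGE_BASE`. It also includes the scoring rule, which multiplies recognizer confidence by a `penalty` when a candidate value is inconsistent with every known scene. The abstract does not give the exact form of the paper's evaluation model.

```python
from dataclasses import dataclass

# Hypothetical scene components ("slots"); the abstract does not list the paper's actual components.
SLOTS = ("action", "object", "location")

# Hypothetical prior knowledge base: known scenes -> robot action instructions.
KNOWLEDGE_BASE = {
    ("fetch", "cup", "kitchen"): "NAVIGATE kitchen; GRASP cup; RETURN",
    ("fetch", "book", "study"): "NAVIGATE study; GRASP book; RETURN",
    ("clean", "table", "kitchen"): "NAVIGATE kitchen; WIPE table",
}

# Hypothetical vocabulary used to fill each component from recognized text.
VOCAB = {
    "action": {"fetch", "clean"},
    "object": {"cup", "book", "table"},
    "location": {"kitchen", "study"},
}


@dataclass
class ChannelResult:
    name: str          # e.g. "vision" or "audio"
    text: str          # recognizer output already converted to text
    confidence: float  # recognizer confidence in [0, 1]


def fill_slots(text):
    """Fill scene components by keyword matching (a stand-in for real text analysis)."""
    words = text.lower().split()
    slots = {}
    for slot in SLOTS:
        for word in words:
            if word in VOCAB[slot]:
                slots[slot] = word
                break
    return slots


def consistent(partial):
    """True if at least one knowledge-base scene matches every filled component."""
    return any(
        all(scene[SLOTS.index(s)] == v for s, v in partial.items())
        for scene in KNOWLEDGE_BASE
    )


def score(agreed, slot, value, conf, penalty):
    """Recognizer confidence, discounted when the value contradicts every known scene."""
    return conf * (1.0 if consistent({**agreed, slot: value}) else penalty)


def fuse(channels, penalty=0.1):
    """Merge per-channel slots; resolve conflicts using the knowledge base as a prior."""
    per_channel = [(ch, fill_slots(ch.text)) for ch in channels]
    agreed, conflicts = {}, {}
    for slot in SLOTS:
        candidates = {}  # value -> highest confidence of any channel proposing it
        for ch, slots in per_channel:
            if slot in slots:
                value = slots[slot]
                candidates[value] = max(candidates.get(value, 0.0), ch.confidence)
        if len(candidates) == 1:
            agreed[slot] = next(iter(candidates))
        elif len(candidates) > 1:
            conflicts[slot] = candidates
    for slot, candidates in conflicts.items():
        agreed[slot] = max(
            candidates,
            key=lambda v: score(agreed, slot, v, candidates[v], penalty),
        )
    return agreed


def decide(channels):
    """Fuse the channels, then query the knowledge base again for the action instruction."""
    scene = fuse(channels)
    key = tuple(scene.get(s) for s in SLOTS)
    return scene, KNOWLEDGE_BASE.get(key)  # None if the fused scene is unknown


if __name__ == "__main__":
    audio = ChannelResult("audio", "fetch the book from the kitchen", 0.6)
    vision = ChannelResult("vision", "cup on kitchen counter", 0.5)
    scene, instruction = decide([audio, vision])
    print(scene)
    print(instruction)
```

In this example the audio channel's higher recognizer confidence alone would select `book`. However, no known scene involves fetching a book from the kitchen, so the sketch instead takes the vision channel's `cup` and returns the matching instruction. This mirrors, in toy form, the abstract's claim that the prior knowledge base can correct fusion results.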