Behaviour Learning with Adaptive Motif Discovery and Interacting Multiple Model

Hanging Zhao, Travis Manderson, Hao Zhang, Xue Liu, Gregory Dudek
DOI: https://doi.org/10.1109/iros47612.2022.9981588
2022-01-01
Abstract: We propose an approach that enables simultaneous, interpretable learning of a high-level discrete behaviour and its low-level rhythmic sub-behaviour. We do this through a unified reward function, in which a reward term that describes only the low-level behaviour, and has less impact on the learning of other behaviours, is recovered from few-shot motion demonstrations. To this end, we first extract local behaviour motifs from state-only human demonstrations and random driving samples using an adaptive motif discovery approach derived from the Matrix Profile algorithm. We then optimize the parameters for motif discovery by maximizing the sum and entropy over motif sizes. Interacting Multiple Model (IMM) estimators are constructed on top of the linear-Gaussian dynamics of the discovered motifs; the cumulative distributions over motifs estimated by the IMMs serve as the basis of the reward function. By combining the recovered reward with the terrain type signal gathered from the environment, we are able to train a dual-objective off-road vehicle controller that demonstrates both terrain selection and human-like driving behaviours. Compared with related approaches across 10 people, our rhythmic behaviour reward recovery approach enables the controller to produce higher preference over human driving demonstrations. In addition to performing more stably across different people, with 87% less variance than the best baseline on the rhythmic behaviour indicator, our method reduces the negative effects on higher-level behaviour learning while maintaining high interpretability at all stages of the algorithm.
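As an illustration of the IMM step mentioned in the abstract, the following is a minimal sketch of the standard IMM mode-probability update, in which each mode would correspond to one discovered motif. It is not the authors' implementation: the function name `imm_mode_update`, the transition matrix, and the likelihood values are hypothetical, and the abstract does not specify how the resulting distribution over motifs is turned into a reward.

```python
import numpy as np

def imm_mode_update(mu, trans, likelihoods):
    """One step of the standard IMM mode-probability update.

    mu          : (M,) prior probability of each mode (assumed: one mode per motif)
    trans       : (M, M) Markov transition matrix, trans[i, j] = P(mode j now | mode i before)
    likelihoods : (M,) likelihood of the current observation under each mode's
                  linear-Gaussian model (e.g. from each Kalman filter's innovation)
    """
    mu = np.asarray(mu, dtype=float)
    trans = np.asarray(trans, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)

    # Predicted mode probabilities: c_j = sum_i trans[i, j] * mu[i]
    predicted = trans.T @ mu

    # Bayes update with the per-mode observation likelihoods
    unnormalised = likelihoods * predicted
    total = unnormalised.sum()
    if total <= 0.0:
        # Observation is impossible under every mode; keep the prediction
        return predicted
    return unnormalised / total

# Hypothetical two-motif example
posterior = imm_mode_update(
    mu=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    likelihoods=[0.8, 0.2],
)
print(posterior)
```

A full IMM estimator would also mix the per-mode state estimates before each filter step; only the mode-probability part is shown here because it is the quantity that yields a distribution over motifs.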