Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration.

Changsong Liu, Shaohua Yang, Sari Saba-Sadiya, Nishant Shukla, Yunzhong He, Song-Chun Zhu, Joyce Yue Chai
DOI: https://doi.org/10.18653/v1/d16-1155
2016-01-01
Abstract: To enable language-based communication and collaboration with cognitive robots, this paper presents an approach where an agent can learn task models jointly from language instruction and visual demonstration using an And-Or Graph (AoG) representation. The learned AoG captures a hierarchical task structure where linguistic labels (for language communication) are grounded to corresponding state changes from the physical environment (for perception and action). Our empirical results on a cloth-folding domain have shown that, although state detection through visual processing is full of uncertainties and error prone, by a tight integration with language the agent is able to learn an effective AoG for task representation. The learned AoG can be further applied to infer and interpret on-going actions from new visual demonstration using linguistic labels at different levels of granularity.
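As a rough illustration of the kind of structure the abstract describes, the following is a minimal sketch of a generic And-Or Graph in which each node carries a linguistic label and each terminal node is tied to a state change. It is not the paper's implementation: the `Node` class, the `expand` helper, the cloth-folding labels, and the state-change strings are all hypothetical names chosen for this example, and the learning procedure itself is not shown.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One AoG node (hypothetical layout, for illustration only)."""
    label: str                           # linguistic label for this sub-task
    kind: str                            # "and", "or", or "terminal"
    children: List["Node"] = field(default_factory=list)
    state_change: Optional[str] = None   # set only on terminal nodes


def expand(node: Node) -> List[Tuple[str, ...]]:
    """Enumerate every sequence of state changes the node can derive."""
    if node.kind == "terminal":
        return [(node.state_change,)]
    child_seqs = [expand(child) for child in node.children]
    if node.kind == "or":
        # An Or-node picks exactly one of its alternatives.
        return [seq for seqs in child_seqs for seq in seqs]
    if node.kind == "and":
        # An And-node concatenates one derivation from each child, in order.
        return [sum(combo, ()) for combo in product(*child_seqs)]
    raise ValueError(f"unknown node kind: {node.kind}")


# Assumed toy task: the sleeves may be folded in either order, then the body.
left = Node("fold the left sleeve", "terminal", state_change="left_sleeve_folded")
right = Node("fold the right sleeve", "terminal", state_change="right_sleeve_folded")
body = Node("fold the body", "terminal", state_change="body_folded")

fold_sleeves = Node("fold the sleeves", "or", [
    Node("left then right", "and", [left, right]),
    Node("right then left", "and", [right, left]),
])
fold_shirt = Node("fold the shirt", "and", [fold_sleeves, body])

for sequence in expand(fold_shirt):
    print(sequence)
```

In this sketch, the higher-level label "fold the shirt" and the lower-level label "fold the left sleeve" describe the same demonstration at different levels of granularity, which is the property the abstract attributes to the learned AoG.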