Abstract: Traditional search systems focus on formulating queries that return effective results, but they struggle in scenarios such as product search, where crucial product details (e.g., size, color) remain hidden until users visit specific product pages. This highlights the need for intelligent web navigation agents that can formulate queries and navigate web pages according to users' high-level intents. In response, this work introduces GLAINTEL, a Grounded Language Agent for Intelligent Web Interactions. Drawing on advances in language modeling and reinforcement learning, GLAINTEL investigates how effectively transformer-based models can improve search in interactive web environments. Because the set of available actions changes from state to state in web navigation, GLAINTEL uses the Flan-T5 architecture with both a language modeling head and a value estimation head. This work focuses on training smaller language models as agents across various scenarios and systematically evaluates the impact of human demonstrations on the training process. Specifically, we investigate scenarios where no human demonstrations are available and then assess how such demonstrations can be used effectively. We also explore unsupervised domain adaptation for situations where demonstrations are confined to a specific domain. Experimental evaluations across diverse setups show that agents trained in unsupervised settings are effective and outperform in-context learning-based approaches that employ larger models with up to 540 billion parameters. Surprisingly, behavioral cloning-based methods that use human demonstrations directly do not outperform unsupervised learning-based methods. Additionally, combining human demonstrations with reinforcement learning-based training yields results comparable to those of models that use GPT-4.
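The abstract does not spell out how a language modeling head turns into a policy over a state-dependent set of actions. One common approach, assumed here rather than taken from the paper, scores each admissible action by the summed log-probability the LM head assigns to its tokens (conditioned on the current page) and applies a softmax over only the actions available in that state. The action strings and log-probabilities below are hypothetical, and the value estimation head (a separate scalar output) is not shown.

```python
import math
from typing import Dict, List


def action_distribution(action_logprobs: Dict[str, List[float]]) -> Dict[str, float]:
    """Build a policy over the actions admissible in the current state.

    Assumption (illustrative, not the paper's code): `action_logprobs` maps each
    candidate action string to the per-token log-probabilities that the LM head
    assigns to it, conditioned on the current observation.
    """
    # Sequence score: sum of the token log-probabilities of each action.
    scores = {action: sum(lps) for action, lps in action_logprobs.items()}
    # Numerically stable softmax restricted to the current candidate set,
    # so the action space can differ from one state to the next.
    max_score = max(scores.values())
    exp_scores = {a: math.exp(s - max_score) for a, s in scores.items()}
    total = sum(exp_scores.values())
    return {a: e / total for a, e in exp_scores.items()}


if __name__ == "__main__":
    # Hypothetical candidates on a product-search page.
    candidates = {
        "search[red t-shirt size m]": [-0.2, -0.5, -0.1],
        "click[item 3]": [-1.0, -0.3],
        "click[back to search]": [-2.0, -1.5],
    }
    for action, prob in action_distribution(candidates).items():
        print(f"{action}: {prob:.3f}")
```

Summing token log-probabilities tends to favor shorter action strings; dividing each score by the action's token count is a common alternative, and which variant GLAINTEL uses is not stated here.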

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

BabyAI++: Towards Grounded-Language Learning beyond Memorization

Teaching Machines to Converse

Dialogue Learning with Human-in-the-Loop

Zero-Shot Compositional Policy Learning via Language Grounding

ELLA: Exploration through Learned Language Abstraction

Interactive Teaching for Conversational AI

Using NLU in Context for Question Answering: Improving on Facebook's bAbI Tasks

What Artificial Neural Networks Can Tell Us About Human Language Acquisition

Interactive Grounded Language Acquisition and Generalization in a 2D World

Understanding Early Word Learning in Situated Artificial Agents

BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models

Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner

Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models

Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning

NLU for Game-based Learning in Real: Initial Evaluations

Learning to Model the World with Language

Listen, Interact and Talk: Learning to Speak via Interaction

Simulating User Agents for Embodied Conversational-AI

Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback

Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning