Abstract: The ability of a machine to communicate with humans has long been associated with the general success of AI. This dates back to Alan Turing's epoch-making work in the early 1950s, which proposes that a machine's intelligence can be tested by how well it can fool a human, through dialogue, into believing that it is a human. Many systems learn generation rules from a minimal set of authored rules or labels on top of hand-coded rules or templates, and thus are both expensive and difficult to extend to open-domain scenarios. Recently, the emergence of neural network models has shown the potential to solve many of the problems in dialogue learning that earlier systems cannot tackle: end-to-end neural frameworks offer the promise of scalability and language independence, together with the ability to track the dialogue state and then map between states and dialogue actions in a way not possible with conventional systems. On the other hand, neural systems bring new challenges: they tend to output dull and generic responses; they lack a consistent or coherent persona; they are usually optimized for single-turn conversations and cannot account for the long-term success of a conversation; and they are not able to take advantage of interactions with humans. This dissertation attempts to tackle these challenges. Its contributions are two-fold: (1) we address new challenges presented by neural network models in open-domain dialogue generation systems; (2) we develop interactive question-answering dialogue systems by (a) giving the agent the ability to ask questions and (b) training a conversational agent through interactions with humans in an online fashion, where a bot improves by communicating with humans and learning from the mistakes that it makes.

Hey AI, Can You Solve Complex Tasks by Talking to Agents?

Teaching Machines to Converse

COMMA: A Communicative Multimodal Multi-Agent Benchmark

AutoAct: Automatic Agent Learning from Scratch for QA Via Self-Planning

ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Answering Science Exam Questions Using Query Rewriting with Background Knowledge

KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base

Language Models can Solve Computer Tasks

Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks

Understanding Unnatural Questions Improves Reasoning over Text

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs

LONGAGENT: Achieving Question Answering for 128K-Token-long Documents Through Multi-Agent Collaboration

TuringQ: Benchmarking AI Comprehension in Theory of Computation

Addressing a Question Answering Challenge by Combining Statistical Methods with Inductive Rule Learning and Reasoning

Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

WebQA: Multihop and Multimodal QA

Teaching Smaller Language Models To Generalise To Unseen Compositional Questions

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

In-Context Ability Transfer for Question Decomposition in Complex QA