Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi
2023-11-01
Abstract: Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet these same models also fail on surprisingly trivial problems. This raises the question: are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the limits of these models across three representative compositional tasks: multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off our empirical study, we provide theoretical arguments on abstract multi-step reasoning problems that highlight how the performance of autoregressive generation can rapidly decay with increased task complexity.
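The idea of formulating a compositional task as a computation graph can be illustrated with a small sketch for multi-digit multiplication. This is a hypothetical illustration, not the authors' code: the node granularity (one-digit products, place-shifted partial products, running sums), the names `Node`, `long_multiplication_graph`, `evaluate`, `depth`, and `chained_accuracy`, and the independent per-step error model are all assumptions. Here "depth" is simply the longest path from an input digit to the output node, one plausible way to quantify how many sequential steps a task requires.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    """One node of a computation graph: an operation applied to its parent nodes."""
    name: str
    parents: list[str]
    fn: Callable[..., int]


def long_multiplication_graph(a: int, b: int) -> dict[str, Node]:
    """Build a schoolbook-multiplication graph for a * b (hypothetical granularity).

    Nodes are inserted in dependency order, so dict order is a topological order.
    """
    graph: dict[str, Node] = {}
    a_digits = [int(d) for d in reversed(str(a))]  # least significant digit first
    b_digits = [int(d) for d in reversed(str(b))]
    # Input nodes: one per digit, with no parents.
    for i, d in enumerate(a_digits):
        graph[f"a{i}"] = Node(f"a{i}", [], lambda d=d: d)
    for j, d in enumerate(b_digits):
        graph[f"b{j}"] = Node(f"b{j}", [], lambda d=d: d)
    # One-digit products a_i * b_j.
    for i in range(len(a_digits)):
        for j in range(len(b_digits)):
            graph[f"m{i}_{j}"] = Node(f"m{i}_{j}", [f"a{i}", f"b{j}"], lambda x, y: x * y)
    # Partial product for digit j of b, each term shifted by its place value 10^(i+j).
    for j in range(len(b_digits)):
        parents = [f"m{i}_{j}" for i in range(len(a_digits))]
        graph[f"p{j}"] = Node(
            f"p{j}", parents,
            lambda *ms, j=j: sum(m * 10 ** (i + j) for i, m in enumerate(ms)),
        )
    # Running sum of partial products: s_0 = p_0, s_j = s_{j-1} + p_j.
    graph["s0"] = Node("s0", ["p0"], lambda p: p)
    for j in range(1, len(b_digits)):
        graph[f"s{j}"] = Node(f"s{j}", [f"s{j - 1}", f"p{j}"], lambda s, p: s + p)
    return graph


def evaluate(graph: dict[str, Node]) -> dict[str, int]:
    """Compute every node's value, visiting nodes in (topological) insertion order."""
    values: dict[str, int] = {}
    for name, node in graph.items():
        values[name] = node.fn(*(values[p] for p in node.parents))
    return values


def depth(graph: dict[str, Node]) -> dict[str, int]:
    """Longest path length from any input node (depth 0) to each node."""
    d: dict[str, int] = {}
    for name, node in graph.items():
        d[name] = 0 if not node.parents else 1 + max(d[p] for p in node.parents)
    return d


def chained_accuracy(per_step: float, steps: int) -> float:
    """Toy model: if each of `steps` steps is independently correct with
    probability `per_step`, the whole chain is correct with per_step ** steps."""
    return per_step ** steps


g = long_multiplication_graph(123, 45)
out = list(g)[-1]  # the final running-sum node
print(evaluate(g)[out], len(g), depth(g)[out])  # prints "5535 15 4"
print(round(chained_accuracy(0.99, 100), 3))    # prints "0.366"
```

Under these assumptions, the graph for an m-digit by n-digit product has m + n + mn + 2n nodes and depth n + 2, so both size and depth grow with the operands. The `chained_accuracy` toy shows why compounding matters: even 99% per-step reliability leaves only about 37% end-to-end accuracy over 100 dependent steps. This is a deliberately simplified model, not the paper's formal argument.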
Computation and Language, Artificial Intelligence, Machine Learning