Natural Language Commanding via Program Synthesis

Apurva Gandhi, Thong Q. Nguyen, Huitian Jiao, Robert Steen, Ameya Bhatawdekar
2023-06-06
Abstract: We present Semantic Interpreter, a natural language-friendly AI system for productivity software such as Microsoft Office that leverages large language models (LLMs) to execute user intent across application features. While LLMs are excellent at understanding user intent expressed as natural language, they are not sufficient for fulfilling application-specific user intent that requires more than text-to-text transformations. We therefore introduce the Office Domain Specific Language (ODSL), a concise, high-level language specialized for performing actions in and interacting with entities in Office applications. Semantic Interpreter leverages an Analysis-Retrieval prompt construction method with LLMs for program synthesis, translating natural language user utterances to ODSL programs that can be transpiled to application APIs and then executed. We focus our discussion primarily on a research exploration for Microsoft PowerPoint.
Machine Learning, Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses how to use large language models (LLMs) to build a natural-language command interface for productivity software such as Microsoft Office, so that users can carry out their intent by describing it in plain language. Specifically, it proposes a system named Semantic Interpreter, which interprets a user's natural-language instruction, converts it into a program in a domain-specific language (ODSL), and then executes that program through the application's API. The approach is meant to overcome the inability of LLMs to execute application-specific functions directly, while retaining their strength at understanding user intent.

Semantic Interpreter focuses on the following challenges:

1. **Limitations of LLMs**: Although LLMs are good at understanding intent expressed in natural language, they cannot directly carry out application-specific tasks that require more than text-to-text transformation (for example, "Create a new slide" or "Insert a poem about hummingbirds in a blue rectangle"). LLMs are text-based models and have no built-in ability to operate an application.
2. **Errors in program synthesis**: When LLMs are used for program synthesis, they are prone to hallucination: the generated program may be inaccurate or inconsistent with the provided prompt. This is especially apparent when generating action plans as DSL programs or API calls, where it can lead to semantic, compile-time, or run-time errors.
3. **Challenges in evaluating the system**: Evaluating a natural-language commanding system is difficult because user queries are often abstract and underspecified, and admit multiple valid interpretations. For example, the request "Make the slide look more beautiful" has no single correct interpretation; it could be satisfied by adding pictures, adding animations, or changing the text formatting.

To address these challenges, the paper makes several key contributions:

- Designing the Office Domain Specific Language (ODSL), a high-level, LLM-friendly language for performing operations in Office applications.
- Describing the architecture of Semantic Interpreter, which uses an Analysis-Retrieval prompt construction method with LLMs to translate natural-language user queries into ODSL programs that can be parsed and executed by Office applications.
- Proposing an evaluation procedure for natural-language commanding systems that measures system performance by analyzing program equivalence.

Overall, the goal of the paper is to make interaction with productivity software more efficient and convenient by combining the natural-language capabilities of LLMs with a purpose-built domain-specific language, so that even non-expert users can complete complex operations through simple natural-language instructions.
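
The ideas summarized above (retrieving relevant examples to build a prompt, rejecting hallucinated statements, and comparing programs for equivalence) can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the statement names (`insert_slide`, `insert_shape`, `format_text`) are hypothetical stand-ins rather than real ODSL, token-overlap ranking is a simple substitute for whatever retrieval method the system actually uses, and the LLM call itself is omitted.

```python
import ast

# Hypothetical few-shot bank of (utterance, toy program) pairs. NOT real ODSL.
EXAMPLES = [
    ("add a new slide", "insert_slide()"),
    ("make the title red", 'format_text(target="title", color="red")'),
    ("insert a blue rectangle", 'insert_shape(kind="rectangle", fill="blue")'),
]

# Hypothetical grammar: statement name -> allowed keyword arguments.
ALLOWED = {
    "insert_slide": set(),
    "insert_shape": {"kind", "fill"},
    "format_text": {"target", "color"},
}


def tokenize(text):
    return set(text.lower().split())


def retrieve(utterance, k=2):
    """Rank stored examples by Jaccard token overlap with the utterance."""
    query = tokenize(utterance)

    def score(example):
        tokens = tokenize(example[0])
        return len(query & tokens) / len(query | tokens)

    return sorted(EXAMPLES, key=score, reverse=True)[:k]


def build_prompt(utterance, k=2):
    """Assemble a few-shot prompt that ends where the model should continue."""
    shots = "\n".join(f"User: {u}\nProgram: {p}" for u, p in retrieve(utterance, k))
    return f"{shots}\nUser: {utterance}\nProgram:"


def normalize(program):
    """Validate a toy program and return a canonical list of calls.

    Raises ValueError for anything outside the hypothetical grammar, which is
    one way to catch hallucinated statements before execution.
    """
    calls = []
    for stmt in ast.parse(program).body:
        is_call = (
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Call)
            and isinstance(stmt.value.func, ast.Name)
        )
        if not is_call:
            raise ValueError("only bare function calls are allowed")
        call = stmt.value
        name = call.func.id
        if name not in ALLOWED:
            raise ValueError(f"unknown statement: {name}")
        if call.args:
            raise ValueError("positional arguments are not allowed")
        kwargs = {}
        for kw in call.keywords:
            if kw.arg not in ALLOWED[name]:
                raise ValueError(f"unknown argument for {name}: {kw.arg}")
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        # Sort keyword arguments so their order does not affect comparison.
        calls.append((name, tuple(sorted(kwargs.items()))))
    return calls


def equivalent(program_a, program_b):
    """Treat two programs as equivalent if their canonical forms match."""
    return normalize(program_a) == normalize(program_b)
```

In this sketch, `equivalent` ignores keyword-argument order but still treats statement order as significant, which is a deliberately conservative notion of equivalence. A request with several valid interpretations, such as the "more beautiful" example, would need a set of acceptable reference programs rather than a single one.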