Code Generation from Flowcharts with Texts: A Benchmark Dataset and an Approach.

Zejie Liu, Xiaoyu Hu, Deyu Zhou, Lin Li, Xu Zhang, Yanzheng Xiang
DOI: https://doi.org/10.18653/v1/2022.findings-emnlp.449
2022-01-01
Abstract: Current research focuses on generating code from requirement documents. However, existing approaches still perform poorly on requirements that demand complex problem-solving skills. In practice, rather than translating requirement documents directly into code, software engineers tackle such requirements by writing code from unified modeling language diagrams such as flowcharts, an intermediate tool for analyzing and visualizing the system. We therefore propose a new source code generation task: generating source code from flowcharts with texts. We manually construct a benchmark dataset containing 320 flowcharts with their corresponding source code. Applying current approaches to this task is not straightforward because (1) a flowchart is a graph containing various structures, including loops and selections, which makes it different from text; and (2) the connections between nodes in a flowchart are abundant and diverse and need to be handled carefully. To address these problems, we propose a two-stage code generation model. In the first stage, a structure recognition model transforms the flowchart into pseudo-code containing the structural conventions of a typical programming language, such as while and if. In the second stage, a code generation model converts the pseudo-code into code. Experimental results show that the proposed approach achieves some improvement over the baselines.
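To make the first stage concrete, the sketch below turns a toy flowchart graph into indented pseudo-code with `while` and `if` keywords. It is an illustration only and not the authors' method: the paper uses a structure recognition model, whereas this sketch substitutes a hand-written graph rule (a decision node whose "yes" branch leads back to itself is treated as a loop, otherwise as a selection). The `Node` class, the "yes"/"no" edge labels, the function names, and the example flowchart are all assumptions made for this example, and the rule only covers well-structured flowcharts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical flowchart node; this layout is an assumption, not the dataset format.
    kind: str                                  # "start", "process", "decision", or "end"
    text: str = ""                             # text written inside the flowchart shape
    next: dict = field(default_factory=dict)   # edge label -> target id ("" = unlabeled)


def reachable(nodes, src, blocked):
    """Ids reachable from src in visit order, without expanding past blocked ids."""
    seen, order, stack = set(), [], [src]
    while stack:
        nid = stack.pop()
        if nid in seen:
            continue
        seen.add(nid)
        order.append(nid)
        if nid not in blocked:
            stack.extend(nodes[nid].next.values())
    return order


def find_join(nodes, yes, no, heads):
    """First node on the 'yes' side that the 'no' side also reaches, or None."""
    no_side = set(reachable(nodes, no, heads))
    for nid in reachable(nodes, yes, heads):
        if nid in no_side:
            return nid
    return None


def emit(nodes, nid, stop, indent, out, heads=frozenset()):
    """Append pseudo-code lines from nid until `stop` or an end node is reached."""
    while nid is not None and nid != stop:
        node = nodes[nid]
        pad = "    " * indent
        if node.kind == "end":
            return
        if node.kind == "decision":
            yes, no = node.next["yes"], node.next["no"]
            if nid in reachable(nodes, yes, heads):
                # The "yes" branch returns to this decision: treat it as a loop.
                out.append(f"{pad}while {node.text}:")
                emit(nodes, yes, nid, indent + 1, out, heads | {nid})
                nid = no
            else:
                # Otherwise treat it as a selection that ends where the branches meet.
                join = find_join(nodes, yes, no, heads)
                out.append(f"{pad}if {node.text}:")
                emit(nodes, yes, join, indent + 1, out, heads)
                if no != join:
                    out.append(f"{pad}else:")
                    emit(nodes, no, join, indent + 1, out, heads)
                nid = join
            continue
        if node.kind == "process":
            out.append(pad + node.text)
        nid = node.next.get("")


def flowchart_to_pseudocode(nodes, start):
    out = []
    emit(nodes, start, None, 0, out)
    return out


# Toy flowchart (made up for illustration): sum the even numbers from 1 to n.
FLOWCHART = {
    "s": Node("start", next={"": "a"}),
    "a": Node("process", "total = 0", {"": "b"}),
    "b": Node("process", "i = 1", {"": "c"}),
    "c": Node("decision", "i <= n", {"yes": "d", "no": "h"}),
    "d": Node("decision", "i % 2 == 0", {"yes": "e", "no": "f"}),
    "e": Node("process", "total = total + i", {"": "f"}),
    "f": Node("process", "i = i + 1", {"": "c"}),
    "h": Node("process", "print(total)", {"": "z"}),
    "z": Node("end"),
}

pseudo = flowchart_to_pseudocode(FLOWCHART, "s")
print("\n".join(pseudo))
```

For this toy input the recovered pseudo-code nests the `if` inside the `while`, which is the kind of structural information the abstract says is lost when a flowchart is treated as flat text. The second stage, converting pseudo-code into code, is a generation model in the paper and is not sketched here.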