Abstract: Design generation requires tight integration of neural and symbolic reasoning, as good design must meet explicit user needs and honor implicit rules for aesthetics, utility, and convenience. Current automated design tools driven by neural networks produce appealing designs, but cannot satisfy user specifications and utility requirements. Symbolic reasoning tools, such as constraint programming, cannot perceive low-level visual information in images or capture subtle aspects such as aesthetics. We introduce the Spatial Reasoning Integrated Generator (SPRING) for design generation. SPRING embeds a neural and symbolic integrated spatial reasoning module inside the deep generative network. The spatial reasoning module decides the locations of objects to be generated in the form of bounding boxes, which are predicted by a recurrent neural network and filtered by symbolic constraint satisfaction. Embedding symbolic reasoning into neural generation guarantees that the output of SPRING satisfies user requirements. Furthermore, SPRING offers interpretability, allowing users to visualize and diagnose the generation process through the bounding boxes. SPRING is also adept at managing novel user specifications not encountered during its training, thanks to its proficiency in zero-shot constraint transfer. Quantitative evaluations and a human study reveal that SPRING outperforms baseline generative models, excelling in delivering high design quality and better meeting user specifications.
### What problem does this paper attempt to address?
This paper addresses how to combine neural networks and symbolic reasoning in design generation so that generated designs satisfy both users' explicit requirements and implicit design rules. Current neural-network-based design tools produce attractive designs but cannot reliably meet users' specific requirements and functional needs. Symbolic methods such as constraint programming can enforce explicit specifications, but they cannot perceive low-level visual information in images or capture subtle qualities such as aesthetics. The paper therefore proposes the Spatial Reasoning Integrated Generator (SPRING), which combines the strengths of both approaches to generate designs that meet user needs while retaining high aesthetic quality.
### Problem Definition
The design generation task is to generate designs represented as images, such as interior designs. Given a background image \( B \) containing initial objects and a design specification \( D \) consisting of new objects \( O \) and position constraints \( C \), the goal is to find a scene image \( S \) that (1) contains the background image \( B \) and the new objects \( O \), (2) satisfies all position constraints in \( C \), and (3) looks realistic, i.e., is visually close to the images in a natural image set \( T \).
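Informally, the task can be summarized as the constrained search problem below. The notation is introduced here for illustration and is not necessarily the paper's: \( B \subseteq S \) means \( S \) preserves the content of \( B \), \( \operatorname{obj}(S) \) is the set of objects in \( S \), \( S \models C \) means every constraint in \( C \) holds in \( S \), and \( S \sim T \) stands for "visually close to the natural images in \( T \)".

\[
\text{find } S \quad \text{such that} \quad B \subseteq S, \qquad O \subseteq \operatorname{obj}(S), \qquad S \models C, \qquad S \sim T
\]

The first three conditions are hard requirements that can be checked exactly, whereas realism is a soft, perceptual criterion; this split mirrors the division of labor between symbolic reasoning and neural generation described above.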
### Design Language
The paper defines a propositional design language for describing spatial relationships between objects and properties of individual objects. Objects are denoted \( o_1, o_2, \ldots, o_N \), and relationships and properties are expressed as predicates. For example:
- **Spatial Relationships**:
- `above(o1, o2, c)`: The top of object \( o_1 \) is at least \( c \) units higher than the top of object \( o_2 \).
- `below(o1, o2, c)`: The top of object \( o_1 \) is at least \( c \) units lower than the bottom of object \( o_2 \).
- `left(o1, o2, c)`: The left side of object \( o_1 \) is at least \( c \) units more to the left than the left side of object \( o_2 \).
- `right(o1, o2, c)`: The left side of object \( o_1 \) is at least \( c \) units more to the right than the right side of object \( o_2 \).
- **Properties**:
- `property(o1, "blue")`: Object \( o_1 \) is blue.
- `type(o1, "microwave")`: Object \( o_1 \) is a microwave.
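The spatial predicates above can be read as linear inequalities over bounding-box coordinates. The sketch below encodes them literally as stated, assuming boxes are given as `(left, top, right, bottom)` in image coordinates where \( y \) grows downward (so "higher" means a smaller \( y \)). The `Box` class, the `satisfies` helper, and the example objects are illustrative names, not the paper's code, and the non-spatial predicates `property` and `type` are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


Layout = Dict[str, Box]               # object name -> bounding box
Constraint = Callable[[Layout], bool]  # a predicate evaluated on a layout


def above(o1: str, o2: str, c: float) -> Constraint:
    # Top of o1 is at least c units higher (smaller y) than the top of o2.
    return lambda layout: layout[o1].top + c <= layout[o2].top


def below(o1: str, o2: str, c: float) -> Constraint:
    # Top of o1 is at least c units lower (larger y) than the bottom of o2.
    return lambda layout: layout[o1].top >= layout[o2].bottom + c


def left(o1: str, o2: str, c: float) -> Constraint:
    # Left side of o1 is at least c units left of the left side of o2.
    return lambda layout: layout[o1].left + c <= layout[o2].left


def right(o1: str, o2: str, c: float) -> Constraint:
    # Left side of o1 is at least c units right of the right side of o2.
    return lambda layout: layout[o1].left >= layout[o2].right + c


def satisfies(layout: Layout, constraints: List[Constraint]) -> bool:
    # A specification is a conjunction: every constraint must hold.
    return all(check(layout) for check in constraints)


layout = {
    "microwave": Box(10, 20, 60, 50),
    "table": Box(0, 80, 120, 140),
}
spec = [
    above("microwave", "table", 30),
    left("table", "microwave", 5),
    below("table", "microwave", 10),
]
print(satisfies(layout, spec))                            # True
print(satisfies(layout, [right("microwave", "table", 0)]))  # False
```

Treating a specification as a plain conjunction of checks is what lets a symbolic filter accept or reject a candidate layout without any learned component.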
### SPRING Model
The SPRING model consists of three modules:
1. **Perception Module**: Identifies and locates existing objects in the background image, using a pre-trained DETR50 object detection model and a ResNet18 scene encoder.
2. **Spatial Reasoning Module**: Determines the spatial position of each object to be generated by combining a recurrent neural network (a GRU) with symbolic reasoning. The GRU generates the objects' bounding boxes through an iterative refinement process, and symbolic reasoning filters out outputs that violate the constraints.
3. **Visual Element Generation Module**: Generates image patches containing objects according to the background image, prompts, and layout, and seamlessly merges them into the background image.
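To make the propose-and-filter idea in the Spatial Reasoning Module concrete, here is a minimal sketch that swaps in plain rejection sampling: a hypothetical `propose_box` function draws random boxes in place of the GRU, and a candidate layout is accepted only if every constraint holds. It omits iterative refinement and any conditioning on the scene encoding, and it is not the paper's filtering procedure; it only illustrates how a symbolic check can guarantee that any returned layout satisfies the specification. The box convention and `above` predicate follow the design language above, with \( y \) assumed to grow downward.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


Layout = Dict[str, Box]
Constraint = Callable[[Layout], bool]


def above(o1: str, o2: str, c: float) -> Constraint:
    # Top of o1 is at least c units higher (smaller y) than the top of o2.
    return lambda layout: layout[o1].top + c <= layout[o2].top


def propose_box(rng: random.Random, width: float, height: float) -> Box:
    """Hypothetical stand-in for the GRU proposer: a random box inside the canvas."""
    w = rng.uniform(10, width / 2)
    h = rng.uniform(10, height / 2)
    x = rng.uniform(0, width - w)
    y = rng.uniform(0, height - h)
    return Box(x, y, x + w, y + h)


def place_objects(
    existing: Layout,
    new_objects: List[str],
    constraints: List[Constraint],
    width: float = 256.0,
    height: float = 256.0,
    max_tries: int = 1000,
    seed: int = 0,
) -> Optional[Layout]:
    """Rejection sampling: propose boxes for all new objects, keep the first valid layout."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        layout = dict(existing)  # detected objects keep their boxes
        for name in new_objects:
            layout[name] = propose_box(rng, width, height)
        # Symbolic filter: accept only if every constraint holds.
        if all(check(layout) for check in constraints):
            return layout
    return None  # no valid layout found within the budget


existing = {"table": Box(0, 150, 256, 256)}
result = place_objects(existing, ["microwave"], [above("microwave", "table", 40)])
print(result)
```

In this sketch the guarantee comes from the filter, not the proposer: a better proposer (such as a trained GRU) only changes how quickly a valid layout is found, and if none is found within `max_tries` the function returns `None` rather than a layout that violates the specification.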
### Key Contributions
1. **Ensuring the Satisfaction of User Needs**: By embedding symbolic reasoning into the neural generation model, SPRING can generate designs that meet user specifications.
2. **Interpretability**: SPRING is more interpretable: users can diagnose unsatisfactory outputs by inspecting the bounding boxes produced during generation.
3. **Zero-Shot Transfer Ability**: SPRING can handle new user specifications that do not appear in the training set without retraining or fine-tuning.
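Both the interpretability and the zero-shot claims rest on the same design choice: constraints are explicit symbolic objects checked against explicit bounding boxes. The sketch below, assuming the box convention and predicate definitions from the design language (and using hypothetical names such as `violations` and `NamedConstraint`), shows a specification that is just data supplied at inference time, plus a diagnostic that reports which constraints a set of boxes violates. It covers only the symbolic side and says nothing about how well a learned proposer handles unseen specifications.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


Layout = Dict[str, Box]
# A constraint paired with a human-readable label for diagnostics.
NamedConstraint = Tuple[str, Callable[[Layout], bool]]


def above(o1: str, o2: str, c: float) -> NamedConstraint:
    # Top of o1 is at least c units higher (smaller y) than the top of o2.
    return (f"above({o1}, {o2}, {c})", lambda layout: layout[o1].top + c <= layout[o2].top)


def left(o1: str, o2: str, c: float) -> NamedConstraint:
    # Left side of o1 is at least c units left of the left side of o2.
    return (f"left({o1}, {o2}, {c})", lambda layout: layout[o1].left + c <= layout[o2].left)


def violations(layout: Layout, spec: List[NamedConstraint]) -> List[str]:
    """Return the labels of all constraints the layout's boxes violate."""
    return [label for label, check in spec if not check(layout)]


# A specification assembled at inference time; no training is involved.
layout = {"lamp": Box(200, 10, 240, 60), "sofa": Box(20, 120, 220, 200)}
new_spec = [above("lamp", "sofa", 30), left("lamp", "sofa", 10)]
print(violations(layout, new_spec))  # ['left(lamp, sofa, 10)']
```

Because the report names the violated predicate and the boxes are available, a user can see directly that the lamp was placed to the right of the sofa rather than to its left.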
Through these innovations, SPRING outperforms existing baseline models in generating high-quality designs that meet user needs.