Abstract: The incorporation of physical information in machine learning frameworks is opening and transforming many application domains. Here the learning process is augmented through the induction of fundamental knowledge and governing physical laws. In this work, we explore their utility for computer vision tasks in interpreting and understanding visual data. We present a systematic literature review of more than 250 papers on formulation and approaches to computer vision tasks guided by physical laws. We begin by decomposing the popular computer vision pipeline into a taxonomy of stages and investigate approaches to incorporate governing physical equations in each stage. Existing approaches in computer vision tasks are analyzed with regard to what governing physical processes are modeled and formulated, and how they are incorporated, i.e. modification of input data (observation bias), modification of network architectures (inductive bias), and modification of training losses (learning bias). The taxonomy offers a unified view of the application of the physics-informed capability, highlighting where physics-informed learning has been conducted and where the gaps and opportunities are. Finally, we highlight open problems and challenges to inform future research. While still in its early days, the study of physics-informed computer vision has the promise to develop better computer vision models that can improve physical plausibility, accuracy, data efficiency, and generalization in increasingly realistic applications.
What problem does this paper attempt to address?
The paper addresses the fact that current computer vision models largely ignore physical laws when processing complex visual data. As a result, even models that perform well on some tasks fall short in robustness, interpretability, and consistency with physical laws. Specifically, the paper examines how introducing physical knowledge can improve the physical plausibility, accuracy, data efficiency, and generalization of computer vision models. Through a systematic review of more than 250 papers, it explores how physical laws can be incorporated at each stage of the computer vision pipeline (data acquisition, preprocessing, model design, training, and inference), proposes a unified taxonomy for analyzing existing methods and techniques, and points out directions and challenges for future research.
### Main contributions of the paper
1. **Proposing a unified taxonomy**: The paper proposes a unified taxonomy for studying which physical knowledge/processes are modeled in computer vision models, how this knowledge is represented, and how it is integrated into computer vision models.
2. **In-depth exploration of multiple computer vision tasks**: The paper covers a wide range of computer vision tasks, from imaging, super-resolution, generation, prediction, and image reconstruction to image classification, object detection, image segmentation, and human analysis.
3. **Detailed review of methods for integrating physical information**: In each task, the paper carefully examines how physical information is integrated into specific computer vision algorithms, which physical processes are modeled and integrated, and which network architectures or network enhancement techniques are used to fuse physical information. At the same time, it also analyzes the context and data sets in these tasks.
4. **Summarizing challenges and future research directions**: Based on the review of tasks, the paper summarizes the challenges, open research questions, and future research directions.
### Methods for integrating physical information
The paper discusses three main strategies for integrating physical knowledge or prior information into machine learning models:
1. **Observation bias**: Use multi-modal data that reflects the physical principles that generated it. Deep neural networks are trained directly on this observational data and capture the underlying physical processes during training.
2. **Learning bias**: Enforce prior knowledge or physical information through soft constraints. These methods add terms to the loss function that penalize violations of the underlying physics, such as conservation of momentum or mass.
3. **Inductive bias**: Introduce "hard" constraints through custom-designed network architectures. For example, Hamiltonian neural networks draw on Hamiltonian mechanics to build a stronger inductive bias into the architecture, so that the trained model respects exact conservation laws.
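
To make the learning bias strategy concrete, the following is a minimal sketch of a loss with a soft physics penalty. It is not taken from the paper: the free-fall equation y'' = -g, the function names, and the `weight` parameter are all assumptions chosen for illustration, and the "prediction" is a plain list rather than the output of a neural network.

```python
# Minimal sketch of a "learning bias": a data term plus a soft physics penalty.
# Assumption (not from the paper): the physics is 1D free fall, y'' = -g,
# sampled at a fixed time step dt.

G = 9.81  # gravitational acceleration in m/s^2


def data_loss(pred, observed):
    """Mean squared error over the (possibly sparse) observed time indices."""
    return sum((pred[i] - y) ** 2 for i, y in observed.items()) / len(observed)


def physics_residual(pred, dt):
    """Mean squared violation of y'' = -g, using a central second difference."""
    residuals = [
        (pred[i - 1] - 2.0 * pred[i] + pred[i + 1]) / dt ** 2 + G
        for i in range(1, len(pred) - 1)
    ]
    return sum(r ** 2 for r in residuals) / len(residuals)


def total_loss(pred, observed, dt, weight=1.0):
    """Soft-constrained objective: data fit plus weighted physics penalty."""
    return data_loss(pred, observed) + weight * physics_residual(pred, dt)


dt = 0.1
times = [k * dt for k in range(11)]
falling = [10.0 - 0.5 * G * t ** 2 for t in times]  # obeys y'' = -g
straight = [10.0 - 0.5 * G * t for t in times]      # same endpoints, wrong physics
observed = {0: falling[0], 10: falling[10]}          # two sparse measurements

print(total_loss(falling, observed, dt))
print(total_loss(straight, observed, dt))
```

Both candidate trajectories fit the two observed points exactly, so the data term alone cannot tell them apart. Only the physics term separates them: it is close to zero for the parabolic trajectory and roughly g squared for the straight line. This is the sense in which a soft constraint can compensate for sparse data.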
### Application examples
- **Super-resolution task**: By embedding partial differential equations (PDEs) in the loss function, a physics-informed neural network (PINN) can generate high-resolution flow fields without requiring high-resolution data.
- **Crowd motion analysis**: By introducing physics-based constraints, such as low entropy and unit order, the model can more accurately distinguish ordered from disordered crowd motion.
- **Human analysis**: Prior knowledge of body structure (for example, that arms, head, and legs connect to the torso) and anatomical joint limits keep the solution physically plausible in both body structure and movement.
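
As an illustration of the human analysis example, anatomical joint limits can be expressed as a penalty on predicted joint angles. The sketch below is an assumption about how such a term might look, not the formulation of any reviewed method; the joint names, the angle ranges in `JOINT_LIMITS`, and the function `joint_limit_penalty` are hypothetical.

```python
# Hypothetical joint ranges in degrees, for illustration only.
# These values are placeholders and are not taken from the paper.
JOINT_LIMITS = {
    "elbow": (0.0, 150.0),
    "knee": (0.0, 140.0),
}


def joint_limit_penalty(angles, limits=JOINT_LIMITS):
    """Sum of squared amounts by which each joint angle leaves its allowed range."""
    penalty = 0.0
    for joint, angle in angles.items():
        low, high = limits[joint]
        # Zero inside [low, high]; otherwise the distance to the nearest bound.
        excess = max(low - angle, 0.0, angle - high)
        penalty += excess ** 2
    return penalty


plausible = {"elbow": 90.0, "knee": 30.0}
hyperextended = {"elbow": -20.0, "knee": 150.0}

print(joint_limit_penalty(plausible))
print(joint_limit_penalty(hyperextended))
```

A term like this would typically be added to a pose estimation loss, so that poses inside the allowed ranges incur no cost and the cost grows smoothly as a joint bends past its limit.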
### Future research directions
- **Selecting appropriate physical priors**: Choosing which physical prior suits a given task remains an open question.
- **Developing a standard benchmark platform**: No standard benchmark platform currently exists for evaluating and comparing the performance of different methods.
- **Tasks that do not fully utilize physical priors**: Fields such as human body tracking, object detection, and video analysis still leave considerable room for research.
In conclusion, by systematically reviewing and analyzing the existing literature, the paper provides a comprehensive perspective on the use of physical information in computer vision and identifies current research gaps and potential future directions.