Abstract: We introduce DeepSets Operator Networks (DeepOSets), an efficient, non-autoregressive neural network architecture for in-context operator learning. In-context learning allows a trained machine learning model to learn from a user prompt without further training. DeepOSets adds in-context learning capabilities to Deep Operator Networks (DeepONets) by combining them with the DeepSets architecture. As the first non-autoregressive model for in-context operator learning, DeepOSets allows the user prompt to be processed in parallel, leading to significant computational savings. Here, we present the application of DeepOSets to the problem of learning supervised learning algorithms, which are operators mapping a finite-dimensional space of labeled data into an infinite-dimensional hypothesis space of prediction functions. In an empirical comparison with a popular autoregressive (transformer-based) model for in-context learning of the least-squares linear regression algorithm, DeepOSets reduced the number of model weights by several orders of magnitude and required a fraction of the training and inference time. Furthermore, DeepOSets proved to be less sensitive to noise, outperforming the transformer model in noisy settings.
What problem does this paper attempt to address?
The paper addresses how to perform in-context learning (ICL) of supervised learning operators efficiently and without autoregression. The authors propose the DeepOSets model, which adds in-context learning to DeepONets by combining them with the DeepSets architecture. Because the model processes the examples in a user prompt in parallel, it needs far less computation than an autoregressive model, and it also performs better in noisy settings.
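To make "supervised learning operator" concrete, the sketch below implements the least-squares linear regression algorithm as a map from a labeled dataset to a prediction function. This is the target operator that the in-context models are trained to imitate, not DeepOSets itself. The function name `least_squares_operator` and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def least_squares_operator(X, y):
    """Map a labeled dataset (X, y) to a prediction function (hypothetical helper)."""
    # Append a column of ones so the fit includes an intercept term.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w, b = coef[:-1], coef[-1]

    def predict(x_query):
        # x_query has shape (m, d); returns m predictions.
        return np.asarray(x_query) @ w + b

    return predict

# Toy data (an assumption for illustration): a noiseless line y = 2x - 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 1))
y = 2.0 * X[:, 0] - 0.5
f = least_squares_operator(X, y)   # the operator's output is itself a function
```

The input is a finite set of labeled points, while the output `f` can be evaluated at any query point, which is the sense in which the operator maps finite-dimensional data to a function.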
### Core Problems of the Paper
1. **Efficiency of In-Context Learning**
- Existing autoregressive models (such as Transformer-based models) process the context data points sequentially, one at a time, which leads to high computational cost, especially at inference.
- DeepOSets processes the input data in parallel, which reduces the time needed for both training and inference.
2. **Optimization of the Number of Parameters and Computational Resources**
- Deep learning models usually have a large number of parameters, which increases the computational burden and makes them difficult to deploy in resource-constrained environments.
- DeepOSets significantly reduces the number of model parameters. For example, in the experiments, DeepOSets with only 72K parameters achieves similar or even better performance than a Transformer model with 22M parameters.
3. **Robustness to Noise**
- In practical applications, input data often contains noise, which challenges the generalization ability and accuracy of the model.
- DeepOSets is more robust to noise and performs better than the Transformer model in noisy environments.
### Solutions
- **DeepOSets Architecture**
- It combines the DeepSets and DeepONets architectures.
- The DeepSets module lets the model accept a variable number of context examples and makes its output invariant to the order in which they are given.
- **Linear Time Complexity**
- For a prompt with \(n\) context examples, DeepOSets has linear time complexity \(O(n)\) during inference. Once the context examples are fixed, predicting at a new query point takes constant time \(O(1)\). By comparison, the Transformer has quadratic time complexity \(O(n^{2})\).
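The two points above can be sketched together in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the `tanh` activations, the mean pooling, the random untrained weights, and the names `init_mlp`, `encode_context`, and `predict` are all hypothetical. It assumes a DeepSets encoder whose pooled output feeds a DeepONet branch network, with a trunk network applied to the query points.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for a small MLP; sizes are illustrative."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the MLP with tanh on all layers except the last."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

d, h, p = 1, 16, 8                 # input dim, set-embedding dim, DeepONet basis size
phi = init_mlp([d + 1, 32, h])     # DeepSets per-example encoder, applied to (x_i, y_i)
branch = init_mlp([h, 32, p])      # DeepONet branch net, fed the pooled embedding
trunk = init_mlp([d, 32, p])       # DeepONet trunk net, fed the query point

def encode_context(xs, ys):
    """O(n): encode each example independently (parallelizable), then mean-pool."""
    pairs = np.concatenate([xs, ys[:, None]], axis=1)   # shape (n, d + 1)
    pooled = mlp(phi, pairs).mean(axis=0)                # order-independent, shape (h,)
    return mlp(branch, pooled)                           # shape (p,)

def predict(context_code, x_query):
    """Cost does not depend on n once context_code is cached."""
    return mlp(trunk, x_query) @ context_code            # shape (m,)

xs, ys = rng.normal(size=(50, d)), rng.normal(size=50)   # n = 50 context examples
x_query = np.linspace(-1, 1, 5)[:, None]
code = encode_context(xs, ys)      # computed once per prompt
preds = predict(code, x_query)     # reused for every new query
```

Mean pooling is what gives both the permutation invariance and the fixed-size `code` regardless of `n`. Self-attention, by contrast, forms pairwise scores between all context tokens, which is the source of the quadratic cost.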
### Experimental Results
- **Low-Dimensional and High-Dimensional Linear Regression Tasks**
- DeepOSets performs well in low-dimensional linear regression tasks, especially in noisy environments, where its test mean-squared error (MSE) is an order of magnitude lower than that of the Transformer model.
- In high-dimensional linear regression tasks, the accuracy of DeepOSets is slightly lower than that of the Transformer, but it is still more robust in noisy environments.
- **Training and Inference Speed**
- The training time of DeepOSets is only 9 minutes, while the Transformer model requires 3 hours.
- In terms of inference time, DeepOSets only needs 0.087 milliseconds per query, while the Transformer model needs 7.11 milliseconds.
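As a quick check on the scale of these differences, the ratios implied by the figures quoted in this summary can be computed directly. The variable names are illustrative; the numbers are the ones stated above (9 minutes vs. 3 hours, 0.087 ms vs. 7.11 ms per query, 72K vs. 22M parameters).

```python
# Ratios computed from the figures quoted in this summary.
train_speedup = (3 * 60) / 9          # minutes of Transformer training per minute of DeepOSets
inference_speedup = 7.11 / 0.087      # per-query inference time ratio
param_ratio = 22e6 / 72e3             # Transformer parameters per DeepOSets parameter

print(f"training:   {train_speedup:.0f}x faster")
print(f"inference:  {inference_speedup:.0f}x faster")
print(f"parameters: {param_ratio:.0f}x fewer")
```

That is roughly 20 times faster training, 82 times faster inference per query, and about 306 times fewer parameters.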
In conclusion, the paper addresses three weaknesses of existing autoregressive models for in-context learning (low efficiency, large parameter counts, and poor noise robustness) by proposing the DeepOSets model, which offers a more efficient and robust alternative.