Abstract: Learning with a limited number of labeled data is a central problem in real-world applications of machine learning, as it is often expensive to obtain annotations. To deal with the scarcity of labeled data, transfer learning is a conventional approach; it suggests learning transferable knowledge by training a neural network on multiple other sources. In this paper, we investigate transfer learning of tabular tasks, which has been less studied and less successful in the literature than in other domains, e.g., vision and language. This is because tables are inherently heterogeneous, i.e., they contain different columns and feature spaces, making transfer learning difficult. On the other hand, recent advances in natural language processing suggest that the label scarcity issue can be mitigated by utilizing the in-context learning capability of large language models (LLMs). Inspired by this and the fact that LLMs can also process tables within a unified language space, we ask whether LLMs can be effective for tabular transfer learning, in particular, under scenarios where the source and target datasets are of different formats. As a positive answer, we propose a novel tabular transfer learning framework, coined Prompt to Transfer (P2T), that utilizes unlabeled (or heterogeneous) source data with LLMs. Specifically, P2T identifies a column feature in a source dataset that is strongly correlated with a target task feature to create examples relevant to the target task, thus creating pseudo-demonstrations for prompts. Experimental results demonstrate that P2T outperforms previous methods on various tabular learning benchmarks, showing good promise for the important, yet underexplored tabular transfer learning problem. Code is available at https://github.com/jaehyun513/P2T.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses a core issue in machine learning: how to learn effectively from limited labeled data. Specifically, it focuses on transfer learning for tabular data. Different tables typically contain different columns and feature spaces, which makes conventional transfer learning methods difficult to apply.

### Main Contributions

1. **Proposing the P2T Framework**:
   - **Background**: Transfer learning has been studied extensively in vision and language, but comparatively little for tabular data. The heterogeneity of tables (different columns and feature spaces) prevents conventional transfer learning methods from being applied directly.
   - **Solution**: The paper proposes a framework called "Prompt to Transfer" (P2T), which uses the in-context learning capability of large language models (LLMs) for tabular transfer learning. P2T identifies a column feature in the source dataset that is strongly correlated with the target task feature and uses it to create pseudo-demonstrations for the prompt, thereby transferring knowledge from the source data to the target task.
2. **Experimental Validation**:
   - **Zero-Shot Learning**: In zero-shot scenarios, P2T is reported to improve prediction performance when unlabeled or heterogeneous data serves as the transfer source.
   - **Few-Shot Learning**: In few-shot scenarios, P2T is reported to further improve prediction accuracy by using unlabeled or heterogeneous data alongside the few labeled examples.
3. **Comparative Experiments**:
   - Compared with existing methods (such as self-supervised learning and unsupervised meta-learning methods), P2T achieves better results on various tabular learning benchmarks, particularly when leveraging unlabeled and heterogeneous data.

### Conclusion

By proposing the P2T framework, this paper addresses the difficulty of transferring knowledge across heterogeneous tables and demonstrates the potential of large language models when labeled data is limited. The reported experiments show P2T outperforming previous methods on various benchmarks, which suggests that prompting LLMs is a promising direction for tabular transfer learning.
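
As a rough illustration of the pseudo-demonstration idea summarized above, the following Python sketch turns unlabeled source rows into prompt examples by treating one source column as a stand-in label. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the "col is value" serialization, and the question template are all hypothetical, and the choice of `pseudo_label_col` is assumed to have been made beforehand (the step of finding the column most correlated with the target feature is not shown).

```python
from typing import Dict, List

Row = Dict[str, object]


def serialize_row(row: Row, exclude: str = "") -> str:
    """Render a table row as 'col is value' text, skipping one column.

    The serialization template is an assumption made for this sketch.
    """
    return ", ".join(
        f"{col} is {val}" for col, val in row.items() if col != exclude
    )


def build_pseudo_demonstrations(
    source_rows: List[Row], pseudo_label_col: str
) -> List[str]:
    """Turn unlabeled source rows into question/answer style examples.

    `pseudo_label_col` is a source column assumed to be strongly
    correlated with the target task feature; it plays the role of the
    label in each pseudo-demonstration.
    """
    demos = []
    for row in source_rows:
        if pseudo_label_col not in row:
            continue  # rows without the chosen column cannot form an example
        features = serialize_row(row, exclude=pseudo_label_col)
        answer = row[pseudo_label_col]
        demos.append(f"{features}. What is {pseudo_label_col}? {answer}")
    return demos


def build_prompt(
    demos: List[str], target_row: Row, target_col: str, task_description: str
) -> str:
    """Assemble the prompt: description, pseudo-demonstrations, then the query."""
    query = f"{serialize_row(target_row)}. What is {target_col}?"
    return "\n".join([task_description, *demos, query])
```

For example, with source rows that contain `age`, `bmi`, and `glucose`, one could pass `"glucose"` as the pseudo-label column and a target row whose `diabetes` value is to be predicted; the resulting string would then be sent to an LLM of choice, a step that is deliberately left out here.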

Tabular Transfer Learning via Prompting LLMs
