Abstract: As new research on Large Language Models (LLMs) continues, it has become difficult to keep up with new research and models. To help researchers synthesize the new research, many have written survey papers, but even those have become numerous. In this paper, we develop a method to automatically assign survey papers to a taxonomy. We collect the metadata of 144 LLM survey papers and explore three paradigms to classify papers within the taxonomy. Our work indicates that leveraging graph structure information on co-category graphs can significantly outperform the language models in two paradigms: fine-tuning pre-trained language models and zero-shot/few-shot classification using LLMs. We find that our model surpasses an average human recognition level, and that fine-tuning LLMs using weak labels generated by a smaller model, such as the GCN in this study, can be more effective than using ground-truth labels, revealing the potential of weak-to-strong generalization in the taxonomy classification task.
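To make the graph-based paradigm concrete, here is a minimal pure-Python sketch of a two-layer GCN forward pass on a toy co-category graph. It assumes, for illustration only, that two papers are linked when they share at least one arXiv category. The paper IDs, categories, node features, and weights are hypothetical, and training (fitting `w1` and `w2` to labeled papers) is omitted.

```python
import math

# Hypothetical toy data: paper id -> arXiv categories.
# Assumption: an edge links two papers that share at least one category.
PAPERS = {
    "p0": {"cs.CL", "cs.AI"},
    "p1": {"cs.CL"},
    "p2": {"cs.LG", "cs.AI"},
    "p3": {"cs.LG"},
}


def build_co_category_graph(papers):
    """Return sorted paper ids and a symmetric 0/1 adjacency matrix."""
    ids = sorted(papers)
    n = len(ids)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if papers[ids[i]] & papers[ids[j]]:  # shared category -> edge
                adj[i][j] = adj[j][i] = 1.0
    return ids, adj


def normalize_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2}, the GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees include the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-list matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def relu(x):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in x]


def gcn_forward(a_norm, x, w1, w2):
    """Two-layer GCN: logits = A_norm @ ReLU(A_norm @ X @ W1) @ W2."""
    hidden = relu(matmul(matmul(a_norm, x), w1))
    return matmul(matmul(a_norm, hidden), w2)


ids, adj = build_co_category_graph(PAPERS)
a_norm = normalize_adjacency(adj)
x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # toy node features
w1 = [[1.0, 0.0], [0.0, 1.0]]  # untrained, illustrative weights
w2 = [[1.0, 0.0], [0.0, 1.0]]
logits = gcn_forward(a_norm, x, w1, w2)
pred = [max(range(len(row)), key=row.__getitem__) for row in logits]
print(dict(zip(ids, pred)))  # → {'p0': 0, 'p1': 0, 'p2': 1, 'p3': 1}
```

The self-loops in `A + I` keep each paper's own features in the aggregation, and the degree normalization keeps well-connected papers from dominating their neighbors' representations.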

What problem does this paper attempt to address?

The paper primarily addresses the challenges brought about by the rapid increase in the number of survey papers on Large Language Models (LLMs) by proposing an automated classification method. Specifically, the core issues it addresses are:

- **Problem Background**: With the rapid development of research on large language models, a large number of survey papers have emerged. These papers are important for newcomers trying to understand the progress of the field, but their sheer volume makes it difficult to find the most relevant ones.
- **Research Objective**: To develop a method that automatically classifies these survey papers within a taxonomy, helping researchers, especially beginners, quickly find the survey literature most relevant to their research direction.
- **Main Contributions**:
  - Collected 144 survey papers on large language models together with their metadata, analyzed them, and designed a new taxonomy for them.
  - Proposed a graph representation learning-based method to classify these survey papers. Experiments show that this method performs well on small, imbalanced datasets and effectively distinguishes papers with high text similarity.
  - Verified that graph representation learning outperforms methods based solely on language models in two paradigms: fine-tuning pre-trained language models and zero-shot/few-shot classification with LLMs.
  - Explored fine-tuning stronger models on labels generated by weaker models (weak labels). Experimental results show that this approach can even surpass training on ground-truth labels.

Through this work, the paper aims to lower the entry barrier for newcomers to the field and improve their efficiency in finding relevant survey literature.
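The weak-to-strong setup in the last contribution can be sketched as a data-preparation step: the weak model's predictions replace the ground-truth categories as fine-tuning targets for the stronger model. This is a minimal sketch under stated assumptions: `gcn_predictions` stands in for a trained GCN's output, the taxonomy labels and paper texts are hypothetical placeholders, and the prompt/completion JSONL layout is an assumed fine-tuning format, not the paper's.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    paper_id: str
    text: str   # input shown to the strong model (e.g. title + abstract)
    label: str  # training target


def build_weak_label_set(texts: Dict[str, str],
                         weak_predict: Callable[[str], str]) -> List[Example]:
    """Pair each paper's text with the weak model's predicted category instead of the ground truth."""
    return [Example(pid, texts[pid], weak_predict(pid)) for pid in sorted(texts)]


def to_jsonl(examples: List[Example]) -> str:
    """Serialize as prompt/completion records, one JSON object per line (assumed format)."""
    lines = []
    for ex in examples:
        record = {"prompt": f"Classify this LLM survey into the taxonomy:\n{ex.text}",
                  "completion": ex.label}
        lines.append(json.dumps(record))
    return "\n".join(lines)


def agreement(examples: List[Example], ground_truth: Dict[str, str]) -> float:
    """Share of weak labels that match the ground truth (a diagnostic, not a training signal)."""
    hits = sum(ex.label == ground_truth[ex.paper_id] for ex in examples)
    return hits / len(examples)


# Hypothetical data: GCN predictions and ground-truth categories for three papers.
texts = {"p0": "Hypothetical survey abstract 0",
         "p1": "Hypothetical survey abstract 1",
         "p2": "Hypothetical survey abstract 2"}
gcn_predictions = {"p0": "taxonomy_A", "p1": "taxonomy_A", "p2": "taxonomy_B"}
ground_truth = {"p0": "taxonomy_A", "p1": "taxonomy_B", "p2": "taxonomy_B"}

weak_set = build_weak_label_set(texts, gcn_predictions.__getitem__)
print(agreement(weak_set, ground_truth))
print(to_jsonl(weak_set).splitlines()[0])
```

Note that the strong model is trained on `weak_set` even where the weak labels disagree with the ground truth; the agreement score only shows how noisy those targets are.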

Understanding Survey Paper Taxonomy about Large Language Models via Graph Representation Learning

A Survey of Large Language Models for Graphs

A Survey of Graph Meets Large Language Model: Progress and Future Directions

Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods

Large Language Models on Graphs: A Comprehensive Survey

A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications

Large Language Models for Data Annotation: A Survey

Advancing Graph Representation Learning with Large Language Models: A Comprehensive Survey of Techniques

A survey on large language models for recommendation

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Are Large Language Models a Good Replacement of Taxonomies?

Large Language Models Meet NLP: A Survey

Efficient Large Language Models: A Survey

Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey

Large Language Models for Data Annotation and Synthesis: A Survey

Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications

Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark

Graph Learning and Its Advancements on Large Language Models: A Holistic Survey