TechGPT-2.0: A large language model project to solve the task of knowledge graph construction

Jiaqi Wang, Yuying Chang, Zhong Li, Ning An, Qi Ma, Lei Hei, Haibo Luo, Yifei Lu, Feiliang Ren
2024-01-09
Abstract: Large language models have exhibited robust performance across diverse natural language processing tasks. This report introduces TechGPT-2.0, a project designed to enhance the capabilities of large language models specifically in knowledge graph construction tasks, including named entity recognition (NER) and relationship triple extraction (RTE) tasks in NLP applications. Additionally, it serves as an LLM accessible for research within the Chinese open-source model community. We offer two 7B large language model weights and a QLoRA weight specialized for processing lengthy texts. Notably, TechGPT-2.0 is trained on Huawei's Ascend server. Inheriting all functionalities from TechGPT-1.0, it exhibits robust text processing capabilities, particularly in the domains of medicine and law. Furthermore, we introduce new capabilities to the model, enabling it to process texts in various domains such as geographical areas, transportation, organizations, literary works, biology, natural sciences, astronomical objects, and architecture. These enhancements also strengthen the model's ability to handle hallucinations, unanswerable queries, and lengthy texts. This report provides a comprehensive and detailed introduction to the full fine-tuning process on Huawei's Ascend servers, encompassing experiences in Ascend server debugging, instruction fine-tuning data processing, and model training. Our code is available at
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The main goal of this paper is to enhance the capabilities of large language models in knowledge graph construction tasks through the TechGPT-2.0 project, particularly in natural language processing applications such as named entity recognition (NER) and relationship triple extraction (RTE). Additionally, the project aims to provide a large language model for research within the Chinese open-source community. Specifically:

- **Enhancing Capabilities**: Improve the model's performance on NER and RTE tasks, retaining the strengths of TechGPT-1.0 in fields like medicine and law while extending coverage to additional domains.
- **Dataset Construction**: Build datasets that include NER and RTE sub-tasks, ensuring the quality and diversity of the datasets.
- **Long Text Processing**: Provide a QLoRA weight specialized for long text processing to enhance the model's ability to handle lengthy inputs.
- **Technical Sharing**: Provide a detailed account of the experience of using Huawei Ascend servers for model training, including the debugging process, data preprocessing methods, and training techniques, to serve as a reference for other researchers.

In summary, TechGPT-2.0 aims to improve the performance of large language models in knowledge graph construction through instruction fine-tuning data processing and full fine-tuning on Ascend servers, and to share practical experience that supports related research.
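
To make the two sub-tasks concrete, the following minimal Python sketch shows how NER output (typed entities) and RTE output (head, relation, tail triples) could be assembled into a small knowledge graph. The example sentence, the entity types, the relation name, and the `Triple` and `build_graph` helpers are all hypothetical illustrations; they are not taken from TechGPT-2.0 and do not reflect its actual output format.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One relationship triple, as an RTE step might produce (assumed shape)."""
    head: str
    relation: str
    tail: str


def build_graph(triples):
    """Group (head, relation, tail) triples into an adjacency mapping."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.head].append((t.relation, t.tail))
    return dict(graph)


# Hypothetical extraction results for the sentence
# "Aspirin is used to treat fever." (hand-written, not model output).
entities = [("Aspirin", "Drug"), ("fever", "Symptom")]  # NER: (mention, type)
triples = [Triple("Aspirin", "treats", "fever")]        # RTE: (head, relation, tail)

graph = build_graph(triples)
print(graph)
```

In this sketch the entities become the nodes of the graph and each triple becomes a labeled edge, which is the basic structure that knowledge graph construction aims to produce from raw text.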