Oracle Bone Inscriptions Multi-modal Dataset

Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

2024-07-04

Abstract: Oracle bone inscriptions (OBI) are the earliest developed writing system in China, bearing invaluable written evidence of early Shang history and paleography. However, deciphering OBI remains extremely challenging given the current state of scholarship: of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Leveraging advanced AI technology to assist in the decipherment of OBI is therefore an important research topic. Fully utilizing AI's capabilities, however, relies on having a comprehensive, high-quality annotated OBI dataset, whereas most existing datasets are annotated along only one or a few dimensions, which limits their potential applications. For instance, the Oracle-MNIST dataset offers only 30k images classified into 10 categories. This paper therefore proposes the Oracle Bone Inscriptions Multi-modal Dataset (OBIMD), which includes annotation information for 10,077 oracle bone pieces. Each piece has two modalities: pixel-level aligned rubbings and facsimiles. For each oracle bone character, the dataset annotates its detection box, character category, transcription, corresponding inscription group, and reading sequence within the group, providing comprehensive, high-quality annotations. The dataset can be used for a variety of AI research tasks in the field of OBI, such as OBI Character Detection and Recognition, Rubbing Denoising, Character Matching, Character Generation, Reading Sequence Prediction, and Missing Character Completion. We believe that creating and publishing such a dataset will significantly advance the application of AI algorithms in OBI research.
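To make the per-character annotation structure concrete, here is a minimal Python sketch of how one annotated oracle bone piece could be represented. The class and field names (`CharAnnotation`, `BonePiece`, `group_id`, `order`, `group_texts`) are hypothetical and do not reflect the actual OBIMD file format; the sketch only assumes that each character carries a detection box, category, transcription, inscription group, and reading position, as the abstract describes. The `group_texts` helper shows how the group and reading-sequence labels together recover each inscription's text in reading order.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical record for one annotated character; field names are
# illustrative assumptions, not the official OBIMD schema.
@dataclass
class CharAnnotation:
    box: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    category: int                   # character category id
    transcription: str              # transcription of the character
    group_id: int                   # inscription group the character belongs to
    order: int                      # reading position within its group (0-based)


# Hypothetical record for one oracle bone piece with its two modalities.
@dataclass
class BonePiece:
    piece_id: str
    rubbing_path: str    # rubbing image
    facsimile_path: str  # facsimile image, pixel-aligned with the rubbing
    chars: list[CharAnnotation] = field(default_factory=list)


def group_texts(piece: BonePiece) -> dict[int, str]:
    """Concatenate each inscription group's transcriptions in reading order."""
    groups: dict[int, list[CharAnnotation]] = defaultdict(list)
    for char in piece.chars:
        groups[char.group_id].append(char)
    return {
        gid: "".join(ch.transcription for ch in sorted(members, key=lambda ch: ch.order))
        for gid, members in sorted(groups.items())
    }


# Toy example with made-up annotations (not real OBIMD data), listed out of order.
piece = BonePiece(
    piece_id="demo-001",
    rubbing_path="rubbing.png",
    facsimile_path="facsimile.png",
    chars=[
        CharAnnotation((50, 10, 70, 30), 3, "卜", 0, 2),
        CharAnnotation((10, 10, 30, 30), 1, "癸", 0, 0),
        CharAnnotation((30, 10, 50, 30), 2, "卯", 0, 1),
        CharAnnotation((10, 50, 30, 70), 4, "貞", 1, 0),
    ],
)
print(group_texts(piece))  # {0: '癸卯卜', 1: '貞'}
```

Keeping the group id and the reading position as separate fields, rather than one global index, mirrors the abstract's description of reading sequences that are defined within each inscription group.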

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the application of artificial intelligence to the study of Oracle Bone Inscriptions (OBI). Interpreting OBI is currently extremely challenging: only about one-third of the 4,500 discovered characters have been successfully identified. To help overcome this difficulty, the paper proposes using advanced artificial intelligence technology to assist in OBI interpretation. Fully leveraging AI in this field, however, requires a comprehensive, high-quality annotated OBI dataset, and existing datasets typically provide only simple annotations, which limits their application value.

The paper therefore proposes a new dataset, the "Oracle Bone Inscriptions Multi-modal Dataset" (OBIMD). It contains annotations for 10,077 oracle bone pieces, each with two pixel-level aligned modalities: rubbings and facsimiles. The annotations include detection boxes, character categories, transcriptions, corresponding inscription groups, and the reading order within each group, offering comprehensive, high-quality data for a range of AI research tasks on OBI. The researchers believe that creating and releasing this dataset can significantly advance the application of AI algorithms to OBI research, supporting tasks such as character detection and recognition, rubbing denoising, character matching, character generation, reading sequence prediction, and missing character completion.
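Because the rubbing and facsimile of each piece are aligned at the pixel level, a single detection box can locate the same character in both modalities. The sketch below illustrates this under stated assumptions: images are equal-sized grayscale arrays (plain nested lists here, to stay dependency-free), boxes follow a hypothetical `(x_min, y_min, x_max, y_max)` convention with exclusive upper bounds, and `crop` and `paired_crops` are illustrative helper names, not part of any released OBIMD tooling.

```python
# Assumption: images are equal-sized 2-D grayscale arrays (nested lists here)
# whose pixels correspond one-to-one between rubbing and facsimile.
Image = list[list[int]]
Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def crop(img: Image, box: Box) -> Image:
    """Return the sub-image covered by the box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def paired_crops(rubbing: Image, facsimile: Image, box: Box) -> tuple[Image, Image]:
    """Crop the same box from both modalities; assumes pixel alignment, checks only shape."""
    same_shape = len(rubbing) == len(facsimile) and all(
        len(r) == len(f) for r, f in zip(rubbing, facsimile)
    )
    if not same_shape:
        raise ValueError("rubbing and facsimile must have identical dimensions")
    return crop(rubbing, box), crop(facsimile, box)


# Toy 4x4 images standing in for a real rubbing/facsimile pair.
rubbing = [[10 * r + c for c in range(4)] for r in range(4)]
facsimile = [[255 if (r + c) % 2 else 0 for c in range(4)] for r in range(4)]
rub_patch, fac_patch = paired_crops(rubbing, facsimile, (1, 1, 3, 3))
print(rub_patch)  # [[11, 12], [21, 22]]
print(fac_patch)  # [[0, 255], [255, 0]]
```

Paired crops of this kind are one plausible way to build (noisy rubbing, clean facsimile) training pairs for the Rubbing Denoising task listed in the abstract; the shape check is a cheap sanity test and cannot by itself confirm that two images are truly aligned.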
