Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction

Pengjie Wang,Kaile Zhang,Xinyu Wang,Shengwei Han,Yongge Liu,Lianwen Jin,Xiang Bai,Yuliang Liu
2024-06-05
Abstract: Oracle Bone Inscriptions (OBI) are one of the oldest existing forms of writing in the world. However, because of their great antiquity, a large number of OBI characters remain undeciphered, making their decipherment one of the global challenges in paleography today. This paper introduces a novel approach, Puzzle Pieces Picker (P$^3$), to decipher these enigmatic characters through radical reconstruction. We deconstruct OBI into foundational strokes and radicals, then employ a Transformer model to reconstruct them into their modern counterparts, offering a new solution to ancient script analysis. To further this endeavor, a new Ancient Chinese Character Puzzles (ACCP) dataset was developed, comprising an extensive collection of character images from seven key historical stages, annotated with detailed radical sequences. The experiments yield promising results, underscoring the potential and effectiveness of our approach in deciphering ancient Chinese scripts. Through this novel dataset and methodology, we aim to bridge the gap between traditional philology and modern document analysis techniques, offering new insights into the rich history of Chinese linguistic heritage.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the decipherment of ancient Chinese characters, in particular Oracle Bone Inscriptions (OBI), one of the oldest forms of writing in the world. Because of their age, a large number of OBI characters remain undeciphered, which poses a major challenge in paleography. The paper proposes a method called "Puzzle Pieces Picker" (P$^3$) that decodes these characters through radical reconstruction. P$^3$ first breaks an OBI character down into basic strokes and radicals, then uses a Transformer model to reconstruct them into a modern Chinese character. To support this method, the paper introduces a new dataset, Ancient Chinese Character Puzzles (ACCP), which contains a large number of character images from seven key historical periods, annotated with detailed radical sequences. According to the paper, the experimental results indicate that the method is promising and effective for deciphering ancient Chinese characters. The paper also discusses the limitations of existing methods, such as OCR and machine learning models that rely on known labels, as well as the high cost and small scale of manually annotated datasets. Borrowing the idea of a jigsaw puzzle, P$^3$ treats decipherment as a puzzle game: it decomposes characters from different periods into radical components, then models how those components evolve in order to decipher unknown characters. With this dataset and method, the researchers aim to bridge the gap between traditional philology and modern document analysis techniques, offering a new perspective on the history of China's linguistic heritage.
Keywords include: historical Chinese characters, Oracle Bone Inscriptions, optical character recognition, and radical recognition.
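
The "puzzle" framing in the summary above can be illustrated with a small sketch. This is not the paper's method: P$^3$ uses a Transformer to predict and reconstruct radical sequences, whereas the toy below assumes the radical sequence has already been predicted and simply ranks modern characters by how closely their radical sequences match it. The table `MODERN_DECOMPOSITIONS` and the function `rank_candidates` are hypothetical names introduced only for this illustration, and the four decompositions are hand-picked examples, not entries from the ACCP dataset.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-picked radical decompositions of four modern characters.
# These are illustrative only and are not taken from the ACCP annotations.
MODERN_DECOMPOSITIONS = {
    "明": ["日", "月"],
    "林": ["木", "木"],
    "休": ["亻", "木"],
    "好": ["女", "子"],
}


def rank_candidates(predicted_radicals, decompositions=MODERN_DECOMPOSITIONS):
    """Rank modern characters by radical-sequence similarity.

    Assumption: `predicted_radicals` is a list of radicals that some upstream
    model has already extracted from an ancient character image. A simple
    sequence-similarity ratio stands in for the learned reconstruction step.
    """
    scored = []
    for char, radicals in decompositions.items():
        score = SequenceMatcher(None, predicted_radicals, radicals).ratio()
        scored.append((char, score))
    # Highest similarity first; ties keep dictionary order (sort is stable).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    for char, score in rank_candidates(["木", "木"]):
        print(char, round(score, 2))
```

An exact match on the radical sequence scores 1.0, and partial overlaps score lower, so the ranking gives a shortlist of candidate modern characters rather than a single answer. A learned model would replace both the assumed radical predictions and the similarity heuristic.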