Task design and assignment of full-text generation on mass Chinese historical archives in digital humanities
Jihong Liang,Hao Wang,Xiaojing Li
DOI: https://doi.org/10.1108/ajim-09-2019-0245
IF: 1.935
2020-03-25
Aslib Journal of Information Management
Abstract:Purpose The purpose of this paper is to explore the task design and assignment of full-text generation on mass Chinese historical archives (CHAs) by crowdsourcing, with special attention paid to how to best divide full-text generation tasks into smaller ones assigned to crowdsourced volunteers and to improve the digitization of mass CHAs and the data-oriented processing of the digital humanities. Design/methodology/approach This paper starts from the complexities of character recognition of mass CHAs, takes Sheng Xuanhuai archives crowdsourcing project of Shanghai Library as a case study, and makes use of the theories of archival science, including diplomatics of Chinese archival documents, and the historical approach of Chinese archival traditions as the theoretical basis and analysis methods. The results are generated through the comprehensive research. Findings This paper points out that volunteer tasks of full-text generation include transcription, punctuation, proofreading, metadata description, segmentation, and attribute annotation in digital humanities and provides a metadata element set for volunteers to use in creating or revising metadata descriptions and also provides an attribute tag set. The two sets can be used across the humanities to construct overall observations about texts and the archives of which they are a part. Along these lines, this paper presents significant insights for application in outlining the principles, methods, activities, and procedures of crowdsourced full-text generation for mass CHAs. Originality/value This study is the first to explore and identify the effective design and allocation of tasks for crowdsourced volunteers completing full-text generation on CHAs in digital humanities.
information science & library science,computer science, information systems