PDFChatAnnotator: A Human-LLM Collaborative Multi-Modal Data Annotation Tool for PDF-Format Catalogs

Chia-Ming Chang, Xi Yang, Yi Tang
DOI: https://doi.org/10.1145/3640543.3645174
2024-03-18
Abstract: PDF catalogs contain substantial unannotated data, and labeling it manually requires extensive effort. To address this issue, we introduce PDFChatAnnotator, a human-LLM collaborative tool for collecting multi-modal data from PDF catalogs. PDFChatAnnotator first applies our proposed multi-modal binding rules to link related data from different modalities, then uses the information extraction capabilities of large language models (LLMs) to extract specific information from text descriptions. The tool also lets users guide and refine the LLM’s annotations: during the annotation process, users can influence the LLM through multiple rounds of communication and through example establishment via the provided interfaces. To assess the effectiveness of PDFChatAnnotator’s techniques, we conducted a technical evaluation using three catalogs with typical layouts as experimental data. The results showed that multi-modal binding accuracy exceeded 90% in all cases, and that both the proposed "example establishment" and "interactive adjustment of requirements" improved accuracy.
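The "example establishment" idea can be illustrated with a small few-shot prompt builder. This is only a sketch under stated assumptions, not the tool's actual implementation: the `Example` class, the `build_prompt` function, the prompt layout, and the sample catalog text are all hypothetical, and no LLM is called.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A user-confirmed annotation (hypothetical structure, not from the paper)."""
    description: str  # text description bound to a catalog item
    annotation: str   # annotation the user accepted or corrected


def build_prompt(requirement, examples, description):
    """Assemble a few-shot prompt: the requirement, the confirmed examples,
    then the new description that the LLM should annotate."""
    lines = [f"Task: {requirement}", ""]
    for ex in examples:
        lines.append(f"Description: {ex.description}")
        lines.append(f"Annotation: {ex.annotation}")
        lines.append("")
    lines.append(f"Description: {description}")
    lines.append("Annotation:")
    return "\n".join(lines)


# Made-up sample data, for illustration only.
prompt = build_prompt(
    "Extract the material of the item.",
    [Example("Oak dining table, 180 cm", "oak")],
    "Walnut side chair with leather seat",
)
print(prompt)
```

In this sketch, "interactive adjustment of requirements" would correspond to editing the `requirement` string between rounds, while each confirmed annotation is appended to `examples` so that later prompts carry more guidance.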
Computer Science