iRMP: From Printed Forms to Relational Data Model

Jun Zhou, Han Yu, Cheng Xie, Hongming Cai, Lihong Jiang
DOI: https://doi.org/10.1109/hpcc-smartcity-dss.2016.0199
2016-01-01
Abstract: Printed forms are ubiquitous in daily business processes, yet their content is difficult for computers to process. There is therefore a growing need to automatically convert these printouts into machine-understandable data, stored as structured data models for further applications. To meet this need, we first extract table lines and text from printed forms and convert them into RDF models. The heterogeneous models extracted from different instances are then connected based on string and lexical similarity. Finally, according to a set of mapping rules, we automatically convert the connected models into a relational data model, which lays the foundation for subsequent uses such as database generation and linked data interconnection. Multiple experiments using real resumes as the dataset, together with a case study, are conducted to verify the framework. We also construct a prototype system, iRMP (intelligent Resource Management Platform), to demonstrate the practicability and effectiveness of the approach.
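The connection and mapping steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each form instance has already been reduced to a list of field labels, it uses only string similarity (Python's `difflib.SequenceMatcher`) and omits the lexical-similarity part, and the function names, the `0.8` threshold, and the `TEXT` column type are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def connect_labels(instances, threshold=0.8):
    """Group field labels from several form instances into shared columns.

    Returns a dict mapping each canonical label (the first spelling seen)
    to the set of labels merged into it. The threshold is an assumption.
    """
    groups = {}
    for labels in instances:
        for label in labels:
            # Find the first existing canonical label that is similar enough.
            match = next(
                (c for c in groups if similarity(c, label) >= threshold), None
            )
            key = match if match is not None else label
            groups.setdefault(key, set()).add(label)
    return groups


def to_create_table(table: str, groups) -> str:
    """Map connected labels to a relational schema (hypothetical rule:
    one TEXT column per canonical label, plus a surrogate key)."""
    columns = [c.lower().replace(" ", "_") for c in groups]
    body = ",\n".join(f"    {c} TEXT" for c in columns)
    return f"CREATE TABLE {table} (\n    id INTEGER PRIMARY KEY,\n{body}\n);"


# Two hypothetical resume forms with slightly different label spellings.
instances = [
    ["Name", "Date of Birth", "Email"],
    ["name", "Date of birth", "Phone"],
]
groups = connect_labels(instances)
print(to_create_table("resume", groups))
```

Labels that differ only in case collapse into one column here, while unrelated labels such as "Phone" become new columns. A fuller treatment would add a lexical measure so that synonyms with dissimilar spellings can also be connected.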