Classification of Forms with Similar Layouts Based on Mixed Gaussian Weighted Mask

Simeng Wang, Liangcai Gao, Yuehan Wang
DOI: https://doi.org/10.1109/icdar.2015.7333736
2015-01-01
Abstract: As an essential step in form processing, form classification has attracted considerable attention from researchers. However, for forms with similar layouts, most previous classification methods still suffer from two issues: large variation within areas of user-filled-in data, and insufficient discriminative identifiers within areas of preprinted data. In this paper, we propose a novel Mixed Gaussian Weighted Mask (MGWM) based method that identifies forms with similar layouts by leveraging multiple kinds of information extracted from areas of user-filled-in data, areas of preprinted data, and the dithering data of a form. The proposed method combines three Gaussian weighted masks to handle, respectively, noise from areas of user-filled-in data, layout consistency, and position dithering among form images. Experimental results show that the proposed method achieves more than 85% classification accuracy on a number of forms and outperforms the state-of-the-art form classification method.
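The abstract does not say how the three Gaussian masks are built, combined, or applied, so the following is only a hedged illustration of the general idea of a mixed Gaussian weighted mask, not the authors' MGWM formulation. It assumes: (a) each mask component is an isotropic 2-D Gaussian centred on a discriminative preprinted region; (b) components are mixed by a pixel-wise weighted sum; and (c) the mask weights a pixel-wise distance used for nearest-template classification. All function names (`gaussian_mask`, `mix_masks`, `weighted_distance`, `classify`, `from_ink`) and the toy 4x4 forms are hypothetical, and the sketch does not model position dithering.

```python
# Hypothetical sketch of mask-weighted form matching; NOT the paper's MGWM method.
import math
from typing import Dict, List, Sequence, Set, Tuple

Grid = List[List[float]]


def gaussian_mask(h: int, w: int, center: Tuple[float, float], sigma: float) -> Grid:
    """Isotropic 2-D Gaussian on an h x w grid, peak value 1 at `center` (row, col)."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2)) for x in range(w)]
        for y in range(h)
    ]


def mix_masks(masks: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """Pixel-wise weighted sum of component masks, rescaled so the largest weight is 1."""
    h, w = len(masks[0]), len(masks[0][0])
    mixed = [
        [sum(wt * m[y][x] for m, wt in zip(masks, weights)) for x in range(w)]
        for y in range(h)
    ]
    peak = max(max(row) for row in mixed)
    return [[v / peak for v in row] for row in mixed]


def weighted_distance(img: Grid, template: Grid, mask: Grid) -> float:
    """Mask-weighted mean absolute pixel difference between two same-sized images."""
    num = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            num += mask[y][x] * abs(v - template[y][x])
    return num / sum(sum(row) for row in mask)


def classify(img: Grid, templates: Dict[str, Grid], mask: Grid) -> str:
    """Return the name of the template with the smallest weighted distance."""
    return min(templates, key=lambda name: weighted_distance(img, templates[name], mask))


def from_ink(h: int, w: int, ink: Set[Tuple[int, int]]) -> Grid:
    """Binary toy image: 1.0 at the (row, col) positions in `ink`, 0.0 elsewhere."""
    return [[1.0 if (y, x) in ink else 0.0 for x in range(w)] for y in range(h)]


H = W = 4
templates = {
    "A": from_ink(H, W, {(0, 0), (0, 1)}),                  # preprinted ink at the top
    "B": from_ink(H, W, {(2, 2), (2, 3), (3, 2), (3, 3)}),  # preprinted ink lower right
}
# A type-A form whose lower area has been filled in by a user.
filled = from_ink(H, W, {(0, 0), (0, 1), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)})

# Two Gaussian components centred on A's preprinted region; more (e.g. three) work the same way.
mask = mix_masks([gaussian_mask(H, W, (0, 0), 1.0), gaussian_mask(H, W, (0, 1), 1.0)], [0.5, 0.5])
uniform = [[1.0] * W for _ in range(H)]

print(classify(filled, templates, uniform))  # B: user ink dominates the plain distance
print(classify(filled, templates, mask))     # A: the mask down-weights the filled-in area
```

In this toy case the unweighted distance is misled by the user-filled pixels, while the Gaussian-weighted distance concentrates on the preprinted region and recovers the correct class. Choosing the component centres, widths, and mixing weights is exactly the part the abstract leaves unspecified.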