Automatic identification of noise in degraded historical documents

Abderrahmane Kefali, Ismail Bouacha, Ahmed Abderrezzaq Haddad, Chokri Ferkous
DOI: https://doi.org/10.1007/s11760-024-03725-w
IF: 1.583
2024-12-10
Signal Image and Video Processing
Abstract: The classification of degradation in historical document images plays a pivotal role in their preservation and restoration. This paper introduces a novel approach for noise classification using classical machine learning techniques, specifically Multi-Layer Perceptrons (MLPs). We assembled a comprehensive dataset of historical documents from a range of public sources, from which global and local statistical features were extracted for MLP training and validation. Through extensive experimentation, we determined the optimal MLP architecture and evaluated its performance. The model was rigorously tested through both unblind and blind testing scenarios. Unblind testing, utilizing images from the same collections as the training set, achieved a robust accuracy of 97.22%. Blind testing, performed with the distinct PHIBD dataset, demonstrated a 90% accuracy, outperforming current state-of-the-art deep learning models. These results affirm the model's robustness and its potential for practical application in historical document analysis.
engineering, electrical & electronic; imaging science & photographic technology
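The abstract mentions global and local statistical features extracted from each document image and fed to an MLP, but it does not say which statistics are used. The following is a minimal sketch of what such a feature vector might look like, under stated assumptions: the choice of mean and standard deviation, the non-overlapping block scheme, and the function names `global_features`, `local_features`, and `feature_vector` are all hypothetical and are not taken from the paper. The image is represented as a plain list of rows of grayscale values.

```python
from statistics import fmean, pstdev


def global_features(img):
    """Assumed global statistics: mean and standard deviation of all pixels."""
    pixels = [p for row in img for p in row]
    return [fmean(pixels), pstdev(pixels)]


def local_features(img, block=2):
    """Assumed local statistics over non-overlapping block x block patches:
    the spread of the patch means and the average patch standard deviation."""
    h, w = len(img), len(img[0])
    means, stds = [], []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            patch = [img[r + i][c + j] for i in range(block) for j in range(block)]
            means.append(fmean(patch))
            stds.append(pstdev(patch))
    return [pstdev(means), fmean(stds)]


def feature_vector(img, block=2):
    """Concatenate global and local statistics into one vector per image."""
    return global_features(img) + local_features(img, block)


# Toy 4x4 "image": dark left half, bright right half.
toy = [[0, 0, 10, 10] for _ in range(4)]
features = feature_vector(toy, block=2)
```

In a pipeline like the one the abstract describes, a vector of this kind would be computed for every training image and passed to the classifier. The MLP itself is left out here because the abstract does not state its architecture.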