Probabilistic road classification in historical maps using synthetic data and deep learning

Dominik J. Mühlematter, Sebastian Schweizer, Chenjing Jiao, Xue Xia, Magnus Heitzler, Lorenz Hurni
2024-10-03
Abstract: Historical maps are invaluable for analyzing long-term changes in transportation and spatial development, offering a rich source of data for evolutionary studies. However, digitizing and classifying road networks from these maps is often expensive and time-consuming, limiting their widespread use. Recent advancements in deep learning have made automatic road extraction from historical maps feasible, yet these methods typically require large amounts of labeled training data. To address this challenge, we introduce a novel framework that integrates deep learning with geoinformation, computer-based painting, and image processing methodologies. This framework enables the extraction and classification of roads from historical maps using only road geometries without needing road class labels for training. The process begins with training of a binary segmentation model to extract road geometries, followed by morphological operations, skeletonization, vectorization, and filtering algorithms. Synthetic training data is then generated by a painting function that artificially re-paints road segments using predefined symbology for road classes. Using this synthetic data, a deep ensemble is trained to generate pixel-wise probabilities for road classes to mitigate distribution shift. These predictions are then discretized along the extracted road geometries. Subsequently, further processing is employed to classify entire roads, enabling the identification of potential changes in road classes and resulting in a labeled road class dataset. Our method achieved completeness and correctness scores of over 94% and 92%, respectively, for road class 2, the most prevalent class in the two Siegfried Map sheets from Switzerland used for testing. This research offers a powerful tool for urban planning and transportation decision-making by efficiently extracting and classifying roads from historical maps.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the challenges of extracting and classifying road networks from historical maps. Specifically:

1. **High cost and time consumption**
   - Digitizing and classifying road networks in historical maps usually requires a large amount of manual annotation, which is both costly and time-consuming. This limits the widespread use of these maps.
2. **Lack of large-scale labeled data**
   - Current deep learning methods for automatic road extraction usually require large amounts of labeled training data, but labeled data for historical maps is scarce and difficult to obtain.
3. **Distribution shift**
   - When a model is trained on synthetic data, the difference between the synthetic and real data distributions can degrade its performance on real maps. Handling this distribution shift effectively is therefore an important problem.

### Solutions

To address these challenges, the authors propose a framework that combines deep learning with geoinformation, computer-based painting, and image processing methods to extract and classify roads from historical maps, using only road geometries and no road class labels for training. The steps are as follows:

1. **Cascade training of a binary segmentation model**
   - First, a binary segmentation model is trained with a cascade training scheme to extract road geometries from historical maps. The model is pre-trained on a larger dataset and then fine-tuned on historical map data.
2. **Morphological operations, skeletonization, vectorization, and filtering**
   - The extracted road geometries are refined with morphological operations, skeletonization, vectorization, and filtering.
3. **Synthetic training data generation**
   - A painting function artificially re-paints road segments using the predefined symbology of each road class, which produces synthetic training data.
4. **Deep ensemble training**
   - A deep ensemble is trained on the synthetic data to generate pixel-wise road class probabilities, which mitigates the distribution shift problem.
5. **Road classification**
   - The predicted probabilities are discretized along the extracted road geometries and processed further to classify entire roads, identify potential changes in road class, and produce a labeled road class dataset.

### Experimental results

The method was tested on two Siegfried Map sheets from Switzerland. For road class 2, the most prevalent class, it achieved completeness above 94% and correctness above 92%. The results indicate that the method can efficiently extract and classify roads from historical maps, with applications in urban planning and transportation decision-making.
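
Steps 4 and 5 of the solution (ensemble prediction and classification along road geometries) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the array layout `(members, classes, height, width)`, the nearest-pixel sampling, and the "average then argmax" rule for labelling a whole road are all assumptions made for illustration. The summary above does not specify how the ensemble outputs are aggregated or how class changes along a road are detected.

```python
import numpy as np


def ensemble_mean(prob_maps):
    """Average pixel-wise class probabilities over ensemble members.

    prob_maps: array of shape (M, C, H, W), one probability map per member.
    Returns an array of shape (C, H, W).
    """
    return np.mean(prob_maps, axis=0)


def sample_along_polyline(probs, vertices, step=1.0):
    """Sample class probabilities at points spaced about `step` pixels apart.

    probs: array of shape (C, H, W).
    vertices: list of (row, col) polyline vertices in pixel coordinates.
    Returns an array of shape (N, C), one row per sampled point.
    """
    points = []
    for (r0, c0), (r1, c1) in zip(vertices[:-1], vertices[1:]):
        length = np.hypot(r1 - r0, c1 - c0)
        n = max(int(np.ceil(length / step)), 1)
        # endpoint=False avoids duplicating the vertex shared with the next segment
        for t in np.linspace(0.0, 1.0, n, endpoint=False):
            points.append((r0 + t * (r1 - r0), c0 + t * (c1 - c0)))
    points.append(vertices[-1])

    # Nearest-pixel lookup, clipped to the image bounds (an assumed simplification)
    rows = np.clip(np.rint([p[0] for p in points]).astype(int), 0, probs.shape[1] - 1)
    cols = np.clip(np.rint([p[1] for p in points]).astype(int), 0, probs.shape[2] - 1)
    return probs[:, rows, cols].T


def classify_road(samples):
    """Label a road by the class with the highest mean sampled probability."""
    mean_probs = samples.mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs


# Toy example: 3 ensemble members, 2 road classes, a 4 x 6 pixel image.
prob_maps = np.empty((3, 2, 4, 6))
prob_maps[:, 0] = 0.8  # class 0 dominates the background
prob_maps[:, 1] = 0.2
for member, p in enumerate([0.9, 0.7, 0.8]):
    prob_maps[member, 1, 2, :] = p        # class 1 along image row 2
    prob_maps[member, 0, 2, :] = 1.0 - p

mean_map = ensemble_mean(prob_maps)
road = [(2, 0), (2, 5)]                   # hypothetical extracted road geometry
samples = sample_along_polyline(mean_map, road)
label, mean_probs = classify_road(samples)
print(label, mean_probs)
```

Averaging member outputs is a common way to use a deep ensemble, and disagreement between members tends to lower the averaged confidence where the synthetic training data does not match the real map. Keeping the per-point `samples` array, rather than only the final label, would also allow a later step to look for positions where the most probable class changes along a road.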