Statistical analysis of geoinformation data for increasing railway safety

Katarzyna Gawlak, Jarosław Konieczny, Krzysztof Domino, Jarosław Adam Miszczak
2024-08-01
Abstract: The impact of rail transport on the environment is one of the crucial factors for the sustainable development of this form of mass transport. We present a data-driven analysis of railway accidents involving wild animals in southern Poland, as a step towards a train driver warning system. Our method applies a Bayesian approach to the statistical analysis of accident geolocation data. Implementing the proposed model does not require advanced knowledge of data mining, so it can be applied even in less developed railway systems with limited IT support. Furthermore, we have discovered unusual patterns of accidents when considering the number of trains, their speed, and the time at particular geographical locations of the railway network. We test the developed approach using data from southern Poland, a region comprising wildlife habitats and one of the most urbanised areas in Central Europe. Based on this, we conclude that our model is best suited to railway lines that pass through varying types of landscape.
Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
This paper aims to improve the safety of railway transport through the statistical analysis of geographic information data, focusing on the frequent collisions between wild animals and trains in southern Poland. Specifically, the objectives of the paper are:

1. **Analyze the geostatistical data of collisions between wild animals and trains**: Study the spatial distribution of these accidents to identify high-risk areas (hotspots).
2. **Create an easy-to-implement Bayesian prediction model**: The model predicts the probability of a collision between wild animals and trains at a specific time and place, thus providing early warnings for train drivers.
3. **Verify the practicality of the prediction model**: Test the effectiveness of the model using actual data from southern Poland, an area that both contains rich wild animal habitats and is one of the most urbanized regions in Central Europe.
4. **Explore the relationship between wild animal ecology and railway transport safety**: Study how to reduce collisions between wild animals and trains, thereby improving the environmental friendliness and safety of railway transport.

### Main contributions of the paper

- **Complete data-driven approach**: The approach runs from the collection of animal collision data, including railway traffic density, to the final prediction and early-warning mechanism.
- **Avoid complex parametric models**: The Bayesian method is used for statistical analysis, avoiding complex models that require detailed parameters of animal activity patterns.
- **General method applicable to different regions**: Although the analysis is based on data from southern Poland, the proposed methods and tools can be applied to similar data from other parts of the world.

### Model details

The paper proposes an early-warning model based on the Bayesian method for marking high-risk areas on railway sections.
The key parameters of the model include:

- \( \mu(\tau) \): The expected number of accidents on a given day in month \( \tau \).
- \( p(t, \Delta t|\tau) \): The conditional probability that an accident occurs within the time interval \( [t, t + \Delta t] \) on a given day in month \( \tau \).
- \( p(l|\tau, t, \Delta t) \): The conditional probability that an accident occurs on railway line \( l \), given month \( \tau \) and the time interval \( [t, t + \Delta t] \).
- \( p(x, \Delta x|\tau, t, \Delta t, l) \): The conditional probability that an accident occurs between kilometer posts \( [x, x + \Delta x] \), given railway line \( l \), month \( \tau \), and the time interval \( [t, t + \Delta t] \).

The model assumes that accidents are independent, so the probability of an accident occurring between kilometer posts \( [x, x + \Delta x] \) on railway line \( l \) within the time interval \( [t, t + \Delta t] \) on a given day in month \( \tau \) can be expressed as:

\[ p(\tau, t, \Delta t, l, x, \Delta x) = p(x, \Delta x|\tau, t, \Delta t, l)\cdot p(l|\tau, t, \Delta t)\cdot p(t, \Delta t|\tau)\cdot \mu(\tau) \]

### Data analysis

The paper uses data provided by railway operators in southern Poland, an area that contains both highly urbanized zones and rich wild animal habitats. The data cover the different types of area that the railway network passes through, including urban, rural and forest areas. By analyzing these data, the paper demonstrates that the model is effective and practical in real applications.
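To make the factorisation from the model details concrete, the following minimal sketch evaluates the product of the four terms from a list of accident records. It is an illustration under stated assumptions, not the paper's implementation: each term is estimated with plain relative frequencies rather than the paper's Bayesian estimation, and the record layout, the sample data, and the names `ACCIDENTS`, `DAYS_OBSERVED` and `expected_accidents` are hypothetical. Because \( \mu(\tau) \) is an expected count, the product can be read as an expected number of accidents per day in the chosen cell, which for small values is close to the probability of at least one accident.

```python
# Hypothetical records (month, hour, line_id, km_post); not data from the paper.
ACCIDENTS = [
    (5, 4, "L1", 12.3),
    (5, 4, "L1", 12.9),
    (5, 5, "L1", 40.1),
    (5, 21, "L2", 7.5),
    (11, 17, "L1", 12.4),
]
# Assumed number of observed days per month (one year of observation).
DAYS_OBSERVED = {5: 31, 11: 30}


def expected_accidents(records, days_observed, month, t, dt, line, x, dx):
    """Evaluate p(x,dx|tau,t,dt,l) * p(l|tau,t,dt) * p(t,dt|tau) * mu(tau).

    Each factor is a relative frequency (an assumption of this sketch).
    Intervals are treated as half-open, [t, t+dt) and [x, x+dx), so that
    adjacent bins do not count the same accident twice.
    """
    in_month = [r for r in records if r[0] == month]
    in_time = [r for r in in_month if t <= r[1] < t + dt]
    on_line = [r for r in in_time if r[2] == line]
    in_segment = [r for r in on_line if x <= r[3] < x + dx]

    def ratio(part, whole):
        # Relative frequency; defined as 0 when there is nothing to condition on.
        return len(part) / len(whole) if whole else 0.0

    mu = len(in_month) / days_observed[month]  # accidents per day in this month
    p_time = ratio(in_time, in_month)          # p(t, dt | tau)
    p_line = ratio(on_line, in_time)           # p(l | tau, t, dt)
    p_segment = ratio(in_segment, on_line)     # p(x, dx | tau, t, dt, l)
    return p_segment * p_line * p_time * mu


risk = expected_accidents(
    ACCIDENTS, DAYS_OBSERVED, month=5, t=4, dt=2, line="L1", x=10.0, dx=5.0
)
print(f"expected accidents per day in this cell: {risk:.4f}")
```

With relative frequencies the intermediate counts cancel, so the result reduces to the number of accidents in the cell divided by the number of observed days. A Bayesian treatment such as the one the paper describes would instead place priors on these quantities, which matters most for cells with few or no recorded accidents.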