Mohamed Abuella, M. Amine Atoui, Sławomir Nowaczyk, Simon Johansson, Ethan Faghani
Abstract: This paper addresses the challenge of identifying the paths of vessels whose operating routes include repetitive paths, partially repetitive paths, and new paths. We propose a spatial clustering approach for labeling the vessel paths by using only position information. We develop a path clustering framework employing two methods: a distance-based path modeling and a likelihood estimation method. The former enhances the accuracy of path clustering through the integration of unsupervised machine learning techniques, while the latter focuses on likelihood-based path modeling and introduces segmentation for a more detailed analysis. The results highlight the superior performance and efficiency of the developed approach, as both methods for clustering vessel paths into five clusters achieve a perfect F1-score. The approach aims to offer valuable insights for route planning, ultimately contributing to improving safety and efficiency in maritime transportation.
computer science, information systems, telecommunications, engineering, electrical & electronic
What problem does this paper attempt to address?
This paper addresses the challenge of identifying ship paths when a vessel's operations include repeated paths, partially repeated paths, and new paths. Specifically, the authors propose a spatial clustering method that labels ship paths using only location information. The path-clustering framework combines distance-based path modeling with a likelihood estimation method. It aims to improve the accuracy of path clustering and to provide insights for route planning, ultimately improving the safety and efficiency of maritime transportation.
### Background and Motivation of the Paper
1. **Importance of Maritime Transportation**
- Maritime transportation is crucial to global trade and generates a large amount of ship trajectory data, which reveals complex spatio-temporal navigation patterns. Understanding these patterns is important for effective maritime traffic monitoring and management.
2. **Difference between Path and Trajectory**
- A path refers to the specific travel route of an object, while a trajectory refers to a series of consecutive geographical points, each representing the position at a specific time. Therefore, a trajectory usually represents the movement of an object over time.
- A shipping route groups the paths or trajectories that share the same origin and the same destination. If either the origin or the destination differs, the voyage belongs to a different shipping route.
3. **Applications of Path Clustering**
- Path clustering groups paths by their similarity and is widely used in navigation, traffic analysis, route planning, etc. In maritime navigation, identifying paths from Automatic Identification System (AIS) data is a challenging task, especially in coastal areas. Because navigation operations there are frequent, path-identification tools are needed that can be integrated with route-planning systems to improve maritime safety and optimize ship routes.
### Research Objectives and Contributions
1. **Clustering Method Using Only Location Information**
- The proposed clustering method only requires location information (longitude and latitude).
2. **Ability to Handle Unseen or Unknown Paths**
- This method shows significant value in clustering unseen or unknown paths.
3. **Robustness and Interpretability**
- The method applies similarity measures to reduce the influence of noise and outliers and to provide a clear explanation of the path clustering.
4. **Parameter Customization**
- Users can customize parameters to determine the number of path clusters, thereby enhancing the flexibility and adaptability of the framework.
5. **Path-Segment Analysis**
- It includes a method for studying and analyzing the patterns of specific path segments.
6. **Decision Support**
- It is a data-driven solution that can be used for making informed decisions in route planning and optimization, traffic management, and resource allocation.
### Methodology
1. **Problem Definition**
- Each voyage is represented as a time series of data points, where each data point is defined by its longitude-latitude coordinates. The path-clustering result is a set of clusters, and each cluster represents a group of similar paths.
2. **Distance-Based Method**
- Use the average nearest-neighbor distance (ANND) to measure the similarity between two paths. ANND is the average of the distances between each point in one path and its nearest-neighbor point in the other path. The pairwise ANND values form a distance matrix, and machine-learning techniques (such as k-means, Gaussian mixture model, and hierarchical clustering) then cluster the paths according to the values in that matrix.
3. **Piecewise Gaussian Likelihood Method**
- Divide the shipping route into segments and train a Gaussian mixture model (GMM) to find the Gaussian distribution of each segment. Use the trained GMM to estimate the likelihood of the corresponding segment of each voyage in the test data set, and then label the path clusters according to these likelihoods.
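
The distance-based method described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`annd`, `distance_matrix`, `agglomerative`) are hypothetical, the distance is plain Euclidean distance in degrees rather than a geodesic distance, the ANND is symmetrized by averaging both directions, and the clustering step is a small average-linkage hierarchical clustering written with the standard library only.

```python
import math

def annd(path_a, path_b):
    """Average nearest-neighbor distance between two paths.

    Each path is a list of (lon, lat) tuples. Euclidean distance in degrees
    and the symmetrization over both directions are assumptions made for
    this sketch.
    """
    def one_way(src, dst):
        # Mean distance from each point of src to its nearest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (one_way(path_a, path_b) + one_way(path_b, path_a))

def distance_matrix(paths):
    """Symmetric matrix of pairwise ANND values."""
    n = len(paths)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = annd(paths[i], paths[j])
    return dist

def agglomerative(dist, n_clusters):
    """Average-linkage hierarchical clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average distance.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy data: two voyages near latitude 0 and two near latitude 5.
paths = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],
    [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)],
    [(0.0, 5.1), (1.0, 5.1), (2.0, 5.1)],
]
labels = agglomerative(distance_matrix(paths), n_clusters=2)
print(labels)
```

The number of clusters is passed in by the caller, which mirrors the user-customizable parameter mentioned in the contributions. For real AIS data, a haversine distance and a library clustering routine would likely be more appropriate than this hand-written version.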
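
The segment-wise likelihood idea can be illustrated with a simplified sketch. Several assumptions are made here and none of them comes from the paper: each segment is modeled by a single two-dimensional Gaussian (the one-component special case of a GMM), segments are formed by splitting each voyage into contiguous chunks of roughly equal point count, the training voyages already carry cluster labels, and all names (`split_segments`, `fit_gaussian`, `train`, `classify`) are hypothetical.

```python
import math

def split_segments(path, n_segments):
    """Split a path into n_segments contiguous chunks of similar size."""
    bounds = [k * len(path) // n_segments for k in range(n_segments + 1)]
    return [path[bounds[k]:bounds[k + 1]] for k in range(n_segments)]

def fit_gaussian(points, reg=1e-6):
    """Fit a 2-D Gaussian; reg keeps the covariance invertible."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n + reg
    syy = sum((p[1] - my) ** 2 for p in points) / n + reg
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return mx, my, sxx, syy, sxy

def log_likelihood(points, gaussian):
    """Mean log-density of the points under a 2-D Gaussian."""
    mx, my, sxx, syy, sxy = gaussian
    det = sxx * syy - sxy ** 2
    total = 0.0
    for x, y in points:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance via the closed-form 2x2 inverse.
        maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        total += -0.5 * (maha + math.log(det)) - math.log(2 * math.pi)
    return total / len(points)

def train(paths_by_cluster, n_segments):
    """Fit one Gaussian per (cluster, segment) from labeled voyages."""
    models = {}
    for label, paths in paths_by_cluster.items():
        pooled = [[] for _ in range(n_segments)]
        for path in paths:
            for k, seg in enumerate(split_segments(path, n_segments)):
                pooled[k].extend(seg)
        models[label] = [fit_gaussian(seg) for seg in pooled]
    return models

def classify(path, models, n_segments):
    """Label a voyage with the cluster whose segments fit it best."""
    segs = split_segments(path, n_segments)
    scores = {
        label: sum(log_likelihood(s, g) for s, g in zip(segs, gaussians))
        for label, gaussians in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy training voyages for two hypothetical clusters.
training = {
    "north": [[(0, 1.0), (1, 1.2), (2, 1.1), (3, 1.0)],
              [(0, 1.1), (1, 1.0), (2, 1.2), (3, 1.1)]],
    "south": [[(0, -1.0), (1, -1.2), (2, -1.1), (3, -1.0)],
              [(0, -1.1), (1, -1.0), (2, -1.2), (3, -1.1)]],
}
models = train(training, n_segments=2)
label, scores = classify([(0, 1.05), (1, 1.1), (2, 1.15), (3, 1.05)],
                         models, n_segments=2)
print(label)
```

Because the score is a sum over segments, the per-segment terms can also be inspected individually, which is one way to see where along the route a voyage departs from a known cluster.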
### Experimental Results
1. **Evaluation Metrics**
- Use evaluation metrics such as the confusion matrix, precision, recall, and F1-score to evaluate the results of path clustering.
2. **Experimental Results**
- Experiments were carried out on the data of two ships, Cinderella II and Buro. The results show that both the distance-based method and the piecewise Gaussian likelihood method achieved high F1-scores when clustering paths. The piecewise Gaussian likelihood method in particular achieved a perfect F1-score for every path cluster.
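
For reference, the evaluation metrics named above can be computed from predicted and true cluster labels as in the sketch below. It assumes the predicted cluster labels have already been matched to the ground-truth labels; the function names and the toy labels are hypothetical and are not results from the paper.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def per_class_scores(matrix, labels):
    """Return {label: (precision, recall, f1)} from a confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as label, but wrong
        fn = sum(matrix[i]) - tp                 # true label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores

# Toy labels for illustration only.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
class_labels = ["A", "B", "C"]

matrix = confusion_matrix(y_true, y_pred, class_labels)
metrics = per_class_scores(matrix, class_labels)
for name, (p, r, f1) in metrics.items():
    print(name, round(p, 3), round(r, 3), round(f1, 3))
```

A perfect F1-score for a cluster corresponds to a confusion matrix whose row and column for that cluster contain no off-diagonal entries.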
### Conclusions and Discussions
1. **Conclusion**
- The proposed spatial clustering method performs well in identifying ship paths, especially when dealing with complex and variable paths.