Abstract: Classifying and counting vehicles in road traffic has numerous applications in the transportation engineering domain. However, the wide variety of vehicles (two-wheelers, three-wheelers, cars, buses, trucks, etc.) plying on roads of developing regions without any lane discipline makes vehicle classification and counting a hard problem to automate. In this paper, we use state-of-the-art Convolutional Neural Network (CNN) based object detection models and train them for multiple vehicle classes using data from Delhi roads. We get up to 75% MAP on an 80-20 train-test split using 5562 video frames from four different locations. As robust network connectivity is scarce in developing regions for continuous video transmissions from the road to cloud servers, we also evaluate the latency, energy and hardware cost of embedded implementations of our CNN model based inferences.
What problem does this paper attempt to address?
This paper addresses the automatic classification and counting of vehicles on roads where traffic does not follow lane discipline. Specifically, the authors study how computer vision can classify and count multiple vehicle types (two-wheelers, three-wheelers, cars, buses, trucks, and others) in developing-region cities with complex traffic conditions, using Delhi, India, and its surrounding areas as the test case.
### Main Problems and Challenges
1. **Diversity**: Road traffic in developing countries contains many vehicle types, some of which look very different from vehicles common in developed countries (for example, three-wheeled motorcycles, electric tricycles, and rickshaws).
2. **Occlusion**: Due to the lack of lane discipline, large vehicles are often occluded by small vehicles, increasing the detection difficulty.
3. **Road Structure**: The road intersection designs in developing countries are irregular, different from the rectangular grid-like road structures in developed countries, resulting in different viewing angles and traffic flow patterns.
4. **Dataset Difference**: Most of the existing labeled datasets are from developed countries, and the vehicle types and traffic scenes in these datasets do not match the situations in developing countries. Direct application will lead to poor model performance.
### Solutions
To address the above challenges, the authors took the following measures:
- **Create a Local Dataset**: Collected and labeled video frames from Delhi and its surrounding areas, and constructed a dataset containing 5,562 image frames, with a total of 32,088 labeled boxes.
- **Train a CNN Model**: Used the YOLO (You Only Look Once) convolutional neural network model and fine-tuned it on the self-built dataset. The experimental results show that under an 80-20 train-test split, the model reached a maximum MAP (Mean Average Precision) value of 75%.
- **Embedded Platform Evaluation**: Because robust network connectivity for continuous video transmission to cloud servers is scarce, the researchers also evaluated the model's inference performance on three embedded platforms (Nvidia Jetson TX2, Raspberry Pi 3 Model B, and Intel Movidius Neural Compute Stick) and analyzed the trade-offs between latency, energy consumption, and hardware cost.
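
To make the evaluation setup above more concrete, here is a minimal Python sketch of an 80-20 frame split and a mean-average-precision calculation. It is an illustration under stated assumptions, not the authors' evaluation code: the function names (`split_frames`, `iou`, `average_precision`, `mean_average_precision`), the random per-frame split, the IoU threshold of 0.5, and the all-point interpolation of the precision-recall curve are all assumptions, since the summary does not state which protocol the paper used.

```python
import random


def split_frames(frame_ids, train_fraction=0.8, seed=0):
    """Shuffle frame ids and split them into train and test lists.

    Assumption: a random per-frame split; the paper's exact procedure
    is not described in this summary.
    """
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Average precision for ONE vehicle class.

    detections:   list of (frame_id, score, box)
    ground_truth: dict mapping frame_id -> list of boxes
    Assumption: IoU >= 0.5 counts as a match, all-point interpolation.
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {fid: [False] * len(b) for fid, b in ground_truth.items()}
    tp = fp = 0
    precisions, recalls = [], []
    # Visit detections from most to least confident.
    for fid, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_j, best_iou = -1, 0.0
        for j, gt_box in enumerate(ground_truth.get(fid, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_threshold and not matched[fid][best_j]:
            matched[fid][best_j] = True  # each labeled box matches once
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_gt)
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for i, recall in enumerate(recalls):
        ap += (recall - prev_recall) * max(precisions[i:])
        prev_recall = recall
    return ap


def mean_average_precision(per_class_ap):
    """Mean of the per-class AP values (dict: class name -> AP)."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

In this sketch, `average_precision` would be run once per vehicle class on the held-out test frames, and `mean_average_precision` would average the results into a single score comparable in spirit to the reported MAP.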
### Application Value
This research has a wide range of application prospects, mainly including:
- **Infrastructure Planning**: Calculate the number of different types of vehicles to evaluate road capacity and help plan new overpasses, underpasses, pedestrian overpasses, and other facilities.
- **Policy Evaluation**: Monitor the impact of specific policies (such as "odd-even license plate restrictions") on road traffic and provide data support to optimize policies.
- **Real-time Traffic Management**: Detect speeding or illegally driven heavy vehicles and impose penalties in a timely manner.
- **Public Transport Monitoring**: Track the arrival time of buses and improve the predictability of public transport.
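
As a rough illustration of the counting step behind these applications, the following Python sketch tallies detections per vehicle class for a single frame. The function name `count_by_class`, the `(label, score)` detection format, the class labels, and the 0.5 confidence cutoff are hypothetical choices made for the example, not details from the paper. Note that this produces per-frame tallies only; counting unique vehicles across a video would additionally require tracking so that the same vehicle is not counted in every frame.

```python
from collections import Counter


def count_by_class(detections, min_score=0.5):
    """Count detections per class label in one frame.

    detections: iterable of (label, score) pairs from a detector.
    min_score:  hypothetical confidence cutoff; low-confidence
                detections are ignored.
    """
    return Counter(label for label, score in detections if score >= min_score)


# Example with made-up detector output for a single frame.
frame_detections = [
    ("two-wheeler", 0.91),
    ("two-wheeler", 0.64),
    ("car", 0.88),
    ("bus", 0.42),  # below the cutoff, so not counted
    ("three-wheeler", 0.77),
]
counts = count_by_class(frame_detections)
```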
In summary, this paper aims to solve key problems in urban traffic management in developing countries through deep learning and computer vision technology, and provide technical support for future intelligent transportation systems.