Abstract: In recent years, passively recorded probe traffic volumes have increasingly been used to estimate traffic volumes. However, it is not always possible to count probe traffic volume in a spatial dataset when probe trajectories cannot be fully reconstructed from raw probe point location data due to sparse recording intervals, lack of pseudonyms or timestamps. As a result, the application of such probe point location data has been limited in traffic volume estimation. To relax these constraints, we present the exact distribution of the estimated probe traffic volume in a road segment based on probe point location data without trajectory reconstruction. The distribution of the estimated probe traffic volume can exhibit multimodality, without necessarily being line-symmetric with respect to the true probe traffic volume. As more probes are present, the distribution approaches a normal distribution. The conformity of the distribution was visualised through numerical simulations. Sometimes, there exists a local optimal cordon length that maximises estimation precision. The theoretical variance of estimated probe traffic volume can address heteroscedasticity in the modelling of traffic volume estimates.
### What problem does this paper attempt to solve?
This paper addresses how to estimate probe traffic volume in a road segment from sparse, non-chronological probe point data without reconstructing trajectories. Specifically, it proposes a method for estimating the number of probes passing through a road segment that does not rely on complete trajectory information. This removes a limitation of existing methods, which cannot use probe data effectively when recording intervals are sparse or when pseudonyms or timestamps are missing.
#### Background and problem description
In recent years, passively recorded probe traffic volumes have increasingly been used to estimate traffic volumes. In some datasets, however, sparse recording intervals or missing pseudonyms or timestamps make it impossible to fully reconstruct probe trajectories, so the probe traffic volume cannot be counted directly. This has limited the use of such probe point location data in traffic volume estimation. To relax these constraints, the paper derives the distribution of a probe traffic volume estimate that is computed from probe point location data without trajectory reconstruction.
#### Main problems
1. **Data sparsity**: Probe recording intervals are long, so trajectories cannot be reconstructed accurately.
2. **Privacy protection**: The data may omit pseudonyms and timestamps, for example to protect user privacy, so points cannot be linked to individual probes or ordered in time.
3. **Limitations of traditional methods**: Existing traffic volume estimation methods rely on traditional devices at fixed locations (such as pneumatic tubes, inductive loops and radars), which are constrained in spatial coverage, time and budget.
#### Solution
The paper derives the exact distribution of the estimated probe traffic volume in a road segment. This allows probe traffic volume to be estimated from probe point location data without complete trajectory information, and the conformity of the derived distribution was visualised through numerical simulations. The distribution can be multimodal and is not necessarily symmetric about the true probe traffic volume, but it approaches a normal distribution as more probes are present. The paper also examines how the length of the virtual cordon affects estimation precision and finds that, in some cases, there is a locally optimal cordon length that maximises precision.
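The kind of numerical simulation described above can be illustrated with a minimal Monte Carlo sketch. This is not the paper's code. It assumes that each probe travels at constant speed, records a point every `t` seconds with a uniformly random phase, and has a speed drawn from a uniform distribution; the function name `simulate_estimates` and all parameter values are hypothetical. Each point recorded inside the cordon contributes its speed times `t / d`, following the estimator listed under Key formulas.

```python
import numpy as np

def simulate_estimates(m, d, t, trials, rng, s_low=10.0, s_high=20.0):
    """Simulate `trials` values of the estimate for m probes crossing a cordon.

    m      : true number of probes crossing the cordon
    d      : cordon length (m)
    t      : recording interval (s)
    s_low, s_high : bounds of the assumed uniform speed distribution (m/s)
    """
    # Assumption: constant speed per probe, drawn uniformly at random.
    s = rng.uniform(s_low, s_high, size=(trials, m))
    spacing = s * t  # distance between consecutive points of one probe
    # Position of the first point after the cordon entry, uniform in [0, spacing).
    offset = rng.uniform(0.0, 1.0, size=(trials, m)) * spacing
    # Number of points offset + k * spacing (k = 0, 1, ...) that fall in [0, d).
    counts = np.where(offset < d, np.ceil((d - offset) / spacing), 0.0)
    # No trajectories are needed: only the speeds of the points inside the cordon.
    return (t / d) * np.sum(counts * s, axis=1)

rng = np.random.default_rng(0)
est = simulate_estimates(m=50, d=100.0, t=5.0, trials=4000, rng=rng)
print("mean of estimates:", est.mean())
print("variance of estimates:", est.var())
```

Under these assumptions the mean of the simulated estimates should sit close to the true count `m`, and a histogram of `est` gives a rough picture of the distribution's shape for small and large `m`.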
#### Key formulas
1. **Unbiased estimator**:
\[
\hat{m}=\frac{t}{d}\sum_{a = 1}^{n}s_{a}
\]
where \(\hat{m}\) is the estimated probe traffic volume, \(t\) is the recording interval, \(d\) is the length of the virtual cordon, \(n\) is the number of probe points recorded inside the cordon, and \(s_{a}\) is the speed recorded at the \(a\)-th probe point.
2. **Variance**:
\[
\text{Var}[\hat{m}]=\frac{mt^{2}}{d^{2}}\int_{0}^{\infty}b(s, d, t)g(s)\,ds
\]
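where \(m\) is the true probe traffic volume, \(b(s, d, t)=s^{2}p(1 - p)\), \(p\) is the fractional part of \(d/(st)\) (the expected number of points that a probe travelling at speed \(s\) records inside the cordon), and \(g(s)\) is the probability density function of the probe speed.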
3. **Normal distribution approximation**:
\[
\lim_{m\rightarrow\infty}f(\hat{m}; m)=N\left(m,\frac{mt^{2}}{d^{2}}\int_{0}^{\infty}b(s, d, t)g(s)\,ds\right)
\]
4. **Optimal cordon length**:
\[
\text{argmin}_{0 < d\leq\max(d)}\text{obj}(d)
\]
where \(\text{obj}(d)\) can be the coefficient of variation (CV) or the variance-to-mean ratio (VMR).
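The variance formula and the cordon-length search can be sketched numerically as follows. This is an illustration under stated assumptions, not the paper's implementation: \(p\) is taken to be the fractional part of \(d/(st)\), the speed density \(g(s)\) is assumed uniform on 10 to 20 m/s, the integral is approximated with the trapezoidal rule, and the names `theoretical_variance` and `cv_over_cordon_lengths` as well as the grid of candidate cordon lengths are hypothetical.

```python
import numpy as np

def theoretical_variance(m, d, t, speeds, density):
    """Approximate m * t^2 / d^2 * integral of s^2 p (1 - p) g(s) ds."""
    ratio = d / (speeds * t)
    p = ratio - np.floor(ratio)  # assumption: p is the fractional part of d / (s t)
    integrand = speeds**2 * p * (1.0 - p) * density
    # Trapezoidal rule on the speed grid.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(speeds))
    return m * t**2 / d**2 * integral

def cv_over_cordon_lengths(m, t, d_grid, speeds, density):
    """Coefficient of variation sqrt(Var) / m for each candidate cordon length."""
    variances = np.array(
        [theoretical_variance(m, d, t, speeds, density) for d in d_grid]
    )
    return np.sqrt(variances) / m

# Assumed speed distribution: uniform on [10, 20] m/s, so g(s) = 0.1.
speeds = np.linspace(10.0, 20.0, 2001)
density = np.full_like(speeds, 0.1)

var_100 = theoretical_variance(m=50, d=100.0, t=5.0, speeds=speeds, density=density)
print("theoretical variance at d = 100 m:", var_100)

# Grid search standing in for the argmin over 0 < d <= max(d).
d_grid = np.linspace(10.0, 500.0, 50)
cv = cv_over_cordon_lengths(m=50, t=5.0, d_grid=d_grid, speeds=speeds, density=density)
best_d = d_grid[np.argmin(cv)]
print("cordon length with the lowest CV on this grid:", best_d)
```

Replacing `np.sqrt(variances) / m` with `variances / m` gives the VMR objective instead of the CV. A grid search only locates the best candidate on the grid; because the objective need not be monotonic in `d`, plotting `cv` against `d_grid` is a simple way to see any local optima.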
#### Conclusion
The model proposed in this paper makes it possible to estimate probe traffic volume from probe point location data without reconstructing trajectories. This makes data with sparse recording intervals or missing pseudonyms and timestamps usable for traffic volume estimation, and the theoretical variance of the estimate can be used to address heteroscedasticity when modelling traffic volume estimates.