Data (Information)

Synthesized GPS trajectories in downtown Manhattan road network in New York City

Mohammadi, Sevin; Smyth, Andrew W.

Four combinations of noise level and sampling interval are considered in the data generation: low noise (25 meters) or high noise (50 meters), paired with short intervals (20 seconds) or long intervals (40 seconds). For each of the four combinations, 10,000 GPS trajectories are generated for the downtown Manhattan road network in New York City. Of these, 80% are initially selected at random to train the transformer model, and the remaining 20% are used to test and compare the performance of all three models. Trajectories with lengths less than four or greater than 200 are discarded, which excludes 16% of the training and testing data in the long-interval dataset and around 4% in the short-interval dataset.

Each row in the dataset represents one trajectory and contains five sequences: the sequence of road segments that the trajectory follows, the sequence of X coordinates (longitude) of the noisy GPS points, the sequence of Y coordinates (latitude) of the noisy GPS points, the sequence of X coordinates of the true GPS points, and the sequence of Y coordinates of the true GPS points.
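The row layout, length filter, and 80/20 split described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' code: the names `TrajectoryRow`, `filter_by_length`, and `split_train_test` are hypothetical, the field names and types are assumed, and trajectory length is assumed here to mean the number of noisy GPS points, which the description does not state.

```python
import random
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple

# The four noise/interval combinations named in the description:
# noise in meters, sampling interval in seconds.
NOISE_LEVELS_M = (25, 50)
SAMPLING_INTERVALS_S = (20, 40)
COMBINATIONS = list(product(NOISE_LEVELS_M, SAMPLING_INTERVALS_S))


@dataclass
class TrajectoryRow:
    """One dataset row (field names and types are assumptions)."""
    road_segments: List[str]  # road segments the trajectory follows
    noisy_x: List[float]      # X (longitude) of noisy GPS points
    noisy_y: List[float]      # Y (latitude) of noisy GPS points
    true_x: List[float]       # X of true GPS points
    true_y: List[float]       # Y of true GPS points


def filter_by_length(rows: Sequence[TrajectoryRow],
                     min_len: int = 4,
                     max_len: int = 200) -> List[TrajectoryRow]:
    """Keep rows whose length is between min_len and max_len inclusive.

    Assumption: length is counted as the number of noisy GPS points.
    """
    return [r for r in rows if min_len <= len(r.noisy_x) <= max_len]


def split_train_test(rows: Sequence[TrajectoryRow],
                     train_fraction: float = 0.8,
                     seed: int = 0
                     ) -> Tuple[List[TrajectoryRow], List[TrajectoryRow]]:
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

Following the order given in the description, one would call `split_train_test` on the 10,000 trajectories of a combination first and then apply `filter_by_length` to each part; the fixed `seed` is only there to make the sketch reproducible.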

More About This Work

Published Here
August 6, 2024

Notes

These data are part of the paper titled 'Surrogate Modeling of Trajectory Map-Matching in Urban Road Networks Using Transformer Sequence-to-Sequence Model' by Sevin Mohammadi and Andrew W. Smyth. The data are synthesized and used for the comparison study in the paper, as detailed in its 'Comparative Evaluation with State-of-the-Art Methods' section.

Degree Program: PhD
Academic Advisor: Andrew W. Smyth
Thesis or Dissertation: Not a thesis or dissertation.
Previously Published: false