Intelligently matches a series of recorded GPS coordinates to a known network of trails.
GPS measurements suffer from location uncertainty. The US Government performance standard for GPS indicated an average horizontal accuracy of 4 meters as of 2008. It is an amazing technological feat that a person holding a GPS-enabled phone or watch can be located so precisely by a system of satellites orbiting 20 million meters overhead.
Unfortunately, certain applications demand a higher level of accuracy. Trail runners are keenly interested in the slope of the trails where their workouts take place. Pace and slope are by far the two most important factors that determine workout intensity. A 10-minute mile may not sound particularly intense, until considering that the runner performed this feat on an average 20 percent grade. At each point along a trail runner's GPS trace, accurate elevation data is key to understanding the slope of the trail, and by proxy, the runner's intensity.
GPS fitness trackers do not directly collect elevation data. Instead, GPS coordinates are typically collected every second and input into a digital elevation model (DEM), which represents a simplified view of the earth surface elevation above sea level. Even with a perfect DEM, inaccurate GPS locations can generate a false elevation profile for a workout.
This is less of a problem where the Earth's topography varies slowly and predictably. Think of gently rolling bluffs in Kansas; a few meter's horizontal difference would barely affect the elevation coordinate. But now, imagine a trail runner traversing a steep hillside - running across it, not up or down. In reality, this runner is likely traveling on a trail with a relatively predictable and shallow slope. But the GPS device doesn't know about the trail, and it may indicate that the runner is rapidly traveling uphill or downhill as inaccurate location measurements land on either side of the actual trail. Any attempt to infer the intensity of the runner's workout must correct for this obvious misrepresentation of the runner's work.
If we are only interested in long averages of intensity, on the order of minutes, the average slope would not suffer too much from GPS inaccuracy; the distance used to calculate average slope would be much larger than the error. But what about 5-second intensity? This is where high-quality location data is vital. Until exercise scientists develop the instrumentation to directly measure runners' instantaneous power output, we must use a proxy - speed and slope.
So how do we improve the elevation data we are using? By using known trail locations. Let's assume we have a perfect DEM, one which completely realistically represents the Earth's surface. And we have GPS data collected on a trail run at one-second intervals. This project implements an algorithm to constrain an inaccurate GPS trace to a known trail network. The methodology is described below.
This project implements a map matching algorithm using a Hidden Markov Model (HMM) described in a 2009 paper by Paul Newson and John Krumm. The linked paper describes the algorithm in detail.
In a nutshell, the algorithm takes a GPS trace and finds the most likely corresponding path through a trail network. A path is considered "more likely" if each GPS point is close to the corresponding matched point, and if the point-to-point distance along the matched route is similar to the distance along the GPS route. In other words, the algorithm uses the GPS data as a starting point and attempts to produce a similar route on the trail grid. Since GPS errors are not too large, this often means that a given GPS point is matched to the closest point on the trail grid. But often, trails are winding and may even double back on themselves, meaning a simple closest-point map matching algorithm could jump forward or backward on the route by an unrealistic amount from one second to the next. The algorithm resolves this problem by getting a little smart and considering how far the user moved between GPS points.
import networkx as nx
from geopy.distance import distance.great_circle as distance
from mapmatching import mapmatch
# Build the trail network graph using networkx
latlons_trailnet = [[-105.000, 40.000], [-105.000, 40.100], [-105.200, 40.100]]
graph_trailnet = nx.Graph()
for i in range(0, len(latlons_trailnet)):
graph.add_node(
i,
pos=(latlons_trailnet[i][1], latlons_trailnet[i][0])
)
for i in range(0, len(latlons_trailnet)-1):
graph.add_edge(
i,
i+1,
key=i,
length=distance(
latlons_trailnet[i][::-1],
latlons_trailnet[i+1][::-1]).meters
)
# Build gps lists/arrays (typically these would be loaded from a file)
latlons_gps = [[-105.000, 40.00000], [-105.000, 40.00003],
[-105.000, 40.00006], [-105.000, 40.00009],
[-105.000, 40.00012], [-105.000, 40.00015],
[-105.000, 40.00018], [-105.000, 40.00021],
[-105.000, 40.00024], [-105.000, 40.00024]]
times_gps = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Feed the input into the map matching algorithm
points_matched, emission_points_matched = mapmatch(
latlons_gps,
times_gps,
graph_trailnet)
- Create Python package. Make pretty input-output later.
-
Implement a series of tests to ensure functionality as development progresses.
-
Streamline input so user can be more hands-off.
- Benchmark algorithm performance (speed and accuracy). Generate fake user GPS coordinates by sampling along a route and simulating GPS drift. Document algorithm behavior under various conditions by varying:
- simulated GPS sampling rate
- magnitude of simulated GPS error (standard deviation)
- trail network segment lengths relative to GPS segment lengths
- algorithm parameters: SIGMA_Z, BETA, DIST_ERR_TOL, etc
-
Enhance the speed of the Viterbi algorithm by minimizing the computational load as much as possible without degrading performance. This requires the results of the performance tests described above.
-
Compare algorithm performance to publicly available map matching services.
- Mapbox
- Others?
-
Implement a version of Dijkstra's algorithm, described in the documentation for another map matching project by Valhalla, as an alternate to the Viterbi algorithm which is currently in use.
-
Compare the speed and accuracy of the two algorithm options.
-
Integrate OpenStreetMap to provide trail network data.
-
Integrate with my website's route database.
-
Create a page on my website that allows users to match their own gps coordinates.
-
A Developer Diary for helping understand the nuts and bolts of the Viterbi algorithm in Python.
-
Mapzen, original creators of Valhalla, for ideas for benchmarking and future work.
-
Graphhopper for description of algorithm implementation.
Reach out to me at one of the following places!
- Website: trailzealot.com
- LinkedIn: linkedin.com/in/aarondschroeder
- Twitter: @trailzealot
- Instagram: @trailzealot
- GitHub: github.com/aaron-schroeder
- MIT license
- Copyright 2019 © Aaron Schroeder.