The Problem
Vehicles, pedestrians, and IoT devices report their position via GPS or related signals. Each reading carries 5-30 meters of measurement error in open sky, much more in urban canyons. Snap each reading to the nearest road and you get a path that teleports across parallel streets, jumps onto highway shoulders, drives the wrong way down one-way streets, and oscillates between parallel roads. The path is not driveable.
Map matching is the problem: given a noisy time-series of positions and a road network graph, recover the most likely sequence of road segments the vehicle actually traveled.
The Key Insight
The road network’s topology is itself a strong constraint. A vehicle cannot teleport between disconnected segments; it cannot go faster than the road’s speed limit allows; certain transitions (turning the wrong way down a one-way) are forbidden. By modeling these constraints jointly with GPS noise, the combined evidence often resolves the path uniquely even when individual GPS points are too noisy to identify a single road.
The canonical formulation is a Hidden Markov Model: hidden states are road segments; observations are GPS points; emission probability is “how likely is this GPS reading given the vehicle is on this segment”; transition probability is “how likely is the vehicle to move from segment A to segment B between samples.” Run Viterbi to find the most likely segment sequence.
Mechanism in Plain English
- For each GPS observation, find candidate road segments within a search radius (typically 50-200 meters).
- Compute emission probability for each (observation, segment) pair: a Gaussian in the perpendicular distance from the observation to the segment.
- Compute transition probability for each (segment_at_t, segment_at_t+1) pair: an exponential in the absolute difference between the GPS-measured travel distance and the road-network shortest-path distance between the candidates.
- Run Viterbi over the trellis: for each time t and candidate segment, track the best path ending at that segment.
- Backtrack from the highest-scoring final state to get the most likely segment sequence.
What’s Clever
The key trick is the transition probability shape: |d_GPS - d_road|. Naive approaches use just the road-network distance, but that ignores how far the vehicle actually moved per the GPS. The HMM formulation explicitly checks that the network distance is consistent with the spatial movement, which automatically rejects implausible jumps and detours.
The second clever move is restricting the candidate set per observation. Without this pruning, the trellis size is O(T * |E|) where |E| is the road network edge count (typically millions). With pruning to nearby segments, the trellis is O(T * K) for small K (typically 5-20).
Code
A 15-line PyTorch-free Viterbi for map matching:
def viterbi_map_match(observations, road_network, sigma=4.07, beta=5.0):
# observations: list of (lat, lon)
T = len(observations)
candidates = [road_network.nearby_segments(obs, radius=50) for obs in observations]
# delta[t][seg] = log-prob of best path ending at seg at time t
delta = [{} for _ in range(T)]
backpointer = [{} for _ in range(T)]
for seg in candidates[0]:
d = great_circle(observations[0], seg.nearest_point(observations[0]))
delta[0][seg] = log_normal_pdf(d, sigma)
for t in range(1, T):
for seg_t in candidates[t]:
best_prev, best_score = None, float('-inf')
for seg_prev in candidates[t-1]:
d_gps = great_circle(observations[t-1], observations[t])
d_road = road_network.shortest_path(seg_prev, seg_t)
trans = -abs(d_gps - d_road) / beta - log(beta)
emit = log_normal_pdf(point_to_seg(observations[t], seg_t), sigma)
score = delta[t-1][seg_prev] + trans + emit
if score > best_score:
best_score, best_prev = score, seg_prev
delta[t][seg_t] = best_score
backpointer[t][seg_t] = best_prev
# Backtrack
best_final = max(delta[T-1], key=delta[T-1].get)
path = [best_final]
for t in range(T-1, 0, -1):
path.append(backpointer[t][path[-1]])
return path[::-1]Key Sources
- hidden-markov-map-matching-noise-sparseness — Newson & Krumm (2009), the canonical formulation
Related Concepts
- hidden-markov-models — the underlying probabilistic framework
- viterbi — the decoding algorithm
Open Questions
- How well do learned emission and transition models (e.g., neural networks trained on dense fleet data) outperform the original Gaussian / exponential parametric forms?
- For sparse sampling (>2 minute intervals), what auxiliary signals (vehicle heading, road class priors, dwell-time priors) close the gap?
- Real-time / online map matching: tradeoff between latency (act on the most recent point) and accuracy (incorporate future points to disambiguate).