Concepts: hidden-markov-models | map-matching | viterbi

The problem: a fleet vehicle reports its GPS position once a minute. Each lat/lon point has 5 to 30 meters of measurement error, and a minute of city driving covers several blocks of road network. Snap each point to the nearest road and you get a path that teleports across parallel streets, jumps onto highway shoulders, and zigzags through one-ways the wrong direction. The path doesn’t drive. Newson and Krumm at Microsoft Research published the canonical fix: model the road network and the GPS trace as a Hidden Markov Model, then run Viterbi.

The core idea

The analogy: Imagine someone reading aloud from a book in a noisy room. You can hear most of the words but mishear some. To reconstruct what they actually said, you do not just guess each word in isolation; you use the surrounding context (grammar, the topic of the book) to make implausible word sequences less likely. A noisy “tre” between “the” and “fell” is probably “tree”, not “tribe” or “trek”, because “the tree fell” is a sentence that holds together.

Map matching has the same shape. Each GPS point is a noisy observation. The “words” are road segments; only certain sequences make sense (the road network’s topology constrains transitions); your job is to find the most likely sequence of road segments given the observed GPS points.

“We use a Hidden Markov Model (HMM) to find the most likely road route represented by a time-stamped sequence of latitude/longitude pairs. The HMM elegantly accounts for measurement noise and the layout of the road network.”

The HMM has two probability families:

  1. Emission probability — given that the vehicle is on road segment R at time t, how likely is it that the GPS reading would be at the observed location? This captures GPS noise.
  2. Transition probability — given that the vehicle was on segment R_i at time t and segment R_j at time t+1, how likely is that pair? This captures network topology and travel speed.

Run Viterbi over the (segment, time) trellis and you recover the most likely road sequence.

What’s clever — find the instinct

The non-obvious move is the form of the transition probability. The temptation is to use the road network distance directly: “transition from segment A to segment B is likely if A and B are connected.” But that ignores the GPS evidence about how far the vehicle actually moved between samples.

Newson & Krumm define the transition probability as the closeness of two distances:

  • The straight-line distance between consecutive GPS points (call it d_GPS).
  • The road-network distance between the candidate segments at those times (call it d_road).

A reasonable transition has d_road close to d_GPS. The vehicle drove roughly as far on the network as the GPS says it moved through space. If d_road is much larger than d_GPS, the candidate path is taking a detour the GPS does not corroborate. If d_road is much smaller, the candidate is implausibly direct (typically because the candidate jumps across a parallel road that is not actually connected).

The probability is modeled as exponential in |d_GPS - d_road|:

p_trans = (1/beta) * exp(-|d_GPS - d_road| / beta)

where beta is a learned scale parameter (around 5 meters in their evaluation).

“We use the absolute difference between the route distance and the great circle distance to compute the transition probability.”

The emission probability is simpler — Gaussian in great-circle distance from GPS reading to nearest point on the candidate segment:

p_emit = (1 / (sigma * sqrt(2*pi))) * exp(-0.5 * (d / sigma)^2)

where d is the great-circle distance from the observed GPS point to the candidate segment, and sigma is the GPS noise standard deviation (around 4.07m on the data they tested).

The third clever move: rather than search over the full road graph at every step, only consider candidate road segments within a fixed radius of each GPS observation. This keeps the trellis sparse.

Walkthrough: matching three GPS points

Setup: vehicle traveling east along Main St for 30 seconds.
       GPS samples every 10 seconds.

Observation z_0 at (47.65000, -122.13500)
   Candidate roads within 50 m:
     R_a: Main St segment (great circle distance to z_0: 3 m)
     R_b: 1st Ave segment, parallel to Main (distance: 22 m)

Observation z_1 at (47.65010, -122.13400)
   Candidates:
     R_a: Main St (continues east) — distance 4 m
     R_b: 1st Ave — distance 24 m

Observation z_2 at (47.65020, -122.13300)
   Candidates:
     R_a: Main St (continues east) — distance 2 m
     R_b: 1st Ave — distance 21 m

Step 1: Compute emission probs (sigma = 4.07 m).
   For z_0: p_emit(R_a) = N(3; 0, 4.07) ≈ 0.077
            p_emit(R_b) = N(22; 0, 4.07) ≈ 1e-7

Step 2: Compute transition probs (beta = 5 m).
   z_0 to z_1 great-circle distance: ~75 m east
   Road distance R_a@z_0 -> R_a@z_1: 75 m  (on Main, eastbound)
   Road distance R_a@z_0 -> R_b@z_1: 175 m (drive to 1st Ave intersection then back)
   Road distance R_b@z_0 -> R_b@z_1: 75 m  (on 1st Ave, eastbound)
   Road distance R_b@z_0 -> R_a@z_1: 175 m (cross over)

   p_trans(R_a -> R_a) = exp(-|75-75|/5)/5 = 0.20 (highest)
   p_trans(R_a -> R_b) = exp(-|75-175|/5)/5 ≈ 0
   p_trans(R_b -> R_b) = exp(-|75-75|/5)/5 = 0.20

Step 3: Viterbi propagates the best path
   The best path to R_a@z_1 = best path to R_a@z_0 * trans(R_a->R_a) * emit(R_a@z_1)
                            ≫ best path going through R_b at any step

   Result: most likely sequence is R_a -> R_a -> R_a (Main St all the way).

The path is consistent: vehicle stayed on Main St. If z_1 had been a noisy outlier near 1st Ave, the emission term for R_b would have spiked, but the transition cost of jumping over from Main to 1st and back would tank the joint probability. The HMM naturally rejects single-point outliers.

Does it work? What breaks?

The paper evaluates on 6,545 GPS points from a 2-hour drive in Seattle, hand-labeled with the ground-truth road sequence.

Sampling rateAccuracy (correctly matched fraction of route)
1 Hz (their default)~99%
1 / 30 s~95%
1 / 60 s~85%
1 / 120 s~70%

The graceful degradation is the headline. As samples get sparser, ambiguity grows; the HMM can no longer rule out parallel roads because the per-sample emission cost gets amortized over more meters of road.

The paper also tests added noise: at sigma = 50 m (cell-tower-grade error), accuracy is still 80% at 1 Hz. The HMM extracts surprisingly much signal from very noisy data because the network topology constraint is so strong.

“Our test shows how the algorithm breaks down as the sampling rate of the GPS is reduced… we test the effect of increasing amounts of additional measurement noise in order to assess how well our algorithm could deal with the inaccuracies of other location measurement systems.”

What breaks:

  • Sparse sampling combined with parallel roads: at 2-minute intervals on a freeway with frontage roads, the algorithm flips between the candidate paths.
  • Multimodal candidates without good priors: at intersections where multiple turn options are equally consistent with the GPS evidence, Viterbi picks one arbitrarily.
  • The emission model assumes Gaussian GPS error. Real GPS error is heavier-tailed (multipath in urban canyons can produce 100m+ outliers). Robust variants (Cauchy emission, M-estimator priors) handle this better but the original paper does not.
  • The Markov assumption: only the previous segment matters. In reality, drivers tend to continue on the road they are on. Higher-order extensions (semi-Markov, learned transition models) improve realism.

So what?

For an engineer working on snap-to-road, fleet trajectory analysis, or any GPS-meets-graph problem:

  1. HMM + Viterbi is the right default. This is the algorithm Mapbox, Valhalla, and OSRM all implement under the hood. Any rolled-from-scratch nearest-segment approach you build will be worse than this for the same compute budget.
  2. The two parameters (sigma, beta) are tunable to your sensor and your network. Newson & Krumm tuned to consumer GPS in Seattle; if your sensor is a phone (sigma 8-15m), or a $5 IoT GPS (sigma 25m+), retune from telemetry.
  3. For speed, prune the candidate set per observation. Keep only road segments within K meters of each GPS point and drop transitions whose distance ratio is more than ~3x off; this cuts the trellis to manageable size without hurting accuracy.
  4. For sparse data, augment with movement priors. When the sampling interval is large, add a heading-consistency term (vehicle heading should align with road bearing) and a speed-consistency term (vehicle speed implied by GPS-derived velocity should be feasible for the candidate road class).
  5. Treat the algorithm as a building block, not the final answer. Map matching’s output is the most-likely path; downstream stages (turn detection, dwell-time estimation, distance traveled) all depend on it. Errors compound; design for graceful degradation.

For Saikat’s snap-to-road work specifically: this is the foundation paper. Modern variants (Online HMM, FMM with sliding window, learned emission/transition functions) all build on this two-parameter HMM. The geospatial test data the authors released (Seattle GPS trace + road network with ground truth path) is still the standard regression test for any new map-matching method.

Connections

Citation

Newson, P., & Krumm, J. (2009). Hidden Markov Map Matching Through Noise and Sparseness. 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2009), Seattle, WA. https://www.microsoft.com/en-us/research/publication/hidden-markov-map-matching-noise-sparseness/