Exploring jittering and routing options for converting origin-destination data into route networks: towards accurate estimates of movement at the street level
Introduction
Origin-destination (OD) datasets provide information on aggregate travel patterns between zones and geographic entities. OD datasets are ‘implicitly geographic’, containing identification codes of the geographic objects from which trips start and end. A common approach to converting OD datasets to geographic entities, for example represented using the simple features standard (Open Geospatial Consortium Inc 2011) and saved in file formats such as GeoPackage and GeoJSON, is to represent each OD record as a straight line between zone centroids. This approach to representing OD datasets on the map has been since at least the 1950s (Boyce and Williams 2015) and is still in use today (e.g. Rae 2009).
Beyond simply visualising aggregate travel patterns, centroid-based geographic desire lines are also used as the basis of many transport modelling processes. The following steps can be used to convert OD datasets into route networks, in a process that can generate nationally scalable results (Morgan and Lovelace 2020):
OD data converted into centroid-based geographic desire lines
Calculation of routes for each desire line, with start and end points at zone centroids
Aggregation of routes into route networks, with values on each segment representing the total amount of travel (‘flow’) on that part of the network, using functions such as overline() in the open source R package stplanr (Lovelace and Ellison 2018)
This approach is tried and tested. The OD -> desire line -> route -> route network processing pipeline forms the basis of the route network results in the Propensity to Cycle Tool, an open source and publicly available map-based web application for informing strategic cycle network investment, ‘visioning’ and prioritisation (Lovelace et al. 2017; Goodman et al. 2019). However, the approach has some key limitations:
Flows are concentrated on transport network segments leading to zone centroids, creating distortions in the results and preventing the simulation of the diffuse networks that are particularly important for walking and cycling
The results are highly dependent on the size and shape of geographic zones used to define OD data
The approach is inflexible, providing few options to people who want to use valuable OD datasets in different ways
To overcome these limitations we developed a ‘jittering’ approach to conversion of OD datasets to desire lines that randomly samples points within each zone (Lovelace, Félix, and Carlino Under Review). While that paper discussed the conceptual development of the approach, it omitted key details on its implementation in open source software.
In this paper we outline the implementation of jittering and demonstrate how a single Rust crate can provide the basis of implementations in other languages. Furthermore, we demonstrate how jittering can be used to create more diffuse and accurate estimates of movement at the level of segments (‘flows’) on transport network, in reproducible code-driven workflows and with minimal computational overheads compared with the computationally intensive process of route calculation (‘routing’) or processing large GPS datasets. The overall aim is to describe the jittering approach in technical terms and its implementation in open source software.
Before describing the approach, some definitions are in order:
Origins: locations of trip departure, typically stored as ID codes linking to zones
Destinations: trip destinations, also stored as ID codes linking to zones
Attributes: the number of trips made between each ‘OD pair’ and additional attributes such as route distance between each OD pair
Jittering: The combined process of ‘splitting’ OD pairs representing many trips into multiple ‘sub OD’ pairs (disaggregation) and assigning origins and destinations to multiple unique points within each zone
Approach
Jittering represents a comparatively simple — compared with ‘connector’ based methods (Jafari et al. 2015) — approach is to OD data preprocessing. For each OD pair, the jittering approach consists of the following steps for each OD pair (provided it has required inputs of a disaggregation threshold, a single number greater than one, and sub-points from which origin and destination points are located):
Checks if the number of trips (for a given ‘disaggregation key’, e.g. ‘walking’) is greater than the disaggregation threshold.
If so, the OD pair is disaggregated. This means being divided into as many pieces (‘sub-OD pairs’) as is needed, with trip counts divided by the number of sub-OD pairs, for the total to be below the disaggregation threshold.
For each sub-OD pair (or each original OD pair if no disaggregation took place) origin and destination locations are randomly sampled from sub-points which optionally have weights representing relative probability of trips starting and ending there.
This approach has been implemented efficiently in the Rust crate odjitter, the source code of which can be found at https://github.com/dabreegster/odjitter.
Results
We have found that jittering leads to more spatially diffuse representations of OD datasets than the common approach to desire lines that go from and to zone centroids. We have used the approach to add value to numerous OD datasets for projects based in Ireland, Norway, Portugal, New Zealand and beyond. Although useful for visualising the complex and spatially diffuse reality of travel patterns, we found that the most valuable use of jittering is as a pre-processing stage before routing and route network generation. Route networks generated from jittered desire lines are more diffuse, and potentially more realistic, that centroid-based desire lines.
We also found that the approach, implemented in Rust and with bindings to R and Python (in progress), is fast. Benchmarks show that the approach can ‘jitter’ desire lines representing millions of trips in a major city in less than a minute on consumer hardware.
We also found that the results of jittering depend on the geographic input datasets representing start points and trip attractors, and the use of weights. This highlights the importance of exploring the parameter space for optimal jittered desire line creation.
Next steps
We plan to create/improve R/Python interfaces to the odjitter and enable others to benefit from it.
We plan to improve the package’s documentation and to test its results, supporting reproducible sustainable transport research worldwide.