Segmentation Concept Graphic


This website serves as documentation of Bryan Blanc and Michael Carraher’s Spring & Summer 2022 Innovation Incubator project. It describes and demonstrates the methodology developed to automate the segmentation of transit networks using General Transit Feed Specification (GTFS) and OpenStreetMap (OSM) data, which are both open source and widely available for cities across the globe.

Purpose & Motivation

What is segmentation?

For many projects at Nelson\Nygaard, from transit operations analysis to street design to transportation safety analysis, the development of a series of custom street segment geometries is an essential first step for geographic data analysis. Street segment geometries allow related data to be aggregated to a level useful to clients and policy makers. For example, the routes in a bus system may share some street segments and diverge on others, and we will want to understand precisely where and how far they overlap. This requires a systematic segmentation process.
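As a rough illustration of the shared-segment idea (not the project's actual code), the sketch below assumes each route has already been reduced to an ordered list of street-edge IDs. The route names, edge IDs, and function names are all hypothetical. It groups consecutive edges along a route into segments wherever the set of routes using those edges stays the same.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical input: each route is an ordered list of street-edge IDs.
routes = {
    "Route 1": ["a", "b", "c", "d"],
    "Route 2": ["x", "b", "c", "y"],
}

def routes_by_edge(routes):
    """Map each street edge to the set of routes that travel along it."""
    edge_routes = defaultdict(set)
    for route_id, edges in routes.items():
        for edge in edges:
            edge_routes[edge].add(route_id)
    return edge_routes

def segment_route(route_id, routes):
    """Split one route into runs of consecutive edges served by the same set of routes."""
    edge_routes = routes_by_edge(routes)
    segments = []
    # groupby starts a new run whenever the set of routes on the edge changes.
    for served_by, run in groupby(routes[route_id], key=lambda e: frozenset(edge_routes[e])):
        segments.append((list(run), sorted(served_by)))
    return segments

for edges, served_by in segment_route("Route 1", routes):
    print(edges, served_by)
```

In this toy input, Route 1 breaks into three segments: an edge it uses alone, a two-edge stretch shared with Route 2, and a final edge it uses alone. Real networks also need geometry matching before edges can be compared this way.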

The Segmentation Process

Prior to this Innovation Incubator work, the segmentation process could be partially automated using various methods, but it still required considerable manual labor, i.e., drawing and/or refining the segments manually in GIS software. We had developed several methods that mix automation and manual editing for segmenting a bus transit system through our work on the Bus Delay Analysis Tool (BDAT) and related projects, as referenced in a Spring 2021 Innovation Incubator project that Bryan Blanc worked on with Esther Needham. Nevertheless, segmentation was typically a time-intensive part of any project scope analyzing transit operations (or other types of data) at a street segment level. It could range from a few dozen hours for a relatively simple street network to hundreds of hours on our BDAT projects.

The primary result of this project, adapted from code for an implementation of BDAT in Austin, TX, is a set of tested and documented scripts that automate the segmentation process using advanced geographic analysis methods (e.g., map matching) and a consistent street network available in all geographies, namely the OpenStreetMap street network. This Innovation Incubator grant allowed that first set of scripts to be tested on more networks and refined accordingly, and the codebase itself to be simplified and documented. The codebase can now be applied relatively easily in any geography where both GTFS and OSM data are available and reasonably accurate, which is most urban areas in the world and many other areas as well.


The deliverables from this Innovation Incubator project are this website and the links on the resources tab, including the codebase itself, as well as a demonstration of the segmentation process on several different urban transit networks.

Future Work

A few ideas for future expansions of this work are listed below:

  • Generalizing the methodology for linear geometries from sources other than GTFS. Segmenting GTFS transit networks is the most typical use of this process so far, but there is no reason that other travel path data (e.g., bicycle routes) could not be segmented similarly.

  • Additional options for automated and manual identification of breakpoints. While there was not sufficient time to develop these options within this Innovation Incubator project, there is room both to standardize the incorporation of manually identified breakpoints and to add options for automated identification of breakpoints, e.g., at intersections with arterial or higher-class roadways. For non-transit applications, segments may be broken at every intersection. In this case, we would just want to identify a breakpoint for each intersection, and the rest of the algorithm would work accordingly.

  • Resolving remaining imperfections in the algorithm. Depending on the quality of the source GTFS data and of the underlying OSM network it is matched to, some imperfections can still be observed in the segment geometries, and for the time being these have to be resolved manually. To further improve this algorithm, we would need to be able to programmatically detect and correct these imperfections. For now, the easiest way to QA/QC the results is to use the demo Shiny app included in the repository, editing the code to use your selected GTFS feed.
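The breakpoint idea in the second item above can be sketched in a few lines. This is a minimal illustration only, assuming a path has already been reduced to an ordered list of node IDs and that breakpoints are given as a set of node IDs. The function name and data are hypothetical and are not taken from the project codebase.

```python
def split_at_breakpoints(path, breakpoints):
    """Split an ordered list of node IDs into segments at breakpoint nodes.

    Adjacent segments share the breakpoint node, so the segments chain
    end to end along the original path.
    """
    if not path:
        return []
    segments = []
    current = [path[0]]
    for node in path[1:]:
        current.append(node)
        if node in breakpoints:
            # Close the current segment and start the next one at this node.
            segments.append(current)
            current = [node]
    # Keep the trailing piece only if it spans at least two nodes.
    if len(current) > 1:
        segments.append(current)
    return segments

# Hypothetical path through five intersections.
path = ["n1", "n2", "n3", "n4", "n5"]

# Break only at a selected intersection (e.g., a crossing with an arterial).
print(split_at_breakpoints(path, {"n3"}))

# Break at every interior intersection, as in the non-transit case.
print(split_at_breakpoints(path, {"n2", "n3", "n4"}))
```

Under this framing, manual and automated breakpoint identification differ only in how the breakpoint set is produced. The splitting step itself stays the same.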