How to Calculate 2.7 Million Transit Routes for Free

A complete tutorial for calculating multimodal transit travel times at scale using r5py, GTFS, and OpenStreetMap—no expensive APIs required.

If we want to measure how long it actually takes transit-dependent residents to reach grocery stores, we need to calculate travel times at scale. For 408 census tracts and 6,613 grocery stores, that means 2,697,704 origin-destination pairs. Using Google Maps API, this would cost approximately $13,500 at current rates ($5 per 1,000 routes).[1] We can calculate all of them for $0.

Why does this matter? Many food access questions require transit time calculations, but commercial API pricing puts large-scale analysis out of reach for unfunded projects. Studying transit-based food access at scale requires free alternatives. They exist, but they require assembling multiple data sources and learning new tools.

This post walks through what we learned building this workflow: how to acquire GTFS transit data, build a pedestrian network from OpenStreetMap, and route with r5py. All code is publicly available.[2]


The Tool Stack

Four components, three of them free and open source, make this possible:

Component           Tool                 Purpose
Transit schedules   GTFS feeds           Bus and rail routes, stops, schedules
Street network      OpenStreetMap        Walking paths between transit and destinations
Routing engine      r5py (Conveyal R5)   Calculate multimodal travel times
Destination data    Google Places API    Grocery store locations (some API cost here)

GTFS (General Transit Feed Specification) is the standard format for transit data. Most US transit agencies publish free GTFS feeds that include routes, stops, and schedules.[3]

OpenStreetMap provides free street network data worldwide. For transit routing, we need pedestrian paths: sidewalks, crosswalks, and building entrances.

r5py is a Python wrapper for Conveyal's R5 routing engine, designed for rapid accessibility analysis.[4] It calculates travel times accounting for walking, waiting, riding, and transferring.


Step 1: Acquire GTFS Data

Every transit agency maintains its own GTFS feed. For California, the Cal-ITP project aggregates feeds from 200+ agencies into a single download.[5]

import requests
from pathlib import Path

# Cal-ITP statewide stops (aggregated from all agencies).
# Note: this URL is the dataset landing page; use the direct CSV
# resource link from that page for the actual download.
CALITP_URL = "https://data.ca.gov/dataset/cal-itp-gtfs-ingest-pipeline-dataset"

# Download the aggregated stops file
Path("data").mkdir(exist_ok=True)
response = requests.get(CALITP_URL + "/stops.csv")
response.raise_for_status()
Path("data/calitp_stops.csv").write_bytes(response.content)

For single-agency analysis, download directly from the agency:

# Example: VTA (Santa Clara County)
VTA_GTFS_URL = "https://www.vta.org/sites/default/files/google_transit.zip"
response = requests.get(VTA_GTFS_URL)
response.raise_for_status()
Path("data/gtfs").mkdir(parents=True, exist_ok=True)
Path("data/gtfs/vta_gtfs.zip").write_bytes(response.content)

Key files in a GTFS feed:

  • stops.txt: Stop locations (latitude, longitude)
  • routes.txt: Route names and types
  • trips.txt: Individual scheduled trips
  • stop_times.txt: Arrival/departure times at each stop
  • calendar.txt: Service patterns (weekday, weekend)
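These files are relational tables joined on shared keys: stop_times.txt references trips.txt via trip_id and stops.txt via stop_id. A quick way to see the structure (using tiny illustrative stand-in tables, not a real feed) is to count scheduled departures per stop by joining stop_times to stops:

```python
import pandas as pd

# Minimal stand-ins for stops.txt and stop_times.txt (illustrative data)
stops = pd.DataFrame({
    "stop_id": ["A", "B"],
    "stop_name": ["Main St", "2nd Ave"],
    "stop_lat": [37.33, 37.34],
    "stop_lon": [-121.89, -121.90],
})
stop_times = pd.DataFrame({
    "trip_id": ["t1", "t1", "t2"],
    "stop_id": ["A", "B", "A"],
    "departure_time": ["08:00:00", "08:05:00", "08:30:00"],
})

# Join on the shared stop_id key, then count departures per stop
departures = (
    stop_times.merge(stops, on="stop_id")
    .groupby("stop_name")["trip_id"]
    .count()
)
print(departures.to_dict())  # {'2nd Ave': 1, 'Main St': 2}
```

The same joins, at full scale, are what R5 performs internally when it builds its timetable from a feed.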

Step 2: Download OpenStreetMap Data

r5py needs a pedestrian network to route walking segments. OpenStreetMap provides this.

# Download California OSM extract from Geofabrik
OSM_URL = "https://download.geofabrik.de/north-america/us/california-latest.osm.pbf"

response = requests.get(OSM_URL, stream=True)
with open("data/osm/california.osm.pbf", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

For smaller regions, extract a subset:

# Using osmium-tool to extract a bounding box
osmium extract -b -122.5,37.0,-121.5,37.8 california.osm.pbf -o bay_area.osm.pbf

The Bay Area extract is about 200 MB; statewide California is about 1 GB. r5py can handle both, but smaller extracts reduce network build time significantly: the Bay Area extract builds in about 5 minutes, while statewide California takes 20-30 minutes. For regional analysis, extracting only the area you need avoids this overhead.


Step 3: Set Up r5py

r5py requires Java and the R5 JAR file. Installation:

# Install r5py via pip
pip install r5py

# r5py automatically downloads the R5 JAR on first use

Build the transport network:

import r5py

# Build network from GTFS and OSM
transport_network = r5py.TransportNetwork(
    osm_pbf="data/osm/bay_area.osm.pbf",
    gtfs=["data/gtfs/vta_gtfs.zip"]  # Can include multiple agencies
)

Network building takes 5-30 minutes depending on area size and number of GTFS feeds. The result is cached for reuse.


Step 4: Prepare Origins and Destinations

Origins are census tract centroids (where people live). Destinations are grocery stores (where people need to go).

import pandas as pd
import geopandas as gpd

# Load census tract centroids
tracts = gpd.read_file("data/census_tracts.geojson")
origins = tracts[["GEOID", "geometry"]].copy()
origins["id"] = origins["GEOID"]

# Load grocery store locations
stores = pd.read_csv("data/grocery_stores.csv")
destinations = gpd.GeoDataFrame(
    stores,
    geometry=gpd.points_from_xy(stores.longitude, stores.latitude),
    crs="EPSG:4326"
)
destinations["id"] = destinations["store_id"]

Critical: Both origins and destinations must be GeoDataFrames with an id column and a geometry column.
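r5py raises its own errors when these columns are missing, but the messages can be cryptic mid-run. A small pre-flight check gives clearer failures up front. This is a hypothetical helper (check_od_frame is not part of r5py); it works on any DataFrame-like table, including a GeoDataFrame:

```python
import pandas as pd

def check_od_frame(df, name):
    """Hypothetical pre-flight check: required columns and usable ids."""
    for col in ("id", "geometry"):
        if col not in df.columns:
            raise ValueError(f"{name} is missing required column '{col}'")
    if df["id"].isna().any():
        raise ValueError(f"{name} has missing ids")
    if df["id"].duplicated().any():
        raise ValueError(f"{name} has duplicate ids")

# Illustrative table standing in for a GeoDataFrame of tract centroids
origins = pd.DataFrame({"id": ["t1", "t2"], "geometry": [None, None]})
check_od_frame(origins, "origins")  # passes silently
```

Running this on both origins and destinations before building the computer catches the most common setup mistakes (a forgotten id column, duplicate GEOIDs after a merge) in seconds instead of minutes into routing.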


Step 5: Calculate Travel Times

r5py's TravelTimeMatrixComputer calculates travel times between all origin-destination pairs:

from datetime import datetime, timedelta

# Set departure time window
departure = datetime(2024, 11, 15, 9, 0)  # Friday 9:00 AM
departure_window = timedelta(hours=2)  # Allow departures until 11:00 AM

# Configure the travel time calculation
travel_time_computer = r5py.TravelTimeMatrixComputer(
    transport_network,
    origins=origins,
    destinations=destinations,
    departure=departure,
    departure_time_window=departure_window,
    transport_modes=[r5py.TransportMode.WALK, r5py.TransportMode.TRANSIT],
    max_time=timedelta(minutes=60),  # Maximum travel time to consider
    percentiles=[50]  # Median travel time across departure window
)

# Calculate!
travel_times = travel_time_computer.compute_travel_times()

For the full 2,697,704 origin-destination pairs (408 tracts by 6,613 stores), this takes about 45 minutes on a laptop with 16 GB RAM and an M1 processor. The result is a DataFrame with columns from_id, to_id, and travel_time.


Step 6: Find Nearest Store by Transit

With travel times calculated, find the minimum for each origin:

# Group by origin and find minimum travel time
nearest_by_transit = (
    travel_times
    .groupby("from_id")
    .agg(
        min_transit_time=("travel_time", "min"),
        stores_within_30_min=("travel_time", lambda x: (x <= 30).sum()),
        stores_within_45_min=("travel_time", lambda x: (x <= 45).sum())
    )
    .reset_index()
)

# Merge back to tract data
tracts_with_transit = tracts.merge(
    nearest_by_transit,
    left_on="GEOID",
    right_on="from_id"
)
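One caveat before interpreting these aggregates: pairs beyond the max_time cutoff come back with a missing travel_time, so min() returns NaN and the count lambdas return 0 for tracts with no reachable store at all. It's worth separating "no store reachable" from "store reachable but slow" explicitly. A sketch on a synthetic matrix (illustrative data only):

```python
import pandas as pd

# Synthetic travel-time matrix: NaN marks pairs beyond the 60-minute cutoff
travel_times = pd.DataFrame({
    "from_id": ["t1", "t1", "t2", "t2"],
    "to_id": ["s1", "s2", "s1", "s2"],
    "travel_time": [12.0, 48.0, float("nan"), float("nan")],
})

# Share of unreachable stores per origin; a tract at 1.0 has no store
# reachable within the cutoff and needs separate treatment in the analysis
unreachable = (
    travel_times.groupby("from_id")["travel_time"]
    .apply(lambda x: x.isna().mean())
)
print(unreachable.to_dict())  # {'t1': 0.0, 't2': 1.0}
```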

What Didn't Go as Expected

Several problems emerged during implementation:

Problem 1: Memory Limits

Calculating all pairs at once exceeded available RAM. Solution: batch processing.

# Process in batches of 50 tracts
batch_size = 50
results = []

for i in range(0, len(origins), batch_size):
    batch_origins = origins.iloc[i:i+batch_size]

    computer = r5py.TravelTimeMatrixComputer(
        transport_network,
        origins=batch_origins,
        destinations=destinations,
        # ... same parameters
    )

    batch_results = computer.compute_travel_times()
    results.append(batch_results)

# Combine all batches
travel_times = pd.concat(results, ignore_index=True)
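Batching also makes checkpointing cheap: if each batch is written to disk as it finishes, a crash partway through a multi-hour run loses one batch instead of everything. A sketch of that pattern, with run_batches as a hypothetical helper and compute standing in for the TravelTimeMatrixComputer call on one batch of origins:

```python
import pandas as pd
from pathlib import Path

def run_batches(origin_ids, batch_size, compute, out_dir):
    """Run compute() per batch, caching each result to disk so a
    crashed run can resume without redoing finished batches."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(0, len(origin_ids), batch_size):
        part = out_dir / f"batch_{i:06d}.csv"
        if part.exists():  # already completed in a previous run
            continue
        compute(origin_ids[i:i + batch_size]).to_csv(part, index=False)
    # Combine all cached batches into one matrix
    parts = sorted(out_dir.glob("batch_*.csv"))
    return pd.concat((pd.read_csv(p) for p in parts), ignore_index=True)
```

In the real pipeline, compute would build a TravelTimeMatrixComputer for the batch of origins and return compute_travel_times(); rerunning the script after a failure then skips straight to the first unfinished batch.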

Problem 2: Invalid GTFS Data

Some agency GTFS feeds contain errors: stops with coordinates of (0, 0), routes with no trips, or stop_times with impossible schedules.

# Validate GTFS before using
def validate_gtfs(gtfs_path):
    import zipfile

    with zipfile.ZipFile(gtfs_path) as zf:
        stops = pd.read_csv(zf.open("stops.txt"))

        # Check for invalid coordinates
        invalid = stops[
            (stops["stop_lat"] == 0) |
            (stops["stop_lon"] == 0) |
            (stops["stop_lat"].isna())
        ]

        if len(invalid) > 0:
            print(f"Warning: {len(invalid)} stops with invalid coordinates")
            stops = stops[~stops.index.isin(invalid.index)]

        return stops
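Returning a cleaned table isn't quite enough, because r5py reads the GTFS zip itself: the cleaned stops have to be written back into a copy of the feed. A stdlib sketch of that step (write_cleaned_gtfs is a hypothetical helper, not part of r5py):

```python
import zipfile
import pandas as pd

def write_cleaned_gtfs(src_zip, dst_zip, cleaned_stops):
    """Copy a GTFS zip, replacing stops.txt with a cleaned table."""
    with zipfile.ZipFile(src_zip) as src, \
         zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.namelist():
            if item == "stops.txt":
                dst.writestr(item, cleaned_stops.to_csv(index=False))
            else:
                dst.writestr(item, src.read(item))
```

Point TransportNetwork at the cleaned copy. Note this only handles bad stop coordinates; trips referencing a removed stop would still need attention, which is why a dedicated validator such as gtfs-kit or MobilityData's canonical validator is worth running on feeds you don't control.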

Problem 3: OSM Size Limits

Large OSM files can cause r5py to fail. Solution: extract only the region needed.

# Calculate bounding box from your data
min_lon = min(origins.geometry.x.min(), destinations.geometry.x.min()) - 0.1
max_lon = max(origins.geometry.x.max(), destinations.geometry.x.max()) + 0.1
min_lat = min(origins.geometry.y.min(), destinations.geometry.y.min()) - 0.1
max_lat = max(origins.geometry.y.max(), destinations.geometry.y.max()) + 0.1

# Extract using osmium
import subprocess
subprocess.run([
    "osmium", "extract",
    "-b", f"{min_lon},{min_lat},{max_lon},{max_lat}",
    "california.osm.pbf",
    "-o", "region.osm.pbf"
])

When to Use This Approach

Good fit:

  • Research requiring many origin-destination pairs
  • Budget constraints preclude commercial APIs
  • Need for reproducibility (GTFS + OSM are public data)
  • Batch processing is acceptable (not real-time queries)

Less suitable:

  • Real-time routing for individual trips
  • Need for traffic-aware car routing (GTFS is transit only)
  • Very small analyses where API costs are negligible
  • Regions where transit agencies don't publish GTFS feeds (some rural areas, demand-responsive services) or where OpenStreetMap pedestrian paths are incomplete (typically rural or newly developed areas)

Alternatives

Commercial APIs (Google Maps, Mapbox): Easier setup, real-time capabilities, but costly at scale. Google Maps charges $5 per 1,000 routes for transit directions.

OSRM (Open Source Routing Machine): Free and fast, but car/bike/walk only. No transit support.

OpenTripPlanner: Full-featured transit router with real-time updates and trip planning APIs. Deployment requires running a Java server and configuring multiple components (graph building, API endpoints, frontend). This overhead makes sense for transit agency websites serving individual trip queries, but for batch research calculating millions of routes, r5py's single Python script is simpler.

Conveyal Analysis: Web-based interface to R5 with built-in isochrone mapping and accessibility visualization. Handles data loading and visualization without code. The limitation: you can't easily customize routing parameters, export raw travel time matrices, or integrate results into a larger analysis pipeline.


Limitations

Schedule-based, not real-time: GTFS represents planned service, not actual arrivals. Delays, cancellations, and disruptions aren't captured.

Average conditions: The departure time window produces median travel times. Some trips will be faster or slower than calculated.
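The percentiles parameter shown in Step 5 can partly address this: requesting several percentiles (e.g. percentiles=[25, 50, 75]) yields one travel-time column per percentile, and the spread between them flags origins where the result depends heavily on departure time. A sketch on synthetic output, assuming the column naming pattern travel_time_pXX:

```python
import pandas as pd

# Synthetic matrix with 25th/75th-percentile columns, as produced when
# requesting percentiles=[25, 75] (column names assumed for illustration)
tt = pd.DataFrame({
    "from_id": ["t1", "t2"],
    "to_id": ["s1", "s1"],
    "travel_time_p25": [20.0, 15.0],
    "travel_time_p75": [28.0, 55.0],
})

# Interquartile spread: large values mean travel time hinges on when you
# leave, i.e. infrequent or irregular service rather than a slow route
tt["spread"] = tt["travel_time_p75"] - tt["travel_time_p25"]
print(tt[["from_id", "spread"]].to_dict("records"))
```

In this toy example tract t2 has a 40-minute spread around a similar median, which a single median column would hide entirely.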

Walking assumptions: r5py assumes constant walking speed. Elderly residents, those with mobility limitations, or those carrying groceries may walk slower.

Network completeness: OpenStreetMap varies in quality by region. Some pedestrian paths may be missing or incorrectly mapped.


Code Availability

Complete code for this analysis is in the GitHub repository.[2]

The repository includes:

  • scripts/50_download_calitp_gtfs.py: GTFS download and validation
  • scripts/53_extract_regional_osm.py: OSM extraction utilities
  • scripts/41_calculate_transit_all_counties.py: r5py routing implementation with batch processing
  • scripts/82_mobility_deserts_calitp.py: Post-processing to identify mobility deserts
  • data/: Sample data files and expected outputs

Notes

[1] Google Maps Platform pricing as of November 2024. Transit Directions API: $5.00 per 1,000 requests. 2,697,704 requests × $0.005 = $13,488.52.

[2] GitHub repository: github.com/dphdmae/foodsecurity_mobility. MIT License.

[3] GTFS was developed by Google and TriMet (Portland, OR) in 2005. Over 2,500 transit agencies worldwide now publish GTFS feeds. See gtfs.org for specification details.

[4] Conway, M. W., Byrd, A., & van Eggermond, M. (2018). "Accounting for uncertainty and variation in accessibility metrics for public transport sketch planning." Journal of Transport and Land Use, 11(1), 541-558. For r5py implementation, see Pereira, R. H. M., Saraiva, M., Herszenhut, D., Braga, C. K. V., & Conway, M. W. (2021). "r5r: Rapid Realistic Routing on Multimodal Transport Networks with R5 in R." Findings. https://doi.org/10.32866/001c.21262

[5] Cal-ITP (California Integrated Travel Project) aggregates GTFS data from California transit agencies. Data available at data.ca.gov.


Tags: #FoodSecurity #TransitAnalysis #GTFS #r5py #OpenStreetMap #Methods #Tutorial #OpenSource


How to Cite This Research

Too Early To Say. "How to Calculate 2.7 Million Transit Routes for Free." November 2025. https://www.tooearlytosay.com/research/food-security/transit-routing-free-tools/