If we want to measure how long it actually takes transit-dependent residents to reach grocery stores, we need to calculate travel times at scale. For 408 census tracts and 6,613 grocery stores, that means 2,698,104 origin-destination pairs. Using Google Maps API, this would cost approximately $13,500 at current rates ($5 per 1,000 routes).[1] We can calculate all of them for $0.
Why does this matter? Many food access questions require transit time calculations, but commercial API pricing puts large-scale analysis out of reach for unfunded projects. Free alternatives exist, but they require assembling multiple data sources and learning new tools.
This post walks through what we learned building this workflow: how to acquire GTFS transit data, build a pedestrian network from OpenStreetMap, and route with r5py. All code is available in a public repository.[2]
The Tool Stack
Four free, open-source components make this possible:
| Component | Tool | Purpose |
|---|---|---|
| Transit schedules | GTFS feeds | Bus and rail routes, stops, schedules |
| Street network | OpenStreetMap | Walking paths between transit and destinations |
| Routing engine | r5py (Conveyal R5) | Calculate multimodal travel times |
| Destination data | Google Places API | Grocery store locations (some API cost here) |
GTFS (General Transit Feed Specification) is the standard format for transit data. Most US transit agencies publish free GTFS feeds that include routes, stops, and schedules.[3]
OpenStreetMap provides free street network data worldwide. For transit routing, we need pedestrian paths: sidewalks, crosswalks, and building entrances.
r5py is a Python wrapper for Conveyal's R5 routing engine, designed for rapid accessibility analysis.[4] It calculates travel times accounting for walking, waiting, riding, and transferring.
Step 1: Acquire GTFS Data
Every transit agency maintains its own GTFS feed. For California, the Cal-ITP project aggregates feeds from 200+ agencies into a single download.[5]
```python
import requests
from pathlib import Path

# Cal-ITP statewide stops (aggregated from all agencies)
CALITP_URL = "https://data.ca.gov/dataset/cal-itp-gtfs-ingest-pipeline-dataset"

# Download the aggregated stops file
response = requests.get(CALITP_URL + "/stops.csv")
Path("data/calitp_stops.csv").write_bytes(response.content)
```

For single-agency analysis, download directly from the agency:
```python
# Example: VTA (Santa Clara County)
VTA_GTFS_URL = "https://www.vta.org/sites/default/files/google_transit.zip"
response = requests.get(VTA_GTFS_URL)
Path("data/gtfs/vta_gtfs.zip").write_bytes(response.content)
```

Key files in a GTFS feed:

- `stops.txt`: Stop locations (latitude, longitude)
- `routes.txt`: Route names and types
- `trips.txt`: Individual scheduled trips
- `stop_times.txt`: Arrival/departure times at each stop
- `calendar.txt`: Service patterns (weekday, weekend)
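Because a feed is just a zip archive of plain-text tables, you can inspect one with nothing beyond the standard library and pandas. A minimal sketch, using a tiny in-memory feed for illustration (real feeds carry many more columns and files):

```python
import io
import zipfile

import pandas as pd

# Build a tiny in-memory GTFS-like zip purely for illustration
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "stops.txt",
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "1001,Main & 1st,37.33,-121.89\n"
        "1002,Main & 2nd,37.34,-121.88\n",
    )

# Reading works the same way on a real feed downloaded from an agency
with zipfile.ZipFile(buf) as zf:
    stops = pd.read_csv(zf.open("stops.txt"))

print(stops[["stop_id", "stop_lat", "stop_lon"]])
```

The same pattern reads `routes.txt`, `trips.txt`, and the rest; `stop_times.txt` is typically by far the largest table.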
Step 2: Download OpenStreetMap Data
r5py needs a pedestrian network to route walking segments. OpenStreetMap provides this.
```python
# Download California OSM extract from Geofabrik
OSM_URL = "https://download.geofabrik.de/north-america/us/california-latest.osm.pbf"
response = requests.get(OSM_URL, stream=True)
with open("data/osm/california.osm.pbf", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
```

For smaller regions, extract a subset:

```bash
# Using osmium-tool to extract a bounding box
osmium extract -b -122.5,37.0,-121.5,37.8 california.osm.pbf -o bay_area.osm.pbf
```

The Bay Area extract is about 200 MB; statewide California is about 1 GB. r5py can handle both, but smaller extracts reduce network build time significantly: the Bay Area extract builds in about 5 minutes, while statewide California takes 20-30 minutes. For regional analysis, extracting only the area you need avoids this overhead.
Step 3: Set Up r5py
r5py requires Java and the R5 JAR file. Installation:
```bash
# Install r5py via pip
pip install r5py

# r5py automatically downloads the R5 JAR on first use
```

Build the transport network:
```python
import r5py

# Build network from GTFS and OSM
transport_network = r5py.TransportNetwork(
    osm_pbf="data/osm/bay_area.osm.pbf",
    gtfs=["data/gtfs/vta_gtfs.zip"],  # Can include multiple agencies
)
```

Network building takes 5-30 minutes depending on area size and number of GTFS feeds. The result is cached for reuse.
Step 4: Prepare Origins and Destinations
Origins are census tract centroids (where people live). Destinations are grocery stores (where people need to go).
```python
import pandas as pd
import geopandas as gpd

# Load census tract centroids
tracts = gpd.read_file("data/census_tracts.geojson")
origins = tracts[["GEOID", "geometry"]].copy()
origins["id"] = origins["GEOID"]

# Load grocery store locations
stores = pd.read_csv("data/grocery_stores.csv")
destinations = gpd.GeoDataFrame(
    stores,
    geometry=gpd.points_from_xy(stores.longitude, stores.latitude),
    crs="EPSG:4326",
)
destinations["id"] = destinations["store_id"]
```

Critical: both origins and destinations must be GeoDataFrames with an `id` column and a `geometry` column.
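A cheap sanity check before a long routing run saves headaches. This helper is our own convention, not part of r5py's API, and the toy frame below stands in for a real GeoDataFrame:

```python
import pandas as pd

def check_od_frame(df, name):
    """Verify a frame meets the id/geometry requirements before routing."""
    missing = [c for c in ("id", "geometry") if c not in df.columns]
    if missing:
        raise ValueError(f"{name} is missing required column(s): {missing}")
    if df["id"].duplicated().any():
        raise ValueError(f"{name} has duplicate ids")
    return True

# Toy stand-in for a real origins GeoDataFrame
toy = pd.DataFrame({"id": ["06085500100"], "geometry": [None]})
check_od_frame(toy, "origins")  # passes silently
```

Duplicate ids are worth catching early: they silently multiply rows in the output matrix.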
Step 5: Calculate Travel Times
r5py's TravelTimeMatrixComputer calculates travel times between all origin-destination pairs:
```python
from datetime import datetime, timedelta

# Set departure time window
departure = datetime(2024, 11, 15, 9, 0)  # Friday 9:00 AM
departure_window = timedelta(hours=2)  # Allow departures until 11:00 AM

# Configure the travel time calculation
travel_time_computer = r5py.TravelTimeMatrixComputer(
    transport_network,
    origins=origins,
    destinations=destinations,
    departure=departure,
    departure_time_window=departure_window,
    transport_modes=[r5py.TransportMode.WALK, r5py.TransportMode.TRANSIT],
    max_time=timedelta(minutes=60),  # Maximum travel time to consider
    percentiles=[50],  # Median travel time across departure window
)

# Calculate!
travel_times = travel_time_computer.compute_travel_times()
```

For 408 tracts × 6,613 stores = 2,698,104 pairs, this takes about 45 minutes on a laptop with 16 GB RAM and an M1 processor. The result is a DataFrame with columns `from_id`, `to_id`, and `travel_time`.
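In our runs, pairs exceeding `max_time` came back with a missing `travel_time`, so it is worth checking coverage before aggregating. A sketch on a toy result frame (column names match the output described above; the values are made up):

```python
import pandas as pd

# Toy stand-in for the routing output: NaN marks an unreachable pair
travel_times = pd.DataFrame({
    "from_id": ["t1", "t1", "t2", "t2"],
    "to_id": ["s1", "s2", "s1", "s2"],
    "travel_time": [18.0, 42.0, float("nan"), float("nan")],
})

# Share of pairs reachable within the cutoff
reachable = travel_times["travel_time"].notna().mean()
print(f"{reachable:.0%} of pairs reachable")  # 50% of pairs reachable

# Origins with no reachable store at all
no_access = (
    travel_times.groupby("from_id")["travel_time"]
    .apply(lambda s: s.isna().all())
)
print(no_access[no_access].index.tolist())  # ['t2']
```

Origins where every pair is missing are exactly the tracts that would silently drop out of a naive inner merge in the next step.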
Step 6: Find Nearest Store by Transit
With travel times calculated, find the minimum for each origin:
```python
# Group by origin and find minimum travel time
nearest_by_transit = (
    travel_times
    .groupby("from_id")
    .agg(
        min_transit_time=("travel_time", "min"),
        stores_within_30_min=("travel_time", lambda x: (x <= 30).sum()),
        stores_within_45_min=("travel_time", lambda x: (x <= 45).sum()),
    )
    .reset_index()
)

# Merge back to tract data; how="left" keeps tracts with no
# reachable store (their transit columns come back as NaN)
tracts_with_transit = tracts.merge(
    nearest_by_transit,
    left_on="GEOID",
    right_on="from_id",
    how="left",
)
```

What Didn't Go as Expected
Several problems emerged during implementation:
Problem 1: Memory Limits
Calculating all pairs at once exceeded available RAM. Solution: batch processing.
```python
# Process in batches of 50 tracts
batch_size = 50
results = []

for i in range(0, len(origins), batch_size):
    batch_origins = origins.iloc[i:i + batch_size]
    computer = r5py.TravelTimeMatrixComputer(
        transport_network,
        origins=batch_origins,
        destinations=destinations,
        # ... same parameters as before
    )
    batch_results = computer.compute_travel_times()
    results.append(batch_results)

# Combine all batches
travel_times = pd.concat(results, ignore_index=True)
```

Problem 2: Invalid GTFS Data
Some agency GTFS feeds contain errors: stops with coordinates of (0, 0), routes with no trips, or stop_times with impossible schedules.
```python
# Validate GTFS before using
import zipfile

import pandas as pd

def validate_gtfs(gtfs_path):
    """Drop stops with missing or (0, 0) coordinates before routing."""
    with zipfile.ZipFile(gtfs_path) as zf:
        stops = pd.read_csv(zf.open("stops.txt"))

    # Check for invalid coordinates
    invalid = stops[
        (stops["stop_lat"] == 0)
        | (stops["stop_lon"] == 0)
        | (stops["stop_lat"].isna())
        | (stops["stop_lon"].isna())
    ]
    if len(invalid) > 0:
        print(f"Warning: {len(invalid)} stops with invalid coordinates")
        stops = stops[~stops.index.isin(invalid.index)]
    return stops
```

Problem 3: OSM Size Limits
Large OSM files can cause r5py to fail. Solution: extract only the region needed.
```python
import subprocess

# Calculate bounding box from your data (assumes point geometries;
# take .centroid first if your tract geometries are polygons)
min_lon = min(origins.geometry.x.min(), destinations.geometry.x.min()) - 0.1
max_lon = max(origins.geometry.x.max(), destinations.geometry.x.max()) + 0.1
min_lat = min(origins.geometry.y.min(), destinations.geometry.y.min()) - 0.1
max_lat = max(origins.geometry.y.max(), destinations.geometry.y.max()) + 0.1

# Extract using osmium
subprocess.run([
    "osmium", "extract",
    "-b", f"{min_lon},{min_lat},{max_lon},{max_lat}",
    "california.osm.pbf",
    "-o", "region.osm.pbf",
])
```

When to Use This Approach
Good fit:
- Research requiring many origin-destination pairs
- Budget constraints preclude commercial APIs
- Need for reproducibility (GTFS + OSM are public data)
- Batch processing is acceptable (not real-time queries)
Less suitable:
- Real-time routing for individual trips
- Need for traffic-aware car routing (GTFS is transit only)
- Very small analyses where API costs are negligible
- Regions where transit agencies don't publish GTFS feeds (some rural areas, demand-responsive services) or where OpenStreetMap pedestrian paths are incomplete (typically rural or newly developed areas)
Alternatives
Commercial APIs (Google Maps, Mapbox): Easier setup, real-time capabilities, but costly at scale. Google Maps charges $5 per 1,000 routes for transit directions.
OSRM (Open Source Routing Machine): Free and fast, but car/bike/walk only. No transit support.
OpenTripPlanner: Full-featured transit router with real-time updates and trip planning APIs. Deployment requires running a Java server and configuring multiple components (graph building, API endpoints, frontend). This overhead makes sense for transit agency websites serving individual trip queries, but for batch research calculating millions of routes, r5py's single Python script is simpler.
Conveyal Analysis: Web-based interface to R5 with built-in isochrone mapping and accessibility visualization. Handles data loading and visualization without code. The limitation: you can't easily customize routing parameters, export raw travel time matrices, or integrate results into a larger analysis pipeline.
Limitations
Schedule-based, not real-time: GTFS represents planned service, not actual arrivals. Delays, cancellations, and disruptions aren't captured.
Average conditions: The departure time window produces median travel times. Some trips will be faster or slower than calculated.
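The effect is easy to see on a toy distribution of departure-by-departure travel times for a single origin-destination pair; requesting additional percentiles from r5py (e.g. `percentiles=[25, 50, 75]`) surfaces this spread, at the cost of extra output columns:

```python
import numpy as np

# Simulated travel times (minutes) for one O-D pair across ten
# departures in the window; the long ones represent a just-missed bus
times = [20, 21, 22, 23, 24, 25, 26, 27, 44, 48]

p25, p50, p75 = np.percentile(times, [25, 50, 75])
print(p50)        # 24.5 -- the only value a median-only matrix reports
print(p75 - p25)  # 4.5  -- the spread a median-only matrix hides
```

For food access specifically, that upper tail matters: a just-missed bus with groceries in hand is a very different trip from the median.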
Walking assumptions: r5py assumes constant walking speed. Elderly residents, those with mobility limitations, or those carrying groceries may walk slower.
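The walking speed is at least adjustable. r5py exposes a `speed_walking` parameter on the computer (in km/h; check the r5py documentation for your version, since the parameter name and its default have shifted between releases). A hedged fragment, reusing the network and frames from the steps above:

```python
# Rerun the matrix with a slower pedestrian assumption,
# e.g. for elderly residents or people carrying groceries
slow_walk_computer = r5py.TravelTimeMatrixComputer(
    transport_network,
    origins=origins,
    destinations=destinations,
    transport_modes=[r5py.TransportMode.WALK, r5py.TransportMode.TRANSIT],
    speed_walking=3.0,  # km/h; slower than the library default
)
```

Comparing matrices at two walking speeds gives a rough sensitivity band, though it still assumes one constant speed per run.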
Network completeness: OpenStreetMap varies in quality by region. Some pedestrian paths may be missing or incorrectly mapped.
Code Availability
Complete code for this analysis is in the GitHub repository: github.com/dphdmae/foodsecurity_mobility
The repository includes:
- `scripts/50_download_calitp_gtfs.py`: GTFS download and validation
- `scripts/53_extract_regional_osm.py`: OSM extraction utilities
- `scripts/41_calculate_transit_all_counties.py`: r5py routing implementation with batch processing
- `scripts/82_mobility_deserts_calitp.py`: Post-processing to identify mobility deserts
- `data/`: Sample data files and expected outputs
Notes
[1] Google Maps Platform pricing as of November 2024. Transit Directions API: $5.00 per 1,000 requests. 2,698,104 requests × $0.005 = $13,490.52. ↩
[2] GitHub repository: github.com/dphdmae/foodsecurity_mobility. MIT License. ↩
[3] GTFS was developed by Google and TriMet (Portland, OR) in 2005. Over 2,500 transit agencies worldwide now publish GTFS feeds. See gtfs.org for specification details. ↩
[4] Conway, M. W., Byrd, A., & van Eggermond, M. (2018). "Accounting for uncertainty and variation in accessibility metrics for public transport sketch planning." Journal of Transport and Land Use, 11(1), 541-558. For r5py implementation, see Pereira, R. H. M., Saraiva, M., Herszenhut, D., Braga, C. K. V., & Conway, M. W. (2021). "r5r: Rapid Realistic Routing on Multimodal Transport Networks with R5 in R." Findings. https://doi.org/10.32866/001c.21262 ↩
[5] Cal-ITP (California Integrated Travel Project) aggregates GTFS data from California transit agencies. Data available at data.ca.gov. ↩
Tags: #FoodSecurity #TransitAnalysis #GTFS #r5py #OpenStreetMap #Methods #Tutorial #OpenSource
How to Cite This Research
Too Early To Say. "How to Calculate 2.7 Million Transit Routes for Free." November 2025. https://www.tooearlytosay.com/research/food-security/transit-routing-free-tools/