Data Quality

6 articles

How to Build a Census Data Pipeline That Doesn't Silently Fail

A Python workflow for pulling ACS data from the Census API, including the validation checks that prevent bad data from reaching the analysis.

Feb 2026 · Methodology

How to Validate GTFS Feeds Before They Break the Routing Engine

A Python workflow for catching the transit data problems that structural checks miss. Six validation layers from download fallbacks to multi-agency smoke tests.

Feb 2026 · Methodology

Spatial Analysis with GeoPandas: From Joins to Autocorrelation

A spatial analysis workflow that starts with point-to-polygon joins and builds toward spatial weights, autocorrelation testing, and LISA cluster detection using Python.

Feb 2026 · Methodology

The Data Quality Problem: How We Went From 49% to 12% Mobility Deserts

We found that 49% of California census tracts were mobility deserts. With complete transit data, the corrected figure is 12%. Here's what went wrong and how to avoid it.

Nov 2025 · Transit Equity

400 Labels to 94% Accuracy: Validating Grocery Store Data

Google Places returned thousands of 'grocery stores.' Many weren't. Here's how a classifier separates real grocery stores from gas stations and liquor stores.

Oct 2025 · Methodology

The Retail Density Paradox: Why More Stores Mean Worse Data

Developing a verification methodology for EBT acceptance across 7,000 California food retailers.

Oct 2025 · Methodology