Data Quality
6 articles
How to Build a Census Data Pipeline That Doesn't Silently Fail
A Python workflow for pulling ACS data from the Census API, including the validation checks that prevent bad data from reaching the analysis.
How to Validate GTFS Feeds Before They Break the Routing Engine
A Python workflow for catching the transit data problems that structural checks miss: six validation layers, from download fallbacks to multi-agency smoke tests.
Spatial Analysis with GeoPandas: From Joins to Autocorrelation
A spatial analysis workflow that starts with point-to-polygon joins and builds toward spatial weights, autocorrelation testing, and LISA cluster detection using Python.
The Data Quality Problem: How We Went From 49% to 12% Mobility Deserts
We found that 49% of California census tracts were mobility deserts. After getting complete transit data, the corrected figure is 12%. Here's what went wrong and how to avoid it.
400 Labels to 94% Accuracy: Validating Grocery Store Data
Google Places returned thousands of 'grocery stores.' Many weren't. Here's how a classifier separates real grocery stores from gas stations and liquor stores.
The Retail Density Paradox: Why More Stores Mean Worse Data
Developing a verification methodology for EBT acceptance across 7,000 California food retailers.