AI Methods

AI-assisted applied economics research. Each article ships with Python code and replication materials.

Build Your Own Tools: Tutorials

Articles that walk through building research tools, end-to-end. Each pairs with code in our GitHub or with live tools at Tools & Code. For the full sequence these methods follow, from the first literature scan to the replication package, see AI for Applied Researchers, our five-step Start Here guide.

Matching in Python: a balanced covariate table doesn't make the estimate valid

A propensity-score match can pass every balance check and still return the wrong number. Balance says the matched groups look alike on the covariates we observed; it does not say a comparable control existed.

Jul 2026

applied-econometrics reproducibility causal-inference

Synthetic control in Python: read the pre-fit before the gap

A zero-error pre-treatment fit returns a clean gap of 6.1 against a planted effect of 6.0, and the same zero-error fit returns a wrong gap of 4.3 when no valid counterfactual exists. The pre-fit that is allowed to be large is the diagnostic to trust.

Jul 2026

applied-econometrics causal-inference python

Regression discontinuity in Python: getting the effect at the cutoff right

A global polynomial fit returns a clean, plausible 1.8 where the effect planted at the cutoff is 0.75. A local fit recovers about 0.75. How to estimate a regression discontinuity in Python, and the confounder the local fit still cannot see.

Jul 2026

applied-econometrics causal-inference python
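
The local fit described in the regression discontinuity entry above can be sketched in a few lines. This is a minimal illustration on simulated data, not the article's code: the data-generating process, the bandwidth of 0.2, and the helper names `ols_line` and `local_linear_rd` are all assumptions made for the example. It plants an effect of 0.75 at the cutoff and estimates the jump as the difference between two one-sided line fits evaluated at the cutoff.

```python
import math
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of a least-squares line through the points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def local_linear_rd(x, y, cutoff=0.0, bandwidth=0.2):
    """Jump at the cutoff: right-side intercept minus left-side intercept.

    Each side is fit separately, using only points within `bandwidth`
    of the cutoff, with the running variable centered at the cutoff.
    """
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y)
            if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y)
             if cutoff <= xi <= cutoff + bandwidth]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left


def simulate(n=20000, effect=0.75, seed=0):
    """Hypothetical data: a smooth nonlinear trend plus a planted jump at 0."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [math.sin(3 * xi) + effect * (xi >= 0) + rng.gauss(0, 0.5) for xi in x]
    return x, y


x, y = simulate()
rd_estimate = local_linear_rd(x, y)
print(f"planted effect: 0.75, local linear estimate: {rd_estimate:.2f}")
```

Because the truth is planted, the estimate can be checked against it directly. The sketch says nothing about a confounder that also jumps at the cutoff, which no choice of bandwidth would reveal.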

How to tell whether a double machine learning estimate is right

Double machine learning in Python: why a naive plug-in reads a true effect of 1.0 as 0.55, how cross-fitting recovers 0.97, and the confounder it still cannot detect.

Jul 2026

applied-econometrics causal-inference python

Using difference-in-differences in practice

When difference-in-differences is the right tool, the three assumptions stated as decisions, and the ways it breaks, each shown in a worked Too Early To Say case with open code. A decision table maps the situation to the estimator and the check that keeps it honest.

Jul 2026

applied-econometrics causal-inference difference-in-differences

How instrumental variables help in causal inference: a 2SLS worked example in Python

Instrumental variables buy identification by trading one testable assumption for two, one of which can never be tested. A reproducible 2SLS walkthrough on the Mroz data: OLS returns 0.1075, IV returns 0.0614, and the first-stage F of 55.4 is what tells us the instrument is strong enough to trust.

Jul 2026

applied-econometrics causal-inference python
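
To make the two stages concrete, here is a minimal sketch of 2SLS with one instrument. It uses simulated data rather than the Mroz data, so none of the numbers quoted above will appear; the data-generating process and the helper names `first_stage` and `slope` are assumptions for illustration. The true effect is planted at 1.0 and an unobserved confounder biases OLS upward.

```python
import random


def slope(a, b):
    """Least-squares slope from regressing b on a (with an intercept)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    return sab / saa


def first_stage(z, x):
    """Regress x on z. Return fitted values and the first-stage F statistic.

    With a single instrument, F is the squared t statistic on z.
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    pi = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / szz
    x_hat = [mx + pi * (zi - mz) for zi in z]
    resid = [xi - fi for xi, fi in zip(x, x_hat)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    f_stat = pi ** 2 / (sigma2 / szz)
    return x_hat, f_stat


def simulate(n=20000, beta=1.0, seed=0):
    """Hypothetical data: u confounds x and y; z shifts x and nothing else."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


z, x, y = simulate()
ols_estimate = slope(x, y)            # biased by the confounder
x_hat, f_stat = first_stage(z, x)     # stage 1
iv_estimate = slope(x_hat, y)         # stage 2: point estimate only
print(f"OLS {ols_estimate:.2f}, IV {iv_estimate:.2f}, first-stage F {f_stat:.0f}")
```

Running the second stage by hand gives the right point estimate but the wrong standard errors, so this sketch is for reading the mechanics; for inference, use a dedicated IV routine.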

When a policy reaches only a few units: rolling difference-in-differences (lwdid)

A new rolling difference-in-differences estimator (lwdid) gives credible effects from one treated unit and a few controls. With so few units the transformation choice drives the answer: the standard version can report twice the truth, and unit-specific detrending recovers it.

Jun 2026

ai-workflow applied-econometrics causal-inference

AI Econometrics: Using AI for Code, Not for Identification

An AI assistant can draft econometric code and run specifications in seconds. It cannot decide the estimand, argue the identification, or verify that the code computes what the design claims. Two worked cases show where that line falls.

Jun 2026

ai-workflow applied-econometrics causal-inference

How do we know an AI's estimator does what we meant?

When we rebuild an estimator from a paper or package, the code can run without error and still be wrong. A routine that catches it: name the low-visibility choices in a spec, plant a known truth in a simulation, and read the code against its source.

Jun 2026

applied-econometrics reproducibility ai-workflow

Logistic regression beats LLM readouts on survey prediction

On a real public-health prediction task, a plain logistic regression on seven demographic facts outpredicted a language-model activation pipeline built on the same facts (AUC 0.769 vs 0.747). Steering the model's internals changed nothing about the ranking and made the probabilities less trustworthy.

Jun 2026

ai-workflow applied-econometrics machine-learning

Steering vectors estimate an average regression gradient

Activation steering, the trick of adding a vector to a language model's internal activations, approximates an average regression gradient: the alignment in our data is directional, short of the criterion we wrote down in advance for calling the two identical. Once we see the connection, we can ask the classic estimation questions and get classic answers.

Jun 2026

ai-workflow applied-econometrics machine-learning

Prediction-powered inference corrects AI-imputed survey estimates

Treating AI-imputed survey responses as real observations can understate a prevalence estimate by a factor of three while reporting a tight confidence interval. A regression adjustment a decade older than the models lets the predictions sharpen the estimate without ever making it worse.

Jun 2026

ai-workflow applied-econometrics machine-learning
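
The correction in the prediction-powered inference entry above can be sketched for the simplest case, a mean. This is the basic form of the estimator: average the predictions on the large unlabeled sample, then add the average prediction error measured on a small labeled sample. The simulated error rates (the model misses 60% of true positives) and the names `draw` and `ppi_mean` are assumptions for illustration, and the tuned variant that weights the predictions is not shown.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ppi_mean(y_labeled, pred_labeled, pred_unlabeled):
    """Prediction-powered estimate of a mean.

    Mean prediction on the unlabeled sample, plus a bias correction
    (mean of observed minus predicted) from the labeled sample.
    """
    correction = mean([y - p for y, p in zip(y_labeled, pred_labeled)])
    return mean(pred_unlabeled) + correction


def draw(rng, n, prevalence=0.30):
    """Hypothetical survey: true 0/1 responses and a model that under-detects."""
    y = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    pred = [1 if rng.random() < (0.40 if yi else 0.02) else 0 for yi in y]
    return y, pred


rng = random.Random(0)
y_lab, pred_lab = draw(rng, 2000)      # small sample with real responses
_, pred_unlab = draw(rng, 50000)       # large sample with predictions only

naive_estimate = mean(pred_unlab)      # treats predictions as observations
ppi_estimate = ppi_mean(y_lab, pred_lab, pred_unlab)
print(f"true 0.30, naive {naive_estimate:.3f}, corrected {ppi_estimate:.3f}")
```

The naive estimate inherits the model's bias no matter how large the unlabeled sample is; the corrected estimate removes it at the cost of the sampling noise in the labeled sample.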

Well-Executed But Not Important: Reading Importance From the Published Record

We use an LLM to classify 2,493 health-economics articles and operationalize importance. Calibration accounts for 35% of publications but 18% of citations; Identification carries a +91% premium and Reframing +126%, holding topic, journal, and year constant. Pairs with the journal-topic-shares replication repo.

May 2026

meta-research llm-classification citations

Cycling Through Bad Ideas Faster: A Medicaid Branding Worked Example

A two-week solo cycle through three coding rules, a controls ladder, and a behavioral-mechanism test on state Medicaid program branding, ending at a bounded null after expansion-cohort fixed effects collapse a naive +22% headline. Companion to the meta-research piece above.

May 2026

medicaid difference-in-differences ai-workflow

Claude Code Skills Get Stale. Audit Them Quarterly.

A repeatable audit so the skills, hooks, and memory entries we wrote for older models stop quietly shaping today's numbers.

May 2026

tutorial claude-code

A Pre-Analysis Plan for Your Coding Agent

A three-layer architecture, rule, gate, and verification, for keeping a reasoning agent disciplined when system prompts alone are not enough.

May 2026

tutorial agents

Building a Literature Surveillance System

Combining free tools (Google Scholar, Semantic Scholar) with an AI assistant that handles the glue: citation networks, source merging, and the quiet failures.

April 2026

tutorial literature review

One Context File, Zero Re-Explanations

How we set up a CLAUDE.md context file so research context survives across sessions, and we stop re-explaining the same project every time.

October 2025

tutorial claude-md

From Methods Paragraph to Working Pipeline

Translating a methodology section into executable code with AI assistance, step by step.

October 2025

tutorial workflow

47 Scripts to 15: Cleaning a Research Codebase

Using an AI assistant to refactor and consolidate a sprawling research codebase without losing the analytical thread.

November 2025

tutorial refactoring

6,613 Stores, $147, Zero Lost Data

Building resilient data pipelines that handle API failures, rate limits, and edge cases without losing rows.

November 2025

tutorial api
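
One common pattern for a pipeline that survives API failures and rate limits is retry with exponential backoff plus a record of what has already been fetched. The sketch below is a generic illustration, not the article's code: `fetch_with_retry`, `run_pipeline`, and the simulated `flaky_fetch` are hypothetical names, and the retry settings are arbitrary.

```python
import time


def fetch_with_retry(fetch, key, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(key), retrying on any exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch(key)
        except Exception:  # broad on purpose for a sketch; narrow this in real code
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay


def run_pipeline(keys, fetch, done=None, **retry_kwargs):
    """Fetch every key once. Skip keys already in `done`; record failures.

    Passing a previously saved `done` dict resumes an interrupted run,
    and failed keys are returned rather than silently dropped.
    """
    done = {} if done is None else done
    failed = []
    for key in keys:
        if key in done:
            continue
        try:
            done[key] = fetch_with_retry(fetch, key, **retry_kwargs)
        except Exception:
            failed.append(key)
    return done, failed


# Simulated API that fails twice per key before succeeding.
calls = {}


def flaky_fetch(store_id):
    calls[store_id] = calls.get(store_id, 0) + 1
    if calls[store_id] < 3:
        raise ConnectionError("simulated rate limit")
    return {"store_id": store_id, "ok": True}


# Sleeping is disabled here so the demonstration runs instantly.
results, failed = run_pipeline(["s1", "s2"], flaky_fetch, sleep=lambda seconds: None)
print(len(results), "fetched;", len(failed), "failed")
```

Returning the failed keys alongside the results is the design choice that matters: a row that could not be fetched shows up in a list to retry later instead of disappearing from the data.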

400 Labels to 94% Accuracy

Building and validating a grocery store classifier through iterative labeling, with the loop documented end-to-end.

October 2025

tutorial validation

EBT Verification Methodology

Cross-validating SNAP retailer data against multiple authoritative sources, so the labels we trust have a paper trail.

October 2025

tutorial validation

How to Calculate 2.7M Transit Routes for Free

Step-by-step guide to r5py, GTFS data, and multimodal accessibility analysis at zero cost.

November 2025

tutorial r5py GTFS

Most Recent

AI Econometrics: Using AI for Code, Not for Identification

Jun 2026

ai-workflow applied-econometrics causal-inference

How do we know an AI's estimator does what we meant?

Jun 2026

applied-econometrics reproducibility ai-workflow

When a policy reaches only a few units: rolling difference-in-differences (lwdid)

Jun 2026

ai-workflow applied-econometrics causal-inference

Well-Executed But Not Important: Reading Importance From the Published Record

When AI thins out the technical-flaws desk-rejection pretext, editors will have to learn to say "well-executed but not important" on the record. We classify 2,493 articles across four health-economics journals to ask what "important" has actually meant.

May 2026

meta-research citations llm-classification

Cycling Through Bad Ideas Faster: A Medicaid Branding Worked Example

What AI actually adds to solo research is fast iteration through ideas that turn out to be wrong, with new techniques sometimes emerging as byproducts of the failed attempts.

May 2026

medicaid difference-in-differences ai-workflow

Claude Code Skills Get Stale. Audit Them Quarterly.

Every skill, hook, and memory entry written for an older model is a patch with an expiration date. In empirical research, the expired ones produce wrong numbers that look right and shape the policy decisions built on them.

May 2026

claude-code research-workflow reproducibility

What AI Impact Looks Like in the Slow Data

Usage telemetry sees AI adoption; slow public data sees household conditions. The same AI tooling can read both at the cadence either one needs.

May 2026

ai-impact data-monitoring

A Pre-Analysis Plan for Your Coding Agent

A three-layer architecture for keeping reasoning agents disciplined: rule, gate, and verification. Trained priors beat system prompts, so reliable behavior redirection needs architecture, not instruction.

May 2026

agents verification

Building a Literature Surveillance System

Free tools like Google Scholar alerts and Semantic Scholar already monitor academic literature. What an AI coding assistant adds is the glue: combining sources, following citation networks, and catching the quiet failures that make AI-gathered references dangerous.

April 2026

literature review skills

Browse all 71 methodology articles by category below.

Medicaid Fraud Detection

What 227 Million Rows of Medicaid Data Can and Can't Tell Us

The largest Medicaid dataset in history just went public. What it contains, what's missing, and why that matters for fraud screening.

February 2026

T-MSIS data quality

The Label Problem: Why Fraud Labels Are Harder Than They Look

Exclusion lists are the closest thing we have to fraud labels. They are further from ground truth than most analysts assume.

February 2026

LEIE labels

What Billing Patterns Actually Look Like

Comparing excluded and non-excluded providers across billing volume, coding concentration, and temporal patterns.

February 2026

billing analysis peer groups

Can a Classifier Find What Simpler Methods Miss?

Building a supervised fraud classifier with gradient boosting, SHAP interpretation, and honest temporal validation.

February 2026

machine learning SHAP

AI-Assisted Research

Well-Executed But Not Important: Reading Importance From the Published Record

May 2026

meta-research llm-classification citations

Cycling Through Bad Ideas Faster: A Medicaid Branding Worked Example

A two-week solo cycle through three coding rules, a controls ladder, and a behavioral-mechanism test, ending at a null. What AI compresses is the calendar time of discarding bad ideas.

May 2026

medicaid difference-in-differences ai-workflow

One Context File, Zero Re-Explanations

How CLAUDE.md files maintain research context across sessions, eliminating repetitive explanations.

October 2025

From Methods Paragraph to Working Pipeline

Translating a methodology section into executable code with AI assistance.

October 2025

47 Scripts to 15: Cleaning a Research Codebase

Using AI to refactor and consolidate a sprawling research codebase.

November 2025

Data Collection & Validation

6,613 Stores, $147, Zero Lost Data

Building resilient data pipelines that handle API failures, rate limits, and edge cases.

November 2025

400 Labels to 94% Accuracy

Building and validating a grocery store classifier through iterative labeling.

October 2025

EBT Verification Methodology

Cross-validating SNAP retailer data against multiple authoritative sources.

October 2025

Spatial Analysis

How to Calculate 2.7M Transit Routes for Free

Step-by-step guide to r5py, GTFS data, and multimodal accessibility analysis.

November 2025

tutorial r5py GTFS

Residualized Accessibility Index

Separating transit access from confounding factors using regression residuals.

November 2025
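
Residualizing can be sketched with a single confounder. The example regresses an accessibility score on one confounding variable and keeps the residuals, which are by construction uncorrelated with that variable in the sample. The variable names and the five data points are made up for illustration, and an actual index would likely adjust for several factors at once.

```python
def residualize(outcome, confounder):
    """Residuals from a least-squares regression of outcome on confounder."""
    n = len(outcome)
    mo = sum(outcome) / n
    mc = sum(confounder) / n
    slope = (sum((c - mc) * (o - mo) for c, o in zip(confounder, outcome))
             / sum((c - mc) ** 2 for c in confounder))
    intercept = mo - slope * mc
    return [o - (intercept + slope * c) for o, c in zip(outcome, confounder)]


# Hypothetical tracts: denser places score higher on raw access.
density = [1.0, 2.0, 3.0, 4.0, 5.0]
raw_access = [2.1, 3.9, 6.2, 7.8, 10.0]

residual_access = residualize(raw_access, density)
print([round(r, 2) for r in residual_access])
```

A positive residual marks a tract with more access than its density alone would predict, and a negative residual marks one with less.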

Frequently Asked Questions

What is AI-assisted research?

AI-assisted research uses large language models like Claude to accelerate the translation of methodological expertise into working code. The researcher provides domain knowledge, variable definitions, and methodological decisions through context files (CLAUDE.md). The AI helps implement these ideas as code, identifies edge cases, and assists with refactoring. AI assistance doesn't replace expertise; it multiplies its impact. See our article on context files in research.

How do you calculate transit accessibility for free?

We use r5py, a Python library built on Conveyal's R5 routing engine. Combined with publicly available GTFS transit feeds, it can calculate millions of multimodal routes at zero cost. Our r5py tutorial walks through the complete process with working code examples.

How do you validate data quality?

We cross-validate against multiple authoritative sources. For grocery store data, we compared USDA Food Access Atlas listings against the official SNAP retailer database, California ABC license records, and manual verification. This iterative process, documented in our grocery store classifier article, achieved 94% accuracy through 400 hand-labeled examples.
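
As a rough illustration of the cross-check, the sketch below records which sources confirm each candidate store and flags the ones with too few confirmations for manual review. The store IDs, the source names, and the threshold of two confirmations are all hypothetical; the article documents the actual procedure.

```python
def confirmations(candidates, sources):
    """Map each candidate to the list of source names that also list it."""
    return {
        store: [name for name, listing in sources.items() if store in listing]
        for store in candidates
    }


# Hypothetical identifiers standing in for matched store records.
candidates = ["store_a", "store_b", "store_c"]
sources = {
    "snap_retailers": {"store_a", "store_b"},
    "abc_licenses": {"store_a"},
    "manual_check": {"store_a", "store_b"},
}

confirmed_by = confirmations(candidates, sources)
needs_review = [store for store, names in confirmed_by.items() if len(names) < 2]
print(needs_review)
```

Keeping the names of the confirming sources, not just a count, is what leaves a paper trail for each label.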

Can I replicate your research?

Yes. Every article links to a public GitHub repository containing all data and code needed to reproduce the analysis. Our main replication repository contains 18 research projects with complete documentation.

More Methodology Articles

AI Research Workflows

Building Our Research System: Putting It All Together

Claude Code Guide

Your First Session: What Claude Code Is and Isn't

A Starter Kit for the Economist's First Week in Claude Code

The Cold Start Problem

Why It Forgot Everything: Understanding Context

Context Window Budgeting

Reading Our Analysis Files

Research Phases Need Different Prompts

Creating Skills: Reusable Workflows for Research

Hooks: Automation Without Asking

Connecting Claude to Outside Services: FRED, Census, and Beyond

Creating Helpers: When to Delegate Work

What Agents Actually Do (And What They Don’t)

What We Mistake for AI Capability

The Verification Tax

7 Copy-Paste Cycles to 1 Command: What Changes with Agent-Based Coding

End-of-Session Hygiene: What to Capture Before Context Resets

Reading Your Own Data

Running Claude Code skills, for applied economists

Why Claude and ChatGPT Struggle with Research Graphics (And What Makes Antigravity Prompts Work)