AI for Applied Researchers: A Five-Step Sequence

This sequence is the practice layer behind Too Early To Say. It shows how an applied economist can use an AI coding agent inside a study, from the first literature scan to the replication package. Each step is one phase of an administrative-data or survey-data analysis. The agent handles routine search, translation, and first-pass implementation. The design, identification, and checks remain ours.

This sequence is for applied microeconomists, public-sector researchers, graduate students, and policy analysts who already know their methods and want an AI coding agent inside the workflow without losing control of the analysis. The problem it solves is simple. Hours go to plumbing work that an agent can do, while design, diagnostics, and interpretation are squeezed into what is left.

One idea runs through all five steps. A precise specification is what lets an agent do useful work. A vague instruction produces code that runs but may not match the research intent. When the methods paragraph names the package, the data source, the sample, and the window, the agent can implement it. When it says only "standard methods," package, sample, and window choices remain unresolved, and those choices can change the estimate.
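A minimal sketch of that idea, assuming the four choices are written down as plain strings before the agent starts. The `MethodsSpec` class, the list of vague phrases, and every example value are hypothetical illustrations, not taken from any published case.

```python
from dataclasses import dataclass, fields

# Hypothetical placeholder phrases that leave a choice unresolved.
VAGUE_VALUES = {"", "standard", "standard methods", "default", "tbd"}


@dataclass(frozen=True)
class MethodsSpec:
    """The four choices a methods paragraph should name (illustrative)."""
    package: str
    data_source: str
    sample: str
    window: str


def unresolved_fields(spec: MethodsSpec) -> list[str]:
    """Return the names of fields that are empty or only a vague placeholder."""
    return [
        f.name
        for f in fields(spec)
        if getattr(spec, f.name).strip().lower() in VAGUE_VALUES
    ]


# All values below are made up for illustration.
precise = MethodsSpec(
    package="example_did_package 1.2",
    data_source="example administrative extract",
    sample="households observed in every month",
    window="12 months before to 24 months after",
)
vague = MethodsSpec(
    package="standard methods",
    data_source="example administrative extract",
    sample="",
    window="standard",
)

print(unresolved_fields(precise))  # []
print(unresolved_fields(vague))    # ['package', 'sample', 'window']
```

A check like this does not judge whether the choices are right. It only makes visible which ones the agent would otherwise have to guess.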

The five steps follow the order a study usually moves through. In practice, most projects loop back as researchers identify new questions.

The five steps

STEP 1

Literature review

Outcome: a frozen scoping note.

In the SNAP Broad-Based Categorical Eligibility case, the source study reports a +5.9 percent two-way fixed-effects estimate. Public-material status: external source article; no Too Early To Say replication package is linked. The literature step records the study question, population, estimand, and evidence limits in a frozen note. The agent collects and compares sources. We decide what the sources support and freeze the scope.

STEP 2

Code generation

Outcome: a working pipeline built from a methods paragraph.

In the transit-access case, the article reports a time of about 0.7 hours when working from a precise specification. No public run record reproduces the timing comparison. The code-generation step specifies the package, sample, and window before implementation. The agent drafts from that specification. We choose the window, metric, and robustness checks.

Articles

One Context File: A Workflow for Persistent Project Context
The workflow is instructional. Its session times, prevented-error counts, and savings estimates are not independently timed.

Methods-to-Code with AI: An Article-Reported Workflow
The transit implementation and timing comparison are article-reported and not publicly reproduced.

STEP 3

Data cleaning

Outcome: an analysis-ready file and a reproducible final-test record.

In the grocery-store validation case, the article reports that 27 percent of candidate locations were not grocery stores. The related full-population result remains under reconciliation. The data-cleaning step records the exclusion rules and classifier threshold applied to candidate locations. The agent applies those rules and the classifier. We decide what error rate is acceptable and how to report it.
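One way to record the rules alongside their effect is sketched below. The threshold, the excluded categories, the field names, and the four candidate rows are all assumptions made up for illustration. None of them come from the grocery-store article.

```python
from collections import Counter

THRESHOLD = 0.5  # assumed classifier cutoff, not a value from the article
EXCLUDED_CATEGORIES = {"gas_station", "pharmacy"}  # hypothetical exclusion rule


def classify(candidate: dict) -> str:
    """Return the rule outcome for one candidate location."""
    if candidate["category"] in EXCLUDED_CATEGORIES:
        return "excluded_by_category"
    if candidate["score"] < THRESHOLD:
        return "below_threshold"
    return "kept"


def cleaning_record(candidates: list[dict]) -> dict:
    """Summarize which rules were applied and how many rows each one removed."""
    outcomes = Counter(classify(c) for c in candidates)
    total = len(candidates)
    return {
        "threshold": THRESHOLD,
        "excluded_categories": sorted(EXCLUDED_CATEGORIES),
        "n_candidates": total,
        "counts": dict(outcomes),
        "share_dropped": (total - outcomes["kept"]) / total if total else None,
    }


# Made-up candidates, for illustration only.
candidates = [
    {"category": "grocery", "score": 0.9},
    {"category": "grocery", "score": 0.3},
    {"category": "gas_station", "score": 0.8},
    {"category": "supermarket", "score": 0.7},
]

record = cleaning_record(candidates)
print(record["counts"])
print(record["share_dropped"])  # 0.5
```

The record reports the share dropped but makes no claim about whether that share is acceptable. That judgment stays with us.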

Articles

Grocery Store Classifier Results Under Review
The labeling workflow is documented. The linked article marks its full-population classification under reconciliation.

Robust API Collection: Pagination, Rate Limits, Failure Recovery
The collection patterns remain instructional. The earlier execution ledger and cost totals have been withdrawn.

STEP 4

Quality assurance

Outcome: a QA block.

In the bank-closure and SNAP-participation case, the article reports a joint pre-trend p-value of 0.9997, yet additional checks change the interpretation. A high pre-trend p-value alone does not settle the case. No public package reproduces the numerical case. The quality-assurance step requires a pre-trend test, specification checks, and sensitivity analysis before release. The agent scripts and runs those checks. We decide whether an estimate is strong enough to report.
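The release requirement can be sketched as a small gate that only asks whether each required check was actually run. The check names, the `CheckResult` class, and the example notes are hypothetical. The gate deliberately does not decide whether results are good enough to report.

```python
from dataclasses import dataclass

# Assumed names for the three checks the step requires.
REQUIRED_CHECKS = ("pre_trend_test", "specification_checks", "sensitivity_analysis")


@dataclass(frozen=True)
class CheckResult:
    name: str
    ran: bool
    note: str  # the researcher's reading of the result, in plain words


def missing_checks(results: list[CheckResult]) -> list[str]:
    """List required checks that have no completed run in the QA block."""
    done = {r.name for r in results if r.ran}
    return [name for name in REQUIRED_CHECKS if name not in done]


# Illustrative QA block: one check ran, one was skipped, one is absent.
qa_block = [
    CheckResult("pre_trend_test", ran=True, note="joint test does not reject"),
    CheckResult("specification_checks", ran=False, note="not yet scripted"),
]

print(missing_checks(qa_block))
# ['specification_checks', 'sensitivity_analysis']
```

An empty list means the QA block is complete, not that the estimate is reportable.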

Articles

How do we know an AI's estimator does what we meant?
The article's checks detect a mismatch in an otherwise clean run by comparing the specification, planted truth, broken symmetries, and source implementation.

AI Econometrics: Using AI for Code, Not for Identification
The assistant drafts code and runs specifications, while we set the estimand and argue the identification.

STEP 5

Documentation

Outcome: a traceable claim-to-output record.

The SNAP Broad-Based Categorical Eligibility article reports a +1.37 percentage-point estimate and documents its standard errors, sample, and weighting. A matching public package is not linked, so the case is article only. The documentation step defines the standard a future package must meet. The agent assists with structure and cross-references. We remain responsible for accuracy.
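A claim-to-output record can be as small as the sketch below, which ties one sentence in the write-up to one output file and a hash of that file's contents. The function names, the file path, and the placeholder table are assumptions for illustration. They do not describe an existing Too Early To Say package.

```python
import hashlib
import json


def claim_record(claim: str, output_file: str, content: bytes) -> dict:
    """Link a written claim to the output that supports it."""
    return {
        "claim": claim,
        "output_file": output_file,
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def still_matches(record: dict, content: bytes) -> bool:
    """True if the output file is byte-identical to the one the claim cited."""
    return record["sha256"] == hashlib.sha256(content).hexdigest()


# Placeholder output contents; the numbers are deliberately not real results.
table_bytes = b"term,estimate,std_error\ntreated,0.0,0.0\n"

record = claim_record(
    claim="Main estimate reported in the results section",
    output_file="output/main_table.csv",  # hypothetical path
    content=table_bytes,
)

print(json.dumps(record, indent=2))
print(still_matches(record, table_bytes))         # True
print(still_matches(record, table_bytes + b" "))  # False
```

If a rerun changes the output, the hash no longer matches and the claim is flagged for us to recheck. The record traces the claim. It does not vouch for it.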

Articles

End-of-Session Hygiene: What to Capture Before Context Resets
The article presents a capture routine for handing work to the next session.

When the parallel-trends test fails on one lead, what's left?
The estimate and panel are article-reported. Public-material status: article only.

Why this comes from Too Early To Say

Each step in this sequence is grounded in analyses that Too Early To Say has already published. Public materials differ by case. Some linked studies have runnable packages, some have supporting code, and some currently provide the article only. The methodology section documents the workflows. The SNAP Broad-Based Categorical Eligibility analysis grounds the literature step. The methods-paragraph-to-code piece provides an article-reported implementation example for the code step. The step cards link each published article to the phase it grounds.

This sequence organizes those lessons into one path. It is an entry point for applied researchers who want an AI coding agent in the loop and want the evidence status visible before a numerical claim is reused.