AI for Applied Researchers: A Five-Step Sequence
This sequence is the practice layer behind Too Early To Say. It shows how an applied economist can use an AI coding agent inside a study, from the first literature scan to the replication package. Each step is one phase of an administrative-data or survey-data analysis. The agent handles routine search, translation, and first-pass implementation. The design, identification, and checks remain ours.
This sequence is for applied microeconomists, public-sector researchers, graduate students, and policy analysts who already know their methods and want an AI coding agent inside the workflow without losing control of the analysis. The problem it addresses is simple. Hours go to plumbing work that an agent can do, while design, diagnostics, and interpretation are squeezed into what is left.
One idea runs through all five steps. A precise specification is what lets an agent do useful work. A vague instruction produces code that runs but may not match the research intent. When the methods paragraph names the package, the data source, the sample, and the window, the agent can implement it. When the paragraph says only "standard methods," the agent has to guess, and every guess can move the result.
The five steps follow the order a study usually moves through. In practice, most projects loop back as new questions surface.
Literature review
Outcome: a mapped literature and a benchmark that any estimate has to land near.
We use the agent to widen the search and pull figures, and we verify the findings against the sources before building on them.
Code generation
Outcome: a working pipeline built from a methods paragraph.
We hand the agent a specific specification and let it generate and refactor the first-pass code, while we choose the design and diagnostics.
Data cleaning
Outcome: analysis-ready data with every transformation logged.
We encode each recode and filter as a step that the agent can run, log, and rerun with the same result every time.
Quality assurance
Outcome: an estimate that has been tested, not just produced.
We use the agent to help script parallel-trends checks, placebos, sensitivity bounds, and decompositions, then interpret whether the estimate holds.
Documentation
Outcome: methods and provenance that a reader can reproduce.
We write the study so every number traces back to raw data and a script, and the agent assists with structure, cross-references, and consistency checks.
The five steps
Literature review
Outcome: a mapped literature and a stated gap the study fills.
In our Medicaid data work, a public file with roughly 227 million rows and seven columns was being cited in fraud narratives within hours. A basic check of how provider registration works showed why that file is a screening input, not proof of misconduct. The literature step in this sequence rests on that lesson. We use the agent to collect and summarize findings, and we decide what the source can and cannot support.
Code generation
Outcome: a working pipeline built from a methods paragraph.
In a transit-access analysis, a methods paragraph that names r5py, the number of tract centroids, the store universe, and a Tuesday morning window lets the agent write the routing code directly. Work that once took most of a day moves into well under an hour. The choice of window, metric, and robustness checks stays with us.
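One way to make "specific enough for the agent" checkable is to write the methods paragraph down as a structured spec and flag any field that is empty or vague before handing it over. The sketch below is only an illustration: the `MethodsSpec` class, its field names, the `VAGUE_PHRASES` list, and the example values are all hypothetical and are not taken from the published analysis.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Hypothetical list of phrases that signal an unresolved design choice.
VAGUE_PHRASES = ("standard methods", "usual approach", "tbd")


@dataclass
class MethodsSpec:
    """Fields an agent would need before writing routing code without guessing."""
    package: Optional[str] = None           # e.g. the routing library named in the text
    origins: Optional[str] = None           # e.g. which tract centroids
    destinations: Optional[str] = None      # e.g. which store universe
    departure_window: Optional[str] = None  # e.g. day of week and time range
    metric: Optional[str] = None            # e.g. how travel time is summarized


def unresolved_fields(spec: MethodsSpec) -> List[str]:
    """Return the names of fields that are missing, blank, or vaguely worded."""
    flagged = []
    for f in fields(spec):
        value = getattr(spec, f.name)
        if value is None or not value.strip():
            flagged.append(f.name)
        elif any(phrase in value.lower() for phrase in VAGUE_PHRASES):
            flagged.append(f.name)
    return flagged


if __name__ == "__main__":
    draft = MethodsSpec(
        package="r5py",
        origins="tract centroids",
        destinations="store universe",
        departure_window="Tuesday morning",
        metric="standard methods",  # deliberately vague, so it gets flagged
    )
    print(unresolved_fields(draft))
```

Anything the check flags is a decision that still belongs to the researcher, not a gap for the agent to fill.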
Data cleaning
Outcome: analysis-ready data with measured error exposed.
In a food-security project, a Google Places query returned thousands of "grocery store" locations, and more than a quarter of them were not grocery stores. A classifier trained on a small labeled sample reached accuracy in the mid-90s percent range, and the remaining error rate stayed visible rather than being hidden. The data-cleaning step formalizes this pattern. The agent helps build and apply cleaning rules and classifiers. We decide what error rate is acceptable and how to report it.
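Keeping the residual error visible can be as simple as reporting the error rate on a hand-labeled audit sample together with an interval around it. The sketch below uses a Wilson score interval, which is our choice for illustration and not necessarily what the original project reported. The function name and the audit counts in the example are hypothetical.

```python
import math
from typing import Tuple


def error_rate_with_interval(n_labeled: int, n_wrong: int,
                             z: float = 1.96) -> Tuple[float, float, float]:
    """Error rate on an audit sample with a Wilson score interval.

    n_labeled: number of hand-checked records
    n_wrong:   how many of those the classifier got wrong
    z:         normal quantile (1.96 gives roughly a 95% interval)
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if n_labeled <= 0 or not 0 <= n_wrong <= n_labeled:
        raise ValueError("need n_labeled > 0 and 0 <= n_wrong <= n_labeled")
    p = n_wrong / n_labeled
    denom = 1 + z ** 2 / n_labeled
    centre = (p + z ** 2 / (2 * n_labeled)) / denom
    half = z * math.sqrt(p * (1 - p) / n_labeled
                         + z ** 2 / (4 * n_labeled ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Made-up audit: 200 hand-checked locations, 10 misclassified.
    rate, low, high = error_rate_with_interval(200, 10)
    print(f"error rate {rate:.3f}, interval [{low:.3f}, {high:.3f}]")
```

Whether the upper bound of that interval is acceptable for the downstream estimate is a judgment the researcher makes and states in the write-up.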
Quality assurance
Outcome: an estimate that has been tested before the causal sentence is written.
In a SNAP and bank-closure analysis, a two-way fixed-effects estimate passed a parallel-trends test and still failed under deeper checks. A placebo assignment, a sensitivity bound, and a specification with different trends all pointed to fragility. The quality-assurance step turns this into routine practice. The agent helps script and run the diagnostics. We decide whether an estimate is strong enough to report.
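A placebo assignment can be scripted in a few lines. The sketch below is a toy version under strong assumptions: a balanced two-period panel, a simple 2x2 difference-in-differences in place of the full two-way fixed-effects model, and a placebo built by reshuffling which units are labeled treated. The function names and the data in the example are hypothetical, and the published analysis may have constructed its placebo differently.

```python
import random
from statistics import mean
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, int, float]  # (unit id, post indicator 0/1, outcome)


def did_estimate(rows: List[Row], treated: Set[str]) -> float:
    """2x2 difference-in-differences from cell means."""
    def cell(is_treated: bool, post: int) -> float:
        return mean(y for u, p, y in rows
                    if (u in treated) == is_treated and p == post)
    return (cell(True, 1) - cell(True, 0)) - (cell(False, 1) - cell(False, 0))


def placebo_share(rows: List[Row], treated: Iterable[str],
                  n_draws: int = 500, seed: int = 0) -> Tuple[float, float]:
    """Share of reshuffled treatment assignments whose estimate is at least
    as large in absolute value as the actual one."""
    treated = set(treated)
    units = sorted({u for u, _, _ in rows})
    if not 0 < len(treated) < len(units):
        raise ValueError("need at least one treated and one control unit")
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    actual = did_estimate(rows, treated)
    hits = 0
    for _ in range(n_draws):
        fake = set(rng.sample(units, len(treated)))
        if abs(did_estimate(rows, fake)) >= abs(actual):
            hits += 1
    return actual, hits / n_draws


if __name__ == "__main__":
    # Made-up panel: four units, one pre and one post observation each.
    toy = [("a", 0, 1.0), ("b", 0, 1.0), ("c", 0, 1.0), ("d", 0, 1.0),
           ("a", 1, 3.0), ("b", 1, 3.0), ("c", 1, 2.0), ("d", 1, 2.0)]
    estimate, share = placebo_share(toy, {"a", "b"})
    print(estimate, share)
```

A large share means reshuffled labels produce effects as big as the real one, which is a sign of fragility. Reading that number, and deciding what to do about it, stays with the researcher.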
Documentation
Outcome: a reader reaches the headline number with no question left about how.
In the SNAP Broad-Based Categorical Eligibility replication, the headline effect and its standard errors, sample size, and weighting all trace back to a public code package. The documentation step closes the loop so that another researcher can run the same pipeline and see the same numbers. The agent assists with structure, cross-references, and alignment between text and code. We remain responsible for accuracy.
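One lightweight way to make "every number traces back to raw data and a script" checkable is a manifest that stores, for each reported number, hashes of the raw file and the script that produced it. The sketch below is an assumption about how such a manifest could look, not a description of the published code package. The function names, the manifest layout, and the label used in the example are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_of(path) -> str:
    """Hash a file in chunks so large raw files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_number(manifest_path, label: str, value: float,
                  raw_data, script) -> Dict:
    """Add or update one reported number and the inputs that produced it."""
    manifest_path = Path(manifest_path)
    entries = (json.loads(manifest_path.read_text())
               if manifest_path.exists() else {})
    entries[label] = {
        "value": value,
        "raw_data": str(raw_data),
        "raw_data_sha256": sha256_of(raw_data),
        "script": str(script),
        "script_sha256": sha256_of(script),
    }
    manifest_path.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return entries[label]


def stale_labels(manifest_path) -> List[str]:
    """Labels whose raw data or script changed since the number was recorded."""
    entries = json.loads(Path(manifest_path).read_text())
    return [label for label, e in sorted(entries.items())
            if sha256_of(e["raw_data"]) != e["raw_data_sha256"]
            or sha256_of(e["script"]) != e["script_sha256"]]
```

A call such as `record_number("manifest.json", "headline_effect", value, "raw.csv", "estimate.py")` would run at the end of the estimation script, and `stale_labels("manifest.json")` would run before the text is finalized. The file names here are placeholders.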
Why this comes from Too Early To Say
Each step in this sequence is grounded in analyses that Too Early To Say has already published with open code and data. The methodology section of the site documents the full workflows. The Medicaid data landscape article shows the cost of skipping the literature scan. The methods-paragraph-to-code piece shows how a specific paragraph becomes a working pipeline. The grocery-store classifier article shows how to clean a field that is too large to eyeball and report the residual error. The SNAP difference-in-differences work shows what it means to attack an estimate before trusting it.
This sequence organizes those lessons into one path. It is an entry point for applied researchers who want an AI coding agent in the loop and still want every number to stand up when someone else reruns the script.