AI for Applied Researchers · Step 1 of 5

Literature review

This step is the pattern we use for any replication or extension. It produces a short scoping note: the state-of-the-art paper, its headline benchmarks with outcome scales, and a methods-to-source map we can trust.

The problem this step solves

Before we write any code for a replication, we need a number to aim at. Which paper is the current standard for this question, and what did it find? That published estimate is our reference point. When we run our own version, the result should land near it. Without that reference, our code can still produce a coefficient, but nothing tells us whether it is right.

For Supplemental Nutrition Assistance Program (SNAP) Broad-Based Categorical Eligibility (BBCE), the current standard estimate is +5.9 percent on log SNAP per capita under two-way fixed effects, with a heterogeneity-robust estimate of +15.3 percent on the same outcome [1]. That gap, roughly 2.6-fold or 9.4 percentage points, is the motivation for the replication. The framing could not exist without first reading the paper closely enough to pull both figures and the distance between them.
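
The size of the gap follows directly from the two published figures. A minimal Python sketch of that arithmetic, with variable names of our own choosing rather than anything taken from the paper:

```python
# Headline estimates on log SNAP per capita, in percent, as quoted above.
twfe_pct = 5.9      # two-way fixed effects
robust_pct = 15.3   # heterogeneity-robust estimator

gap_points = robust_pct - twfe_pct  # absolute gap, in percentage points
gap_ratio = robust_pct / twfe_pct   # relative gap, as a multiple

print(f"gap = {gap_points:.1f} points, ratio = {gap_ratio:.2f}x")
```

Recording both the absolute and the relative gap in the scoping note makes it easier to say later which one our own run reproduces.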

When to use this step, and when not to

This step earns its place whenever we are replicating, extending, or benchmarking against published work. The signal is simple. A literature already has a recognized lead estimate we want to land near or improve on.

We can skip it only when the work is genuinely first of its kind with no comparable prior estimate, or when the task is descriptive data engineering with no causal target. For the BBCE work, skipping was not an option. The published gap between the two estimators was the reason the replication existed.

Inputs required

  • A precisely stated research question and outcome variable. For BBCE, that was the effect of BBCE adoption on SNAP take-up and on log SNAP per capita.
  • Access to the candidate papers, ideally full text or at least abstracts with reported point estimates.
  • A short list of the methods we expect to touch so AI extraction has a target. For BBCE, that included two-way fixed effects, the Goodman-Bacon decomposition, Callaway and Sant'Anna, pre-trend testing, and clustered standard errors.

The AI-assisted move

We hand the AI the research question and the candidate papers. We keep the specification precise, because the agent is only useful when it is. We ask it to do three concrete things and nothing broader.

First, name the single state-of-the-art evaluation and quote its headline estimate or estimates verbatim, with the outcome scale attached. Second, list every estimator and diagnostic the paper relies on and cite the original methods paper for each one. Third, surface the reported numbers we will later compare our own run against.

For BBCE, that produced a compact map. The benchmark numbers were +5.9 percent for the two-way fixed-effects estimate on log SNAP per capita and +15.3 percent for the heterogeneity-robust estimate on the same scale, both from the lead paper [1]. Two secondary figures also mattered. About 11.5 percent of the BBCE participation increase came from extending eligibility above 130 percent of the federal poverty guideline, and the forbidden-comparison weight share for BBCE was around 0.28. The methods traced to specific sources: staggered-adoption concerns to the decomposition source [2], the heterogeneity-robust estimator to its origin paper [3], pre-test caution to the pre-trend critique [4], few-clusters concerns to the inference correction [5], and the BBCE adoption count of 41 states to the take-up study [6].
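
One way to keep the methods-to-source map honest is to store it as data and check it against the methods list from the inputs. The sketch below is illustrative only: the pairing of each method with a reference number is our reading of the paragraph above, and the names `EXPECTED_METHODS`, `METHOD_SOURCES`, and `unmapped` are hypothetical.

```python
# Methods we expected to touch, from the inputs list.
EXPECTED_METHODS = [
    "two-way fixed effects",
    "Goodman-Bacon decomposition",
    "Callaway and Sant'Anna",
    "pre-trend testing",
    "clustered standard errors",
]

# Method -> reference number in this article's reference list.
# ASSUMPTION: this pairing is our reading of the text, not the paper's own table.
METHOD_SOURCES = {
    "Goodman-Bacon decomposition": 2,
    "Callaway and Sant'Anna": 3,
    "pre-trend testing": 4,
    "clustered standard errors": 5,
}


def unmapped(expected, sources):
    """Return the expected methods that have no origin paper recorded yet."""
    return [method for method in expected if method not in sources]


missing = unmapped(EXPECTED_METHODS, METHOD_SOURCES)
print(missing)
```

Any method the check returns is a prompt to go back to the paper and decide whether an origin citation is needed for it.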

Those numbers and citations travel into the rest of the project. The +5.9 percent becomes the benchmark our own two-way fixed-effects run has to land near. The 11.5 percent eligibility-expansion figure explains why a take-up-rate estimate and a log-per-capita estimate differ. The 0.28 forbidden-share benchmark lets us flag that a shorter twelve-year panel producing a 0.512 share, nearly double the benchmark, is much more fragile. The literature review is not background reading. It sets every comparison the rest of the work will make.
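
Because every later comparison reads from these figures, it can help to hold them in one small structure. The following is a sketch under our own assumptions: the `Benchmark` class, the labels, and the wording of each scale string are hypothetical, and only the numeric values come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    label: str
    value: float
    scale: str  # outcome scale or unit, paraphrased from the scoping note


BENCHMARKS = {
    b.label: b
    for b in [
        Benchmark("twfe", 5.9, "percent, log SNAP per capita"),
        Benchmark("robust", 15.3, "percent, log SNAP per capita"),
        Benchmark("eligibility_channel", 11.5, "percent of participation increase"),
        Benchmark("forbidden_share", 0.28, "share of decomposition weight"),
    ]
}


def ratio_to_benchmark(label: str, own_value: float) -> float:
    """Return our own figure as a multiple of the stored benchmark."""
    return own_value / BENCHMARKS[label].value


# The shorter twelve-year panel mentioned above.
short_panel_ratio = ratio_to_benchmark("forbidden_share", 0.512)
print(f"{short_panel_ratio:.2f}x the benchmark forbidden share")
```

Storing the scale next to each value guards against comparing a take-up-rate figure to a log-per-capita benchmark by accident.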

Copy-paste prompt

Here is the prompt we run at the start of a replication. The point is to force verbatim extraction and per-method citation, not summary.

You are helping me scope a replication in applied microeconometrics.

RESEARCH QUESTION:
"What is the effect of [POLICY] on [OUTCOME]?"
(My example: the effect of SNAP Broad-Based Categorical Eligibility
adoption on SNAP take-up and on log SNAP per capita.)

CANDIDATE PAPERS (full text or abstracts pasted below):
[PASTE PAPERS OR ABSTRACTS]

Do exactly three things. Do not summarize broadly.

1. STATE OF THE ART
   - Name the single paper that defines the current standard
     estimate for this question.
   - Quote its headline point estimate(s) verbatim, and state the
     outcome scale for each (e.g. "log per capita", "percentage
     points", "take-up rate").
   - If it reports more than one estimator, give each estimate
     and label which estimator produced it.

2. METHODS-TO-SOURCE MAP
   - List every estimator and every diagnostic the paper relies on.
   - For each one, cite the original methods paper it traces to
     (author, year, journal). Do not cite the applied paper as the
     source of a method it merely uses.

3. BENCHMARK NUMBERS
   - List every reported number I could later compare my own
     replication against (point estimates, decomposition weight
     shares, channel decompositions, sample sizes).
   - For each number, give the exact figure and one sentence on
     what it measures.

Constraints:
- Quote numbers exactly as reported. Do not round or infer.
- If a number or citation is not in the text I gave you, write
  "NOT IN PROVIDED TEXT". Do not supply it from memory.
- Flag any estimate where the outcome scale is ambiguous.

Failure check and validation

AI extraction fails in one specific, checkable way. It reports a number or a citation that is not in the paper. So we validate before trusting anything downstream.

Open the lead paper. For each quoted estimate, confirm that the figure and the outcome scale match the text. For BBCE, that meant confirming that +5.9 percent is the two-way fixed-effects estimate on log per capita and +15.3 percent is the Callaway and Sant'Anna estimate on the same scale, not the reverse and not a different outcome.

A concrete pass requires four confirmations.

  • Each benchmark number appears verbatim in the source paper at the stated outcome scale.
  • Each method citation points to the original methods paper, not to the applied paper that uses the method.
  • Any number the AI could not find is marked "NOT IN PROVIDED TEXT," not filled in from memory.
  • The outcome scale is unambiguous for every estimate that will become a benchmark.

If a benchmark fails to appear in the source, we drop it. A fabricated benchmark is worse than none. It hands the later code a false target to match.
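
The first confirmation, that each number appears verbatim in the source, can be partly automated. Below is a minimal sketch, assuming the paper's text is available as a plain string. The function name `verify_benchmarks` is hypothetical, and the sample text is a placeholder we wrote for illustration, not a quotation from the paper. The check covers only the figure itself; confirming the outcome scale still requires reading the passage.

```python
import re


def verify_benchmarks(paper_text: str, quoted: dict[str, str]) -> dict[str, bool]:
    """For each label, report whether the quoted figure appears verbatim."""
    # Collapse whitespace so line breaks in extracted text do not hide a match.
    normalized = re.sub(r"\s+", " ", paper_text)
    results = {}
    for label, figure in quoted.items():
        # Reject matches embedded in a longer number (e.g. "5.9" inside "15.93").
        pattern = r"(?<![\d.])" + re.escape(figure) + r"(?!\d)"
        results[label] = re.search(pattern, normalized) is not None
    return results


# PLACEHOLDER text standing in for the pasted paper; not a real quotation.
paper_text = """The two-way fixed-effects estimate is 5.9 percent.
The heterogeneity-robust estimate is 15.3 percent."""

# Figures as an AI might report them; "7.2" is deliberately fabricated.
quoted = {"twfe": "5.9", "robust": "15.3", "fabricated": "7.2"}

results = verify_benchmarks(paper_text, quoted)
kept = {label: fig for label, fig in quoted.items() if results[label]}
print(results)
print(kept)
```

Anything missing from `kept` is dropped from the scoping note, which matches the rule above.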

Deliverable

The deliverable is a short scoping note we can paste into the project. It lists the state-of-the-art paper, its verbatim benchmark numbers with outcome scales, and a methods-to-source map that cites the origin paper for each estimator and diagnostic.

For the BBCE replication, this note carried the +5.9 percent and +15.3 percent benchmarks, the 11.5 percent eligibility-expansion channel, the 0.28 forbidden-share benchmark, and the full citation trail behind every estimator and diagnostic [1-6]. Every later step in the project read from this note.

Provenance from our work

This step is the scan that scoped our published SNAP BBCE replication. It pulled the lead paper's +5.9 percent two-way fixed-effects benchmark and its +15.3 percent heterogeneity-robust estimate [1]. Our own run later landed at +5.81 percent, within 0.1 percentage point of the target. The benchmark numbers, the citation trail, and the verbatim-extraction check are on the page and in the GitHub repository, open to rerun.
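
The comparison between our run and the benchmark reduces to one subtraction. A short sketch follows; the `tolerance` value is an illustrative assumption of ours, not a threshold taken from the replication.

```python
benchmark_pct = 5.9  # published two-way fixed-effects estimate
own_pct = 5.81       # our replication's estimate
tolerance = 0.5      # ASSUMED acceptance band, in percentage points

diff = abs(own_pct - benchmark_pct)
lands_near = diff <= tolerance

print(f"difference = {diff:.2f} points, within tolerance: {lands_near}")
```

Choosing the tolerance before running the replication keeps the check from being fitted to the result.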

Read the full replication: "When the parallel-trends test fails on one lead, what is left." Code and data are on GitHub at dphdame/tooearlytosay-analysis.

References

  1. Wang, X., Valizadeh, P., Nayga, R. M., Bryant, H. L., & Fischer, B. L. (2026). Broad-based categorical eligibility policy and SNAP participation. Journal of Policy Analysis and Management, 45(1), e70063.
  2. Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254-277.
  3. Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.
  4. Roth, J. (2022). Pretest with caution: Event-study estimates after testing for parallel trends. American Economic Review: Insights, 4(3), 305-322.
  5. Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1), 249-275.
  6. Ganong, P., & Liebman, J. B. (2018). The decline, rebound, and further rise in SNAP enrollment: Disentangling business cycle fluctuations and policy changes. American Economic Journal: Economic Policy, 10(4), 153-176.