The problem this step solves
An analysis is only as credible as a reader's ability to retrace it. Which raw data. Which specification. Which diagnostics. Which number traces to which line of code. When that trail is missing, a reader cannot tell a defensible estimate from an artifact.
In the Broad-Based Categorical Eligibility (BBCE) replication, the core design was a two-way fixed-effects difference-in-differences estimate built from an Integrated Public Use Microdata Series (IPUMS-USA) extract of the American Community Survey 2005-2016, aggregated to 612 state-year observations across 51 states, with code and data posted publicly. A headline estimate of +1.37 percentage points on Supplemental Nutrition Assistance Program (SNAP) take-up means little to a reader who cannot see how the panel was built, which states were ever treated, and how standard errors were clustered. Documentation supplies that trail.
When to use this step, and when not to
The right moment is once the analysis has stabilized. The specification is chosen, the diagnostics have run, and the numbers in the draft match the numbers the code returns. Documentation written before the estimate settles documents a moving target and goes stale on the next rerun.
We defer this step while the work is still exploratory and the specification is changing daily. There is no value in a polished data dictionary for a panel we may rebuild tomorrow. The signal that documentation is due is simple. We are about to hand the result to a co-author, a referee, or our future self, and we want that reader to reproduce it without asking a single question.
Inputs required
Before drafting, we gather:
- The raw data source and the exact extract. For BBCE, the IPUMS-USA American Community Survey 2005-2016 pull and the screen that defines eligibility, such as households at or below 130 percent of the federal poverty guideline.
- The construction steps from raw rows to the analysis panel. For BBCE, aggregation to state-year cells, the 612-observation panel, and the split into 41 ever-treated and 10 never-treated states.
- The specification. A two-way fixed-effects single-event design with state and year fixed effects, SNAP take-up as the headline outcome and log SNAP per capita as a robustness outcome, standard errors clustered at the state level.
- The diagnostics already run. The pre-period parallel-trends F-test, the placebo test, the leave-one-out sweep across all 41 ever-treated states, and the Goodman-Bacon decomposition.
- The code repository that produced the numbers, so every documented value can be traced back to its source.
The AI-assisted move
We hand the analysis code and the raw-data description to the model. Its job is to draft the reproducibility documentation: a data-provenance section, a specification section, a diagnostics section, and a numbered replication path. The model is fast at turning a working script into prose that names each input, transformation, and output.
The model does not get the last word on any number. We treat its draft as a starting point and then reconcile every quantitative claim against the code. For BBCE, that means checking the +1.37 percentage-point estimate, the 612-observation panel count, the 41-and-10 state split, the leave-one-out range, the parallel-trends F-test at p = 0.041, and the Goodman-Bacon forbidden-comparison weight share of 0.512. Documentation inherits the same standard as the analysis. It states what the code returns, including the awkward parts.
Copy-paste prompt
We paste the analysis script or the key functions and the raw-data description in place of the bracketed blocks, then run this prompt:
You are helping document an empirical analysis so a stranger can
reproduce it from raw data without contacting the author.
Here is the analysis code:
[PASTE SCRIPT OR KEY FUNCTIONS]
Here is the raw data source and how the analysis panel is built:
[PASTE: data source, extract details, screens, aggregation steps,
unit of observation, sample window, treated/control split]
Write reproducibility documentation with these sections:
1. DATA PROVENANCE
- Name the exact raw source and extract.
- List every screen and transformation from raw rows to the
analysis panel, in order.
- State the unit of observation and the final sample counts.
2. SPECIFICATION
- State the estimator, the fixed effects, the outcome variable(s),
and the standard-error clustering.
- Define every variable, including how the outcome is constructed.
3. DIAGNOSTICS REPORTED
- For each diagnostic in the code (e.g. parallel-trends test,
placebo, leave-one-out, decomposition), state what it tests
and the value it returns. Do not soften a failing test.
4. REPLICATION PATH
- A numbered list of commands a stranger runs to go from a clean
checkout of the repository to the headline numbers.
Rules:
- Pull every number, count, and variable name from the code or the
data description I gave you. Do not invent values.
- For any number you cannot trace to what I pasted, write
[VERIFY: ] instead of guessing.
- Be specific. "Standard errors are clustered" is not enough; say
on what.
Failure check and validation
The draft is not done until two checks pass.
First is the traceability check. We read the documentation against the code and confirm that every number, count, and variable name appears in the script or in its outputs. Any value the model could not source should arrive marked "[VERIFY: ...]." If the model produced a clean-looking number with no basis in the code, that is a fabrication and the documentation fails.
A concrete test is simple. Pick three numbers from the draft. For BBCE, the +1.37 percentage-point estimate, the 612 observations, and the leave-one-out range. Search the repository for each. If a number is not in the code or in the logged outputs, it does not belong in the documentation.
Second is the stranger-rerun check. We follow our own replication path on a clean checkout, running each numbered command in order. If a step assumes a file, a package, or an environment variable that the documentation never mentioned, the path is broken and we add the missing step. The documentation passes when a clean checkout reaches the headline numbers with no undocumented move.
Deliverable
The deliverable is a reproducibility document that travels with the analysis. It includes a data-provenance section naming the raw source and every transformation, a specification section defining the estimator and every variable, a diagnostics section reporting each test and its value, and a numbered replication path from a clean checkout to the headline numbers.
For the BBCE case, that document sits alongside the code and data in the public repository, so a reader can rebuild the 612-observation panel and recover the +1.37 percentage-point estimate without contacting us.
Provenance from our work
This standard comes from the SNAP BBCE replication, "When the parallel-trends test fails on one lead, what is left." The documentation traces the +1.37 percentage-point estimate, the 612-observation panel, the 41-and-10 state split, and the leave-one-out range back to the code. It reports the awkward parts plainly, including the parallel-trends F-test failing at p = 0.041 and the 0.512 forbidden-comparison weight share. The package sits on a public repository, so the result is reproducible by a stranger.