Code generation | AI for Applied Researchers

The problem this step solves

A traditional methods section gives a procedure that someone has to translate into code by hand. That translation means choosing the programming language, the tools, and the file paths, and recovering the hidden assumptions the paragraph never states. It is slow, and the cost of redoing it discourages testing alternatives.

The linked article reports a transit-routing example that took roughly 10.5 hours by hand and 0.7 hours with an agent. No matching public run record reproduces that timing comparison, so treat it as an illustration of the workflow rather than a verified benchmark.

When to use this step, and when not to

This step works when the procedure is well specified. The paragraph names the tool, identifies the data sources, states the parameters, defines the processing logic, and implies the output.

Name r5py rather than "a routing engine." State 408 origins and 4,847 destinations rather than "stores." Fix a Tuesday 9-11 AM departure window. With that level of detail, the agent can implement the procedure directly. This is the principle behind every step in the series: a precise specification is what makes the agent useful.
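Even a precise window has to become concrete values before a routing call can use it. Routing libraries such as r5py expect an actual departure datetime rather than "a Tuesday"; check the documentation of your installed r5py version for the exact parameter names. Below is a minimal sketch, assuming a hypothetical `RoutingSpec` container (not from the article), that turns "Tuesday, 9-11 AM" into start and end datetimes. The earliest date is a placeholder: it must fall inside the GTFS feed's service calendar, or a transit router will find no scheduled trips.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


# Hypothetical container for the parameters a precise methods paragraph pins
# down. Field names are illustrative, not taken from the article.
@dataclass(frozen=True)
class RoutingSpec:
    tool: str
    n_origins: int
    n_destinations: int
    weekday: int            # Monday=0 ... Sunday=6, as in date.weekday()
    window_start: time
    window_length: timedelta


def first_weekday_on_or_after(start: date, weekday: int) -> date:
    """Return the first date on or after `start` that falls on `weekday`."""
    offset = (weekday - start.weekday()) % 7
    return start + timedelta(days=offset)


def departure_window(spec: RoutingSpec, earliest: date) -> tuple[datetime, datetime]:
    """Turn 'a Tuesday, 9-11 AM' into concrete start and end datetimes."""
    day = first_weekday_on_or_after(earliest, spec.weekday)
    start = datetime.combine(day, spec.window_start)
    return start, start + spec.window_length


spec = RoutingSpec(
    tool="r5py",
    n_origins=408,
    n_destinations=4847,
    weekday=1,  # Tuesday
    window_start=time(9, 0),
    window_length=timedelta(hours=2),
)

# Placeholder date; choose one inside your GTFS feed's service period.
start, end = departure_window(spec, date(2024, 3, 1))
print(start, end)  # → 2024-03-05 09:00:00 2024-03-05 11:00:00
```

Writing the parameters down as data also gives the validation step something concrete to compare against later.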

We hold off when the description is vague. A paragraph that says "we assessed food access using standard routing methods and publicly available transit data" forces the agent to guess which routing tool, which transit agency, which origin points, which destinations, and which time of day. Each guess changes the results. Vague methodology produces code that runs but may not match the research intent.

The fix is not to ask the agent to be creative. The fix is to tighten the specification, which is the work of the literature-review step.

Inputs required

Before handing the procedure to the agent, we assemble:

A methods paragraph with implementation-level detail: the specific tool, every data source, the parameters, the processing logic, and the implied output.
The data files the procedure reads, with their paths. In the transit example these were a validated store file and a census tract centroid file.
A decision about external inputs the agent cannot produce by itself. In our run, the OpenStreetMap (OSM) network and the Santa Clara Valley Transportation Authority (VTA) General Transit Feed Specification (GTFS) feed did not exist locally. The agent paused to ask whether to download them.
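The inventory behind these inputs can be checked mechanically before any processing starts. A minimal sketch using only Python's `pathlib`; the file names and labels below are hypothetical stand-ins for whatever your methods paragraph names. The point is that a missing input stops the run with a message instead of being silently replaced.

```python
from pathlib import Path

# Hypothetical file layout; substitute the paths your methods paragraph names.
REQUIRED_INPUTS = {
    "store file": Path("data/stores_validated.csv"),
    "tract centroids": Path("data/tract_centroids.csv"),
    "OSM network": Path("data/bay_area.osm.pbf"),
    "GTFS feed": Path("data/vta_gtfs.zip"),
}


def missing_inputs(required: dict[str, Path], root: Path = Path(".")) -> list[str]:
    """Return the labels of required inputs that are not files under `root`."""
    return [label for label, rel in required.items() if not (root / rel).is_file()]


def check_inputs_or_stop(required: dict[str, Path], root: Path = Path(".")) -> None:
    """Raise before any processing if an input is absent, rather than substituting one."""
    missing = missing_inputs(required, root)
    if missing:
        raise FileNotFoundError(
            "Missing required inputs (ask before downloading or substituting): "
            + ", ".join(missing)
        )
```

Calling `check_inputs_or_stop(REQUIRED_INPUTS)` as the first line of the analysis script makes the "pause and ask" behavior part of the code, not just the prompt.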

The AI-assisted move

We point the agent at the methods paragraph and the data, then let it work within the specification. In the transit-access run, the agent:

Read the paragraph and identified the required inputs: OSM network, GTFS feed, centroids, and stores.
Checked the project directory, found the OSM and GTFS files missing, and asked before downloading them.
Wrote a script, as the article describes, that pulls the Bay Area OSM extract and VTA GTFS feed, builds a transport network, and applies the stated routing specification to 408 centroids and 4,847 stores. No matching public script or run record currently reproduces it.
Produced a 408-row output table, according to the article; no saved public output currently derives it.
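The 408-row shape follows from the design. Routing 408 origins to 4,847 stores yields up to 408 × 4,847 = 1,977,576 origin-destination travel times, and keeping each origin's minimum collapses that to one row per tract. A minimal sketch of that reduction in plain Python, assuming a long-format table of `(origin_id, destination_id, minutes)` rows with `None` where no trip was found; this layout and the toy rows are hypothetical, not the article's output.

```python
from math import inf


def min_time_per_origin(od_rows, origin_ids):
    """Reduce long-format OD travel times to one minimum per origin.

    od_rows: iterable of (origin_id, destination_id, minutes) tuples, where
        minutes is None if the router found no trip for that pair.
    origin_ids: every origin the specification names, so origins with no
        reachable destination still get a row (value None) instead of
        silently disappearing.
    """
    best = {oid: inf for oid in origin_ids}
    for origin, _destination, minutes in od_rows:
        # An origin outside the specification points to a substituted input.
        if origin not in best:
            raise ValueError(f"unexpected origin {origin!r} not in specification")
        if minutes is not None and minutes < best[origin]:
            best[origin] = minutes
    # Turn 'never reached' back into None so it stays visible in the output.
    return {oid: (None if t == inf else t) for oid, t in best.items()}


# Toy rows for illustration only.
rows = [
    ("tract_A", "store_1", 34.0),
    ("tract_A", "store_2", 21.0),
    ("tract_B", "store_1", None),
    ("tract_B", "store_2", 67.0),
]
result = min_time_per_origin(rows, ["tract_A", "tract_B", "tract_C"])
print(result)  # → {'tract_A': 21.0, 'tract_B': 67.0, 'tract_C': None}
```

Seeding the result with every specified origin, rather than grouping whatever rows came back, is what lets a short output table reveal a problem instead of hiding it.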

The article reports roughly 45 minutes for the network build, but no public run record verifies that duration. What carries over regardless of timing is the pattern: treat the methods paragraph as a specification, and require the agent to ask when an input is absent.

Copy-paste prompt

Here is a ready-to-use prompt for this step. Fill in the bracketed parts with your methods paragraph and file paths, then paste it into Claude Code started from the project root.

Implement the procedure described in the methods paragraph below.
Treat the paragraph as the specification. Do not improvise design
choices it does not state.

METHODS PARAGRAPH:
"""
[Paste your implementation-level methods paragraph here. It must
name the specific tool, identify every data source, state all
parameters, define the processing logic, and imply the output.]
"""

DATA FILES (inputs already in this project, plus the output location):
- [path/to/input_one.csv]  -> [one line on what it contains]
- [path/to/output.csv]     -> [where results should be written]

RULES:
1. Before writing code, list the inputs the paragraph requires and
   check whether each exists in the project. If any required input
   is missing, STOP and ask me whether to download it or point you
   to it. Do not invent a substitute.
2. Where the paragraph does not specify a design choice (time window,
   origin definition, distance metric), do not guess. Ask me.
3. After the script runs, report the output shape (row and column
   counts) and confirm it matches what the paragraph implies. If
   the paragraph implies N rows, state whether you got N.
4. Print 3 to 5 sample output rows so I can spot-check them.

Failure check and validation

An agent will produce numbers whether or not they are right. We validate against the specification before trusting any of them.

For the transit-time run, validation required four checks.

Output shape matches the specification. The paragraph implied a 408-row table of minimum transit times. If the script returns a different row count, the run is wrong even if it completed without error.
Extremes match an independent source. Take the longest transit times the script reports and compare them with Google Maps transit directions for the same origin-destination pairs, departing inside the same window. If a tract shows 67 minutes, confirm that is real and not a routing artifact.
Design parameters were honored. Verify that routing used the Tuesday 9-11 AM window and the stated trip components (walking to the first stop, waiting, riding, transferring, and walking from the final stop), not a default the agent substituted without flagging it.
No unflagged substitution of inputs. If a required input was missing and the agent did not pause to ask, treat the output as suspect and trace what it used instead.
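The first two checks are mechanical enough to script. A minimal sketch, assuming the per-origin minimums are already loaded into a dict keyed by a hypothetical tract ID: it flags specified origins with no row, rows for origins outside the specification, and unreachable origins, then returns the k longest times to compare against Google Maps by hand. The design-parameter and substitution checks still require reading the script and the agent's transcript.

```python
def validate_min_times(min_times, expected_origins, k=5):
    """Check origin coverage and pick the k longest times for manual spot-checks.

    min_times: dict mapping origin_id -> minimum minutes (None if unreachable).
    expected_origins: the origin IDs the specification names (e.g. 408 tracts).
    Returns (problems, longest): a list of messages and a list of
    (origin_id, minutes) pairs sorted from longest to shortest.
    """
    problems = []
    missing = sorted(set(expected_origins) - set(min_times))
    extra = sorted(set(min_times) - set(expected_origins))
    unreachable = sorted(o for o, t in min_times.items() if t is None)
    if missing:
        problems.append(f"{len(missing)} specified origins have no row: {missing[:5]}")
    if extra:
        problems.append(f"{len(extra)} rows for origins not in the specification: {extra[:5]}")
    if unreachable:
        problems.append(f"{len(unreachable)} origins have no reachable destination: {unreachable[:5]}")
    reachable = [(o, t) for o, t in min_times.items() if t is not None]
    longest = sorted(reachable, key=lambda pair: pair[1], reverse=True)[:k]
    return problems, longest


# Toy values for illustration only.
min_times = {"t1": 12.0, "t2": 67.0, "t3": None, "t4": 41.5}
problems, longest = validate_min_times(min_times, ["t1", "t2", "t3", "t4", "t5"], k=2)
for msg in problems:
    print("PROBLEM:", msg)
for origin, minutes in longest:
    print(f"spot-check {origin}: {minutes} min")
```

Because the input is a dict, duplicate rows cannot show up here; the row-count comparison against the raw output file catches those.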

The validation strategy is ours. We can ask the agent to run specific checks, for example "spot-check the five longest transit times against Google Maps," but deciding what counts as correct stays with us.

Deliverable

The deliverable is a working script, a result file, and a validation record. The linked transit example specifies a 408-row output, but those project artifacts are not currently public, and they have not been reproduced from the methods paragraph alone.
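One way to keep the validation record is a small JSON file written alongside the results. A minimal sketch using only the standard library; the function name, file name, and check names are hypothetical, and a plain-text log or notebook cell would serve equally well.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_validation_record(path: Path, checks: dict[str, tuple[bool, str]]) -> dict:
    """Write pass/fail status and a note for each named check to a JSON file."""
    record = {
        "written_at": datetime.now(timezone.utc).isoformat(),
        "all_passed": all(passed for passed, _ in checks.values()),
        "checks": {
            name: {"passed": passed, "note": note}
            for name, (passed, note) in checks.items()
        },
    }
    path.write_text(json.dumps(record, indent=2))
    return record


# Usage (hypothetical check names mirroring the four checks above):
# write_validation_record(Path("results/validation.json"), {
#     "output_shape": (True, "408 rows, one per tract"),
#     "extremes_vs_google_maps": (True, "5 longest times within a few minutes"),
#     "design_parameters": (True, "Tuesday 9-11 AM window confirmed in script"),
#     "no_unflagged_substitution": (True, "agent asked before downloading OSM and GTFS"),
# })
```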

Provenance from our work

The linked article, Methods-to-Code with AI: An Article-Reported Workflow, describes a 408-tract by 4,847-store specification and a 0.7-hour versus 10.5-hour comparison. Those project-specific artifacts and timings are not publicly reproduced. The article specifies the intended workflow; it does not make the result reproducible from the paragraph alone.