Reading Our Analysis Files: How Claude Sees Our Research Code

Prerequisites

Before reading this article, we should understand:

Your First Session: What Claude Code Is and Isn't - The basics of launching Claude Code and the prompt-response cycle
Why It Forgot Everything: Understanding Context - How the context window works and why sessions start fresh

If we have completed those articles, we are ready to learn how Claude explores our analysis files.

What We Will Learn

Claude has special commands called "tools" that let it interact with files on our computer. Think of tools as capabilities Claude can use when we give it permission. We will explore three essential tools:

Read (look at a file) - Opens and displays the contents of a specific file
Glob (find files by pattern) - Searches for files matching a naming pattern, like finding all .do files
Grep (search inside files) - Looks for specific text across multiple files

Understanding these tools helps us ask better questions and guide Claude more efficiently through our analysis scripts and data pipelines.

When we ask Claude Code about our research code, it starts with no prior knowledge of our files. Claude has to read files, search for patterns, and build understanding just like we do when reviewing a collaborator's analysis.

This is easy to forget. We ask "How does this script construct the food insecurity index?" and receive a detailed explanation. It feels like Claude already knew. But behind that response, Claude read specific do-files, searched for variable definitions, and synthesized what it found.

Understanding how this exploration works helps us ask better questions and waste less context.

Here is the first thing to understand: Claude has no automatic knowledge of our project folder.

When we start a session, Claude knows nothing about our analysis. It does not know what scripts exist, whether we use Stata or R, or how our analysis is organized. Every piece of information must be explicitly loaded into the conversation.

This matters because each file we load consumes context window space. If we start with "Read my entire project and explain all the analysis scripts," we might burn through half our context before doing any actual work. The files get loaded, the explanations get generated, and suddenly we have less room for the task that actually matters.

The better approach: guide what Claude reads. If we know which do-file matters, we say so. If we need to search, we describe what we are looking for. Efficient exploration means productive sessions.

What Are File Paths?

Before we dive into the tools, let us quickly cover file paths. A file path is simply the address of a file on our computer, showing its exact location in the folder structure.

For example:

analysis/02_regressions.do means: inside the analysis folder, find the file called 02_regressions.do
code/01_data_cleaning.py means: inside the code folder, find the file called 01_data_cleaning.py

The paths in this article use forward slashes (/) to separate folders. Longer paths just mean more nested folders:

analysis/robustness/04_sensitivity.do is in a robustness folder inside the analysis folder

When we tell Claude to read a file, we give it this address so it knows exactly where to look.
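To make the idea of a path as an address concrete, here is a minimal Python sketch that splits a path into its folders, file name, and extension. The function name describe_path is our own invention for illustration. It uses only Python's standard pathlib module and has nothing to do with how Claude handles paths internally.

```python
from pathlib import PurePosixPath


def describe_path(path_text: str) -> dict:
    """Split a forward-slash path into folders, file name, and extension."""
    path = PurePosixPath(path_text)
    return {
        "folders": list(path.parent.parts),  # nested folders, outermost first
        "file_name": path.name,              # the last piece of the address
        "extension": path.suffix,            # hints at the language (.do, .py)
    }


print(describe_path("analysis/robustness/04_sensitivity.do"))
```

For the example path, the folders come back as analysis and then robustness, which matches the "robustness folder inside the analysis folder" reading above.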

The Read Tool (Looking at a File)

The most straightforward way Claude examines code is the Read tool. It does exactly what the name suggests: opens a file and loads its contents into the conversation.

When Claude reads a file, it sees:

Line numbers - Every line is numbered, making it easy to reference specific locations
Full content - The actual text of the file, syntax and all
File type - Claude infers the language from the file extension (.do for Stata, .py for Python)

We can ask Claude to read a file directly:

"Read analysis/02_regressions.do and explain the main specification"

Claude will open the do-file, find the regression command, and explain what it does. Simple and direct.

For large files, we can request partial reads. Instead of loading a 2,000-line cleaning script, we might say:

"Read lines 150-200 of code/01_data_cleaning.py"

This loads just the relevant section, preserving context for other work.
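A partial read can be pictured with a short Python sketch. This is only an illustration of the idea (numbered lines, optional line range), not the Read tool's actual implementation. The function name read_numbered and its parameters are assumptions of ours.

```python
from itertools import islice
from pathlib import Path


def read_numbered(path, start=1, end=None):
    """Return (line_number, text) pairs for lines start..end (1-indexed, inclusive).

    With end=None the read continues to the end of the file.
    """
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        # islice skips the lines before `start` without keeping them in memory
        window = islice(handle, start - 1, end)
        return [
            (number, line.rstrip("\n"))
            for number, line in enumerate(window, start=start)
        ]
```

Under these assumptions, read_numbered("code/01_data_cleaning.py", 150, 200) would return 51 numbered lines instead of all 2,000, which is the saving the partial read is after.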

When to use Read:

We know exactly which script matters
We want to examine a specific regression specification
We need Claude to understand how a variable is constructed

The Read tool is precise. When we know where to look, it gets us there efficiently.

The Glob Tool (Finding Files by Pattern)

Sometimes we know what kind of file we need but not its exact location. The Glob tool helps here: it finds files by pattern matching.

This works like searching for files in Finder or Windows Explorer using wildcards. If we have ever searched for *.dta to find all our Stata data files, we already understand the concept.

Glob patterns use the asterisk (*) as a wildcard that matches any run of characters within a single file or folder name:

*.do - All Stata do-files in the current folder
**/*.do - All Stata do-files anywhere in the project (the ** means "search all subfolders too")
analysis/**/*.py - All Python scripts under the analysis folder, including subfolders
*.dta - All Stata data files in the current folder
code/**/clean_*.do - All scripts starting with clean_ anywhere under the code folder

When Claude runs a Glob search, it returns file paths, not contents. This is an important distinction. Glob tells us what files exist. Reading those files is a separate step.

We might ask:

"Find all do-files in the analysis directory"

Claude returns a list of file locations:

analysis/01_sample_construction.do
analysis/02_summary_stats.do
analysis/03_main_regressions.do
analysis/robustness/04_sensitivity.do

Now we know what exists. We can choose which files to read based on what we actually need.

When to use Glob:

We know the file type but not exact locations
We want to see what scripts exist in our project
We are exploring an unfamiliar research project from a collaborator

Glob is discovery. It answers "What files match this pattern?" without loading any content.
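The same kind of pattern matching is available in Python's standard library, which makes it easy to try the patterns above on our own project. The sketch below is a rough analogue, not Claude's Glob tool. The function name find_files is hypothetical, and pathlib's matching rules may differ from the tool's in edge cases.

```python
from pathlib import Path


def find_files(root, pattern):
    """Return files under `root` matching `pattern`, as paths relative to root.

    Only paths are returned. No file is opened, so no contents are loaded.
    """
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_file()
    )
```

Under these assumptions, find_files(".", "*.do") would list only the do-files in the current folder, while find_files(".", "**/*.do") would also include those in subfolders.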

The Grep Tool (Searching Inside Files)

The third exploration tool searches file contents. When we need to find where something is used, defined, or mentioned, Grep is the answer.

Think of Grep as Ctrl+F (or Cmd+F) for an entire project folder. Instead of searching within one open file, it searches across all files at once.

Grep searches for text patterns across files. It returns matches with their file locations and line numbers.

We might ask:

"Search for all uses of food_insecurity_score"

Claude runs the search and returns matches:

code/01_create_index.do:47: gen food_insecurity_score = (q1 + q2 + q3) / 3
analysis/02_regressions.do:89: reg food_insecurity_score treatment i.county, cluster(county)
analysis/03_heterogeneity.do:23: by income_tercile: sum food_insecurity_score
output/tables/summary_stats.do:12: tabstat food_insecurity_score, by(treatment) stat(mean sd n)

Each result shows: file path, line number, and the matching line of code. Now we know everywhere this variable appears. We can follow references, understand how the index flows through the analysis, and track down issues.

Grep can filter by file type:

"Search for reghdfe commands in all do-files"

This combines content searching with file filtering for precise results.

When to use Grep:

We are looking for where a variable is defined
We want to find all regressions using a specific dataset
We need to track down how a sample restriction is applied

Grep answers "Where does this appear?" across the entire project folder.
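To see what a content search does under the hood, here is a minimal Python sketch that produces results in the same path:line: content shape shown earlier. It is an illustration only. The function name search_files is our own, and it matches plain text rather than the richer patterns a real grep tool supports.

```python
from pathlib import Path


def search_files(root, text, file_pattern="**/*"):
    """Return 'path:line: content' strings for every line containing `text`.

    `file_pattern` restricts which files are searched, e.g. "**/*.do".
    Assumption: a plain substring match, not a regular expression.
    """
    root = Path(root)
    matches = []
    for path in sorted(root.glob(file_pattern)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if text in line:
                    relative = path.relative_to(root).as_posix()
                    matches.append(f"{relative}:{number}: {line.strip()}")
    return matches
```

Passing file_pattern="**/*.do" mirrors the "in all do-files" filter: the search text stays the same and only the set of files changes.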

Combining the Tools

The real power comes from combining these tools. A typical exploration might flow like this:

Glob to find candidate files (what do-files exist?)
Grep to narrow down to relevant matches (which files mention our variable?)
Read to examine the important ones in detail (what exactly does this code do?)

Let us say we want to understand how the SNAP participation variable is constructed in an unfamiliar project. We might start:

"Find all files that reference snap_participation, then show me where it's first defined"

Claude might:

Grep for snap_participation to find files that use this variable
Read the data construction script to understand how it is created
Read the merge script to see which datasets contribute to the variable

The result: a complete picture of the variable's construction, built up through targeted exploration rather than loading everything at once.

Another common pattern:

"Find where the main regression sample is defined, then show me all specifications that use it"

Claude uses Grep to find the sample construction code, Read to understand the restrictions, then Grep again to find regression commands using that sample. Each step builds on the previous one.
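The find, narrow, examine flow can be sketched as one small Python function. This is a hedged illustration of the pattern, not a description of what Claude actually runs. The function name locate_definition is hypothetical, the regular expression is only a heuristic for Stata gen, generate, and egen statements, and "first" here simply means first in sorted file order.

```python
import re
from pathlib import Path


def locate_definition(root, variable, file_pattern="**/*.do", context=2):
    """Glob for candidate files, grep for a definition, read the lines around it."""
    root = Path(root)
    # Assumption: a definition looks like gen/generate/egen, an optional
    # storage type, then the variable name. Abbreviations like "g" are missed.
    definition = re.compile(
        rf"\b(?:gen|generate|egen)\s+(?:\w+\s+)?{re.escape(variable)}\b"
    )
    for path in sorted(root.glob(file_pattern)):           # step 1: find files
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):      # step 2: search inside
            if definition.search(line):
                first = max(1, number - context)            # step 3: read nearby
                last = min(len(lines), number + context)
                excerpt = [(n, lines[n - 1]) for n in range(first, last + 1)]
                return {
                    "file": path.relative_to(root).as_posix(),
                    "line": number,
                    "excerpt": excerpt,
                }
    return None  # no matching definition in any candidate file
```

Only a few lines around the match are returned, which is the point of the pattern: the expensive step (reading) happens last and on the smallest possible span.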

Interpreting Tool Output

When Claude uses these tools, the output follows consistent patterns. Understanding these patterns helps us work more effectively.

Read output shows line numbers at the start of each line. When Claude refers to "line 47," it means the line numbered 47 in the output. We can reference these numbers in follow-up requests: "Explain what happens on lines 45-52."

Glob output returns file paths sorted by modification time, most recent first. This is useful: the files we changed recently often matter most for current work.
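If we want to reproduce a most-recent-first ordering ourselves, a few lines of Python will do it. This is a sketch under our own assumptions (the function name find_recent_first is hypothetical), using each file's last-modified timestamp from the operating system.

```python
from pathlib import Path


def find_recent_first(root, pattern):
    """Return files matching `pattern`, most recently modified first."""
    root = Path(root)
    files = [path for path in root.glob(pattern) if path.is_file()]
    # st_mtime is the last-modified time in seconds; larger means more recent
    files.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return [path.relative_to(root).as_posix() for path in files]
```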

Grep output shows file path, line number, and the matching content. When a search produces many matches, the output may be truncated. If that happens, we can ask Claude to narrow the search or focus on specific files.

Sometimes outputs are large enough that Claude summarizes rather than showing everything. This is usually fine, but if we need complete results, we can ask: "Show me all matches, not a summary."

Asking Claude to Find Things

We do not need to specify which tool to use. Claude usually picks an appropriate one based on what we ask, so we can speak naturally about what we want to find.

Finding definitions:

"Find where the treatment_status variable is defined"

Claude will search for the variable assignment, then open the file to show it.

Finding usage:

"Show me all regressions that include county fixed effects"

Claude will search for fixed effect patterns and report where they appear.

Exploring analysis structure:

"What scripts handle the data cleaning?"

Claude might search for files with "clean" in the name, search for merge and reshape commands, or both.

Finding specifications:

"Search for any DiD regressions in the analysis files"

Claude will search for patterns like did, diff-in-diff, or reghdfe with time and group interactions and report locations.

The key is being clear about what we want to find. "Find where X is defined" and "Find everywhere X is used" are different questions that lead to different searches.

The Context Cost of Reading

Here is the practical consideration: every file Read consumes context.

A small utility script might cost a few hundred tokens. A large master do-file might cost several thousand. If we ask Claude to read ten scripts to understand an analysis pipeline, we have used significant context before starting actual work.

This connects back to what we learned about context windows. Efficient exploration means reading what matters, not reading everything.

Strategies that help:

Read partial files when we know the relevant section. "Read lines 100-150 of the long cleaning script" costs much less than reading the whole thing.

Search first, read second. Find which files contain our variable of interest, then read only those files.

Ask Claude to summarize. Instead of reading five related scripts, ask Claude to find and summarize the data construction pipeline across them.

Be specific about scope. "Explain the food_insecurity_score construction" is cheaper than "Explain the entire data cleaning process."

The goal is reading what we need and no more.
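To get a feel for the difference between a full read and a partial read, we can estimate the cost before asking. The sketch below is a back-of-the-envelope estimate only: the figure of roughly four characters per token is a rule-of-thumb assumption, not an exact property of any tokenizer, and the function name estimate_tokens is our own.

```python
from itertools import islice
from pathlib import Path

# Assumption: about four characters per token. Real counts vary by
# tokenizer and by content, so treat the result as an order of magnitude.
CHARS_PER_TOKEN = 4


def estimate_tokens(path, start=1, end=None, chars_per_token=CHARS_PER_TOKEN):
    """Roughly estimate the tokens used by lines start..end (1-indexed, inclusive)."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        characters = sum(len(line) for line in islice(handle, start - 1, end))
    return -(-characters // chars_per_token)  # integer division, rounded up
```

Comparing estimate_tokens(path) with estimate_tokens(path, 100, 150) for a long cleaning script gives a rough sense of how much context a line range saves.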

Practical Exercises

These exercises help us build intuition for how Claude explores research code:

Explicit read: Ask Claude to read a specific do-file or R script. Notice that Claude knew nothing about the contents until it read them. The knowledge came from opening the file, not from prior awareness.
Pattern discovery: Ask Claude to find all .do files or .py files in our analysis folder. Examine the output format and notice the modification time sorting.
Variable tracking: Ask Claude to search for all occurrences of a key variable name. Follow the references to understand how the variable flows from raw data through to final regressions.
Guided exploration: Ask Claude to find where a sample restriction is applied, then where the restricted sample is used. Watch how it combines tools to answer the question.
Context awareness: Read a large master do-file and notice how much context it consumes. Try reading just a line range instead and compare.

What We Have Learned

Claude Code explores our analysis files through three core tools:

Read (look at a file) - Opens and displays file contents for detailed examination
Glob (find files by pattern) - Finds files matching a pattern, like *.do for all Stata files
Grep (search inside files) - Searches for text across all files in our project

These tools combine to enable powerful exploration: finding candidate scripts, narrowing down to relevant variables, and examining regression specifications in detail. Understanding how they work helps us ask better questions and use context efficiently.

This completes the beginner tier. We now understand what Claude Code is, how sessions and context work, and how Claude navigates our files. These fundamentals prepare us for the intermediate tier, where we learn practices that make AI collaboration sustainable over time.

Series Complete

We have now completed the Claude Code Guide - Beginner Tier. We have covered:

Your First Session - What Claude Code is and how to interact with it
Understanding Context - Why sessions start fresh and what that means
Reading Our Analysis Files - How Claude explores files with Read, Glob, and Grep

Continue to Intermediate Tier

Ready for more? The Intermediate Tier addresses the practices that separate productive AI collaboration from frustrating trial-and-error:

The Cold Start Problem - Why the first five minutes matter most
End-of-Session Hygiene - What to capture before context resets
Context Window Budgeting - Treating tokens as a finite resource
The Verification Tax - Every AI output needs checking
Research Phases Need Different Prompts - Matching prompt strategy to task type

The intermediate tier builds on these fundamentals to establish sustainable workflows.

This is Article 3 of 3 in the Claude Code Guide - Beginner Tier. Continue to the Intermediate Tier to learn session management and context discipline.

Suggested Citation

Cholette, V. (2026, February 18). Reading our analysis files: How Claude sees our research code. Too Early To Say. https://tooearlytosay.com/research/methodology/reading-codebase/
