Staying current with academic literature is one of those tasks that is simple in theory and tedious in practice. Open SSRN. Search. Scroll. Open NBER. Search. Scroll. Open Google Scholar. Search. Scroll. Note anything interesting. Repeat next week. The searches take maybe 30 minutes when done properly, which means they often get abbreviated or skipped entirely when deadlines loom.
This case study walks through building a Claude Code skill called /ca-lit that automates literature surveillance across three sources: SSRN, NBER, and Google Scholar. The goal is turning a manual weekly chore into a single command.
## What We Want
The basic use case: search for recent papers on a topic across multiple academic sources, deduplicate the results, and present them in a readable format. Optionally, save to a file for later reference.
So the command might look like:
/ca-lit "Medicaid fraud"
And the output would show papers from each source, grouped and deduplicated.
## The Three Sources
Each source has different strengths and access patterns.
### SSRN (Social Science Research Network)
SSRN hosts working papers and preprints. Papers appear here before journal publication, sometimes years before. For health economics research, this is where early-stage work shows up.
The access pattern: SSRN has a search interface at ssrn.com/search. We can navigate there, enter search terms, and extract results from the page.
### NBER (National Bureau of Economic Research)
NBER publishes economics working papers, including a dedicated Health Economics program. These are typically polished drafts from established researchers, often appearing 6-18 months before journal publication.
The access pattern: NBER has an API at nber.org/api/v1/working_papers/search. We can query directly without scraping.
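Because NBER exposes a search endpoint, the query can be built as a plain URL. The sketch below assumes the endpoint accepts a `q` parameter and a `page` parameter; those names are illustrative guesses, not documented API, and only the path comes from the text above.

```python
from urllib.parse import urlencode

# Endpoint path as described in the text; the parameter names ("q",
# "page") are assumptions for illustration, not a documented contract.
NBER_SEARCH = "https://www.nber.org/api/v1/working_papers/search"

def build_nber_query(term: str, page: int = 1) -> str:
    """Build a search URL against the NBER working-paper endpoint."""
    return f"{NBER_SEARCH}?{urlencode({'q': term, 'page': page})}"
```

Fetching the URL is then a single HTTP GET, with no browser automation involved, which is why the NBER step is the fastest of the three.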
### Google Scholar
Scholar indexes published papers with a lag. It catches work that has made it through peer review and into journals.
The access pattern: Scholar is notoriously hostile to automation. It requires a visible browser window and frequently presents CAPTCHAs. Any automated access needs to be slow and careful.
## The Skill Structure
A Claude Code skill lives in ~/.claude/skills/{skill-name}/SKILL.md. The markdown file defines the trigger, purpose, and workflow.
Here is the structure for /ca-lit:
~/.claude/skills/ca-lit/
└── SKILL.md
The skill file specifies:
- Trigger: What command invokes the skill (/ca-lit)
- Purpose: One-line description
- Options: What flags modify behavior
- Workflow: Step-by-step phases
- Output: Where results go
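A minimal sketch of what that file might look like, assuming the common skill-file convention of YAML frontmatter with `name` and `description` fields (the exact frontmatter schema and section layout here are illustrative, not the author's actual file):

```markdown
---
name: ca-lit
description: Search SSRN, NBER, and Google Scholar for recent papers on a topic
---

# /ca-lit

## Options
- `--source <name>`: search one source only (nber, ssrn, scholar)
- `--since <date>`: papers since date (YYYY-MM-DD)
- `--digest`: write results to a markdown digest file
- `--save`: append results to the JSON corpus

## Workflow
1. Query the NBER API
2. Search SSRN via the browser
3. Search Google Scholar (headed mode, slow)
4. Deduplicate and display
```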
## Walking Through the Options
Let's look at each option and what it does.
### Basic Search
/ca-lit "Medicaid fraud"
This searches all three sources for papers matching "Medicaid fraud" in the title or abstract. Results appear in the console, grouped by source.
The workflow:
- Query NBER API for matching papers
- Navigate to SSRN search, enter terms, extract results
- Navigate to Google Scholar, enter terms, extract results
- Deduplicate across sources (same paper may appear in multiple places)
- Display results
### Single Source
/ca-lit --source nber "health economics"
The --source flag limits the search to one platform. Options are nber, ssrn, or scholar.
Why use this? Google Scholar is slow and finicky. If we just want to check NBER for new working papers, there is no reason to wait for Scholar to load. Also useful when Scholar is blocking automated access entirely.
### Date Filter
/ca-lit --since 2025-01-01 "county health spending"
The --since flag filters to papers published after a date. Useful for checking what has appeared since the last search rather than getting all-time results.
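The filter itself is simple once each paper carries a date. A sketch, assuming papers are dicts with an ISO-format `date` field (that schema is an illustration, not the skill's actual internal format), treating the cutoff as inclusive:

```python
from datetime import date

def filter_since(papers: list[dict], since: str) -> list[dict]:
    """Keep papers dated on or after the --since cutoff (YYYY-MM-DD).

    Assumes each paper dict has an ISO "date" field; adjust to whatever
    metadata each source actually provides.
    """
    cutoff = date.fromisoformat(since)
    return [p for p in papers if date.fromisoformat(p["date"]) >= cutoff]
```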
### Digest Output
/ca-lit --digest "public health funding"
The --digest flag writes results to a markdown file instead of just console output. The digest format includes more detail than console output: full abstracts, direct URLs, and a summary section. Useful for creating a record of what was found.
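Rendering the digest is mostly string assembly. A minimal sketch, again assuming a hypothetical paper schema (`title`, `authors`, `source`, `url`, `abstract`); the real digest format has more sections than this:

```python
def render_digest(term: str, papers: list[dict]) -> str:
    """Render a markdown digest for one search.

    The field names used here are an assumed schema for illustration.
    """
    lines = [f"# Literature digest: {term}", ""]
    for p in papers:
        lines.append(f"## {p['title']}")
        lines.append(f"{p['authors']} ({p['source']})")
        lines.append(p["url"])
        lines.append("")
        lines.append(p["abstract"])  # full abstract, unlike console output
        lines.append("")
    return "\n".join(lines)
```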
### Save to Corpus
/ca-lit --save "Medi-Cal enrollment"
The --save flag appends results to a JSON corpus file. This builds up a searchable database of papers found over time. The corpus can feed into other skills for deeper analysis later.
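The corpus append needs its own duplicate check, since the same paper may be found in multiple weekly runs. A sketch under the same assumed paper schema, keying on lowercased title plus year (the file layout is an assumption, not the skill's actual format):

```python
import json
from pathlib import Path

def save_to_corpus(papers: list[dict], path: str) -> int:
    """Append new papers to a JSON corpus file; return how many were added.

    Papers already present (same lowercased title and year) are skipped,
    so repeated weekly runs do not bloat the corpus.
    """
    corpus_path = Path(path)
    corpus = json.loads(corpus_path.read_text()) if corpus_path.exists() else []
    seen = {(p["title"].lower(), p.get("year")) for p in corpus}
    added = 0
    for p in papers:
        key = (p["title"].lower(), p.get("year"))
        if key not in seen:
            corpus.append(p)
            seen.add(key)
            added += 1
    corpus_path.write_text(json.dumps(corpus, indent=2))
    return added
```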
### Combining Options
Options combine naturally:
/ca-lit --source nber --since 2025-01-01 --digest "health economics"
This searches only NBER, only papers since January 1, and writes results to a digest file.
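The flag handling described above maps directly onto standard option parsing. A sketch using `argparse` (the skill itself parses the command text inside Claude Code rather than via a shell, so this is an analogy, not the actual implementation):

```python
import argparse

def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse /ca-lit options, mirroring the flags described in the text."""
    parser = argparse.ArgumentParser(prog="ca-lit")
    parser.add_argument("query", help="search terms")
    parser.add_argument("--source", choices=["nber", "ssrn", "scholar"],
                        help="search one source only")
    parser.add_argument("--since", metavar="YYYY-MM-DD",
                        help="papers since date")
    parser.add_argument("--digest", action="store_true",
                        help="write results to a markdown digest file")
    parser.add_argument("--save", action="store_true",
                        help="append results to the JSON corpus")
    return parser.parse_args(argv)
```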
## The Google Scholar Problem
Scholar deserves special mention because it actively fights automation. The skill handles this by:
- Headed mode: Running a visible browser window rather than headless. Scholar detects headless browsers.
- Delays: Waiting 60+ seconds between requests. Faster access triggers blocking.
- Result limits: Only pulling 20 results per search. Pagination invites detection.
- Graceful degradation: If Scholar fails, the skill continues with SSRN and NBER only.
Sometimes Scholar will present a CAPTCHA. When this happens, we solve it manually and the skill continues. This is annoying but rare if the delays are respected.
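The pacing and graceful degradation can be sketched as a small wrapper around each source's fetch step. This is an illustration of the control flow only; the actual browser driving is a separate concern, and the `_sleep` hook exists just to make the sketch testable:

```python
import time

def polite_search(fetch, term, delay=60.0, _sleep=time.sleep):
    """Run one source's fetch with a pre-request delay.

    On any failure (blocking, CAPTCHA timeout, network error), return an
    empty list so the remaining sources still produce results. The delay
    mirrors the 60+ second pacing described above.
    """
    try:
        _sleep(delay)
        return fetch(term)
    except Exception:
        return []  # graceful degradation: skip this source
```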
If Scholar consistently fails, the --source flag lets us skip it entirely:
/ca-lit --source nber "health economics"
/ca-lit --source ssrn "health economics"
NBER and SSRN together catch most working papers before they reach Scholar anyway.
## Deduplication
The same paper often appears on multiple platforms. An NBER working paper might also be posted on SSRN and indexed by Scholar. Without deduplication, results would show the same paper three times.
The skill deduplicates based on:
- Title similarity: Fuzzy matching handles minor variations ("The Effect of..." vs "Effect of...")
- Author + year: If authors and publication year match, likely the same paper
After deduplication, the output shows unique papers only, with a note about which sources had it.
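Both heuristics fit in a few lines. A sketch using the standard library's `difflib.SequenceMatcher` for the fuzzy title comparison; the paper schema and the 0.9 similarity threshold are illustrative assumptions:

```python
from difflib import SequenceMatcher

def same_paper(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Heuristic match: fuzzy title similarity, or identical authors + year."""
    title_sim = SequenceMatcher(
        None, a["title"].lower(), b["title"].lower()).ratio()
    if title_sim >= threshold:
        return True
    return a.get("authors") == b.get("authors") and a.get("year") == b.get("year")

def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first copy of each paper, noting which sources carried it."""
    unique: list[dict] = []
    for p in papers:
        for kept in unique:
            if same_paper(kept, p):
                kept.setdefault("sources", [kept["source"]]).append(p["source"])
                break
        else:
            unique.append(p)
    return unique
```

The pairwise scan is quadratic, which is fine at the scale of a weekly search (tens of papers, not thousands).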
## Example Session
Here is what an actual invocation looks like:
User: /ca-lit "Medicaid fraud" --digest
Claude: Searching for "Medicaid fraud" across SSRN, NBER, Google Scholar...
[NBER] Querying API...
Found: 3 papers
[SSRN] Searching...
Found: 7 papers
[Google Scholar] Searching (headed mode)...
Found: 15 papers
Deduplicating... 22 unique papers
## Summary
- NBER: 3 working papers
- SSRN: 7 preprints
- Google Scholar: 12 published papers
Digest saved: ~/Projects/CAPHE/08_research/literature/weekly_digests/digest_2026-02-01.md
The digest file contains full details for each paper.
## What This Replaces
The manual workflow:
- Open SSRN, search, scroll through results, note interesting papers
- Open NBER, browse recent working papers, note interesting ones
- Open Scholar, search, scroll, note papers
- Cross-reference notes to remove duplicates
- Save somewhere for later
Time: 30-60 minutes when done thoroughly. Often skipped.
The automated workflow:
- Run /ca-lit "search terms" --digest
- Review the generated digest
Time: 5 minutes to review. Runs consistently.
The value is reliability, not sophistication. The searches happen every week regardless of deadline pressure.
## The Skill Reference
For reference, here is the complete skill definition:
Location: ~/.claude/skills/ca-lit/SKILL.md
Trigger: /ca-lit
| Option | Purpose |
|---|---|
| `--source <name>` | Search one source only (nber, ssrn, scholar) |
| `--since <date>` | Papers since date (YYYY-MM-DD) |
| `--digest` | Write results to markdown digest file |
| `--save` | Append results to JSON corpus |
## Conclusion
Building a literature surveillance skill required defining three things: what sources to search, how to access each one, and where to put results. The implementation handles the tedious parts (navigating sites, extracting metadata, deduplicating) while leaving judgment calls (which papers matter) to humans.
The command /ca-lit "search terms" now does in minutes what used to take half an hour. More importantly, it happens consistently rather than being skipped when time is short.
## Suggested Citation
Cholette, V. (2026, February 1). Building a literature surveillance skill. Too Early To Say. https://tooearlytosay.com/research/methodology/literature-surveillance-skill/