# Question-First Data Card Template + Tagging Prompts **Source article:** [The Data We Forgot We Had](https://tooearlytosay.com/research/methodology/question-first-data-management/) **Pattern:** Tag data by the questions it can answer, not just by what it contains. Standard data documentation captures what a dataset *is* (source, variables, geography, time period). It omits what questions the dataset can *answer*. Question-first tags add the missing semantic layer so the data can be retrieved by need rather than by memory. Create a data card the moment a dataset enters our project folder, not later. --- ## Data card template Save as a `.yaml` or `.md` file next to the dataset, in `/data/_cards/.yaml`. ```yaml # Data Card: ## Standard Metadata source: acquired_date: location: format: variables: [, , ] geography: time_period: unit_of_analysis: ## Question-Based Tags questions_answerable: - "" - "" - "" - "" constructs_measured: - - - linkable_to: - (join on ) - (join on ) - potential_analyses: - - - ## Notes acquisition_context: "" known_limitations: "" ``` --- ## Why four dimensions, not just metadata | Dimension | What it captures | Example | |-----------|------------------|---------| | Questions answerable | Research questions this data could help answer | "Real vs nominal income comparison" | | Constructs measured | Conceptual variables, not just column names | Prices, purchasing power, cost of living | | Coverage | Geography, time, unit of analysis | CA metros, 2018–2023, quarterly | | Linkability | What other datasets it joins with | ACS via metro, any income data | The shift is from "what is this data?" to "what can this data answer?" When a new research question arises, the search is for *answers* rather than data sources. Question-based tags align storage with retrieval. --- ## Prompt 1 — Generate tags for a new dataset Use this on intake, the same day the data lands. ``` Here's a dataset I have: - Source: - Variables: - Geography: - Time period: - Unit of analysis: What research questions could this data help answer? What constructs does it measure (conceptually, not column names)? What other public datasets might it link with, and on which keys? Suggest 5 question-based tags I should add to my data inventory. Format your answer as YAML matching the data card schema in /data/_cards/_template.yaml. ``` --- ## Prompt 2 — Query the inventory for a new research question Use this when an intuition surfaces during analysis or revisions. ``` I have a new research question: . Here is my data inventory: . Which datasets might be relevant? Include non-obvious connections that go beyond keyword matching. For each candidate, tell me which of the "questions_answerable" or "constructs_measured" tags makes it relevant to my new question. ``` --- ## The workflow 1. **On acquisition:** create a data card with standard metadata plus 3–5 question-based tags. (Use Prompt 1.) 2. **AI enrichment:** ask an assistant to suggest additional questions the data could answer. 3. **Periodic review:** revisit cards when starting new projects. 4. **Query on need:** when a new research question arises, search the inventory. (Use Prompt 2.) The system works through collaboration: human maintains the inventory, the assistant enriches the tags and queries the inventory on demand. --- ## How to start Tag three recent datasets, not the whole archive. 1. Pick the three most recently acquired datasets in our project folders. 2. Create a data card for each (standard metadata, 3–5 questions, linkable_to list). 3. Store the cards in a searchable location (folder, notes app, project context file). 4. Run Prompt 2 against the new three-card inventory the next time a research question surfaces. The move is from "I forgot I had this" to "what in the inventory speaks to X?" Serendipity becomes systematic. --- *Template extracted from work published on [Too Early To Say](https://tooearlytosay.com).* *Licensed MIT. Victoria Cholette, 2026.*