Data ready for AI (and ready for humans)
“We’ll just RAG it” is the new “we’ll fix it in post.” Retrieval only amplifies what you already have: maintained datasets, permissions that match reality, and definitions people actually agree on. If teams already argue over the meaning of a field, the AI layer will not resolve that dispute. It will scale it.
Inventory beats ambition
Start with a boring spreadsheet: source system, owner, refresh cadence, known gaps, and legal sensitivity. For each dataset an agent might touch, ask whether a human analyst would trust it for a board slide on Monday morning. If not, automation shouldn’t either—at least not without explicit caveats in the product.
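The spreadsheet can also live as a small data structure that an agent pipeline checks before touching a dataset. The sketch below is only an illustration: the `DatasetEntry` fields mirror the columns listed above, while the `agent_readiness` rule (block anything with no owner or unclassified sensitivity, otherwise pass known gaps along as caveats) is an assumed policy, not a standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One row of the inventory spreadsheet (field names are illustrative)."""
    name: str
    source_system: str
    owner: str                 # empty string means nobody has claimed it
    refresh_cadence: str       # e.g. "hourly", "weekly", "frozen"
    known_gaps: list[str] = field(default_factory=list)
    legal_sensitivity: str = "unknown"  # e.g. "public", "internal", "restricted"


def agent_readiness(entry: DatasetEntry) -> tuple[bool, list[str]]:
    """Return (allowed, caveats) for a dataset an agent might touch.

    Assumed rule: a dataset with no owner or no sensitivity classification
    is not usable; otherwise its known gaps become caveats to show users.
    """
    if not entry.owner or entry.legal_sensitivity == "unknown":
        return False, ["no owner or unclassified sensitivity"]
    return True, list(entry.known_gaps)


pricing = DatasetEntry(
    name="pricing",
    source_system="billing_db",
    owner="finance-ops",
    refresh_cadence="hourly",
    known_gaps=["EU prices lag by a day"],
    legal_sensitivity="internal",
)
allowed, caveats = agent_readiness(pricing)
```

Returning the caveats alongside the verdict keeps the "explicit caveats in the product" requirement attached to the data rather than to someone's memory.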
Freshness is a product decision
Stale documentation poisons both humans and models. Readiness means deciding how stale is acceptable per use case: pricing pages might need hourly checks; HR policies might be weekly; archived contracts might be deliberately frozen with visible timestamps. The AI layer inherits those freshness rules; it just forces you to write them down.
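Encoding those decisions can be as small as a lookup table and one check. This is a minimal sketch under assumptions: the use-case keys and the `MAX_AGE` limits simply restate the examples above (hourly, weekly, frozen), and `freshness_status` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumed per-use-case limits; None means "deliberately frozen".
MAX_AGE = {
    "pricing": timedelta(hours=1),
    "hr_policy": timedelta(weeks=1),
    "archived_contract": None,
}


def freshness_status(use_case, last_updated, now=None):
    """Label a document as fresh, stale, or frozen for a given use case."""
    now = now or datetime.now(timezone.utc)
    limit = MAX_AGE[use_case]
    stamp = last_updated.date().isoformat()
    if limit is None:
        # Frozen content is allowed, but the timestamp stays visible.
        return f"frozen as of {stamp}"
    if now - last_updated > limit:
        return f"stale (last updated {stamp})"
    return "fresh"


now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
pricing = freshness_status("pricing", now - timedelta(hours=2), now)
policy = freshness_status("hr_policy", now - timedelta(days=3), now)
contract = freshness_status(
    "archived_contract", datetime(2020, 5, 1, tzinfo=timezone.utc), now
)
```

Whether a stale document is dropped from retrieval or shown with its label is a product call; the point is that the threshold is written down somewhere a pipeline can read it.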
Access patterns matter as much as storage
Data sitting in a warehouse unused by frontline teams is a red flag. If only engineers can query it, your “copilot” becomes an engineering project forever. Readiness often means surfacing the same views people already use—then layering intelligence where duplicates and search pain actually live.
Privacy and minimisation
Ready data for AI is often less data: only the fields a task needs, identifiers redacted, and retrieval scoped by tenant. The readiness conversation should cover what must never enter a prompt, not only what can.
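As a rough illustration of minimisation before prompt assembly, the sketch below filters records by tenant, keeps only an allow-list of fields, and masks email addresses. Everything named here is assumed for the example: the `ALLOWED_FIELDS` set, the record shape, and the simple email pattern, which is not a complete PII detector.

```python
import re

# Assumed allow-list: anything not named here never reaches the prompt.
ALLOWED_FIELDS = {"ticket_id", "subject", "body"}

# Deliberately simple pattern for illustration; real redaction needs more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def build_context(records, tenant_id):
    """Return prompt-safe copies of the records belonging to one tenant."""
    context = []
    for record in records:
        if record.get("tenant_id") != tenant_id:
            continue  # retrieval is scoped by tenant
        kept = {}
        for key in sorted(ALLOWED_FIELDS.intersection(record)):
            value = record[key]
            if isinstance(value, str):
                value = EMAIL_RE.sub("[redacted email]", value)
            kept[key] = value
        context.append(kept)
    return context


records = [
    {"tenant_id": "acme", "ticket_id": 1, "subject": "Refund",
     "body": "Contact jane.doe@example.com please", "ssn": "000-00-0000"},
    {"tenant_id": "globex", "ticket_id": 2, "subject": "Other", "body": "x"},
]
context = build_context(records, "acme")
```

An allow-list fails closed: a newly added sensitive column stays out of prompts until someone decides it belongs there.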
For founders
At startup scale, “data readiness” might mean one Postgres schema you actually control and a content pipeline you update when marketing ships. That’s fine—own it honestly. Small systems with clear ownership beat sprawling piles connected by brittle sync jobs.