Agent Readiness: The Attribute Nobody Has

I've spent most of my career inside B2B data products — databases with hundreds of millions of company records and thousands of attributes per record. Firmographics, technographics, intent signals, org charts. The works.

None of them have this one.

Agent Readiness scores how ready a business's website is for AI agent consumption. Can an agent read it? Is there structured data? An API? A manifest that says "hey, I'm here, here's what I can do"? It turns out most businesses score terribly — and the ones that don't are going to have a real advantage as the web shifts from human-first to human-and-agent.

The gap in every database

Traditional B2B databases are built for human researchers and sales reps. They tell you what a company does, who works there, what technology they use. That's valuable. But they don't tell you anything about whether that company's digital presence is actually consumable by the AI agents that are increasingly doing the research, the outreach, and the buying.

This matters if you're building AI workflows, selling AI tools, or just trying to figure out which prospects are sophisticated enough to care about what you're selling. A company with a robots.txt, a sitemap.xml, JSON-LD structured data, and an OpenAPI spec is a very different prospect than one with a Squarespace splash page and no meta tags.

And here's the thing — you don't need to run this on 150 million records. You run it on the companies you actually care about. Enrich on demand, score what matters, skip what doesn't. That's the whole point of building attributes that are fast and cheap enough to run in real time.

What we measure

Six categories, each worth up to 10 points, with the overall score reported on a 0–10 scale. No LLM required: it's all deterministic HTTP checks. A scan takes 2–5 seconds per domain.

Crawlability (10 pts): Can agents discover the site? robots.txt, sitemap.xml, AI bot access rules.

Machine Readability (10 pts): Is the content structured for machines? JSON-LD, Schema.org, semantic HTML.

API Readiness (10 pts): Can agents query programmatically? .well-known/ai-plugin.json, OpenAPI spec, API docs.

Agentic Commerce (10 pts): Can agents transact? MCP manifests, payment APIs, product feeds, structured pricing.

Content Access (10 pts): Can agents actually read it? SSR, clean text, reasonable page size, no bot blocks.

Agent Signals (10 pts): Does the site explicitly support AI? llms.txt, ai-plugin.json, MCP manifest.
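To give a feel for what "deterministic" means here, this is a minimal sketch of three such checks written as pure functions over text that has already been fetched. It is an illustration, not our scanner's actual code: the function names, the regexes, and the `AI_AGENTS` list are assumptions made for this example.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical list of AI crawler user agents; which bots a real scanner
# tests is an assumption here.
AI_AGENTS = ("GPTBot", "ClaudeBot")

JSON_LD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\']', re.IGNORECASE
)
SEMANTIC_TAGS_RE = re.compile(
    r"<(main|article|nav|header|footer|section)\b", re.IGNORECASE
)


def has_json_ld(html: str) -> bool:
    """True if the page embeds at least one JSON-LD script block."""
    return bool(JSON_LD_RE.search(html))


def has_semantic_html(html: str) -> bool:
    """True if the page uses at least one HTML5 sectioning element."""
    return bool(SEMANTIC_TAGS_RE.search(html))


def ai_bots_allowed(robots_txt: str, path: str = "/") -> bool:
    """True if every agent in AI_AGENTS may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(agent, path) for agent in AI_AGENTS)
```

Because each check is a plain function of its input, the same page always yields the same answer, which is where the reproducibility comes from.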

We scanned 2,681 businesses

Every Boise business in our dataset that has a website. Here's where things actually stand.

Avg score: 4.3 out of 10

Grade F: 20% (544 businesses)

Grade A: 3.7% (only 98 businesses)

Scanned: 2,681 (Boise metro area)

Signal adoption

HTML accessible: 85.5%

robots.txt: 85.1%

Server-side rendered: 84.2%

sitemap.xml: 75.0%

Semantic HTML: 67.5%

Schema.org: 57.4%

JSON-LD: 54.1%

llms.txt: 21.4%

API docs: 11.8%

OpenAPI / MCP / ai-plugin: ~10%

2,681 Boise businesses with websites — full scan, April 2026

What this actually tells you

The basics are covered. Most businesses have robots.txt and server-rendered HTML — that's table stakes and has been for a decade. About half have JSON-LD, mostly because their website builder adds it automatically.

The drop-off starts at the AI-specific signals. Only 21% have llms.txt. Under 12% have anything resembling API documentation. And only about 10% have an OpenAPI spec, MCP manifest, or ai-plugin.json.

That bottom tier is where it gets interesting for GTM. If you're selling AI tooling, integration platforms, or agent-based workflows, the 3.7% of businesses that score an A are your early adopters. They've already done the work. The 44% in the C range are reachable but need education. The 20% at F — they're not ready for what you're selling yet.

That's a segmentation you can't get from any existing database.
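As a rough sketch, turning grades into outreach segments could look like the snippet below. The mapping covers only the three grades discussed above. Everything else falls into an "unassigned" bucket, and the function name and example domains are made up for illustration.

```python
from collections import defaultdict

# Segments for the three grades discussed in this post. Other grades
# are left unassigned rather than guessed at.
SEGMENT_BY_GRADE = {
    "A": "early adopter",
    "C": "reachable, needs education",
    "F": "not ready yet",
}


def segment_accounts(grades: dict[str, str]) -> dict[str, list[str]]:
    """Group domains by outreach segment, given a domain -> grade mapping."""
    segments = defaultdict(list)
    for domain, grade in grades.items():
        segment = SEGMENT_BY_GRADE.get(grade.upper(), "unassigned")
        segments[segment].append(domain)
    return dict(segments)


# Hypothetical pipeline of three accounts.
pipeline = {"alpha.example": "A", "beta.example": "C", "gamma.example": "F"}
print(segment_accounts(pipeline))
```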

Enrich what you need, when you need it

The old approach to business data is: collect everything on every company, store it, keep it updated. That works for stable attributes like address, employee count, industry code. It doesn't work for attributes that change constantly or only matter in specific contexts.

Agent readiness changes every time a company updates its website. Running it on 150 million records weekly would be expensive and pointless. But running it on the 500 companies in your pipeline? At 2–5 seconds per domain, that takes about 40 minutes at the slow end, and it costs nothing.
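The arithmetic behind that estimate is easy to check. This small sketch assumes domains are scanned one after another at the 2–5 seconds per domain quoted earlier. The helper name is ours, for illustration only.

```python
def batch_minutes(domains: int, seconds_per_domain: float) -> float:
    """Wall-clock minutes for a sequential scan of `domains` sites."""
    return domains * seconds_per_domain / 60


fast = batch_minutes(500, 2)  # every domain answers in 2 seconds
slow = batch_minutes(500, 5)  # every domain takes the full 5 seconds
print(f"{fast:.0f} to {slow:.0f} minutes")  # prints "17 to 42 minutes"
```

So "about 40 minutes" is the slow end of the range. A pipeline of fast-responding sites finishes in under 20.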

This is how we think about new attributes in general. You have your base layer — the stuff a big database does well. Then you add context-specific scoring on top, on demand, for the companies you're actually working. Agent readiness is one of those attributes. We've built others — buyer persona, regulatory exposure, revenue model, seasonality — that run on local LLMs at $0 cost.

Agent readiness is different in one important way: it doesn't need a model at all. It's deterministic HTTP checks. That means it's fast, cheap, perfectly reproducible, and you can run it at scale without worrying about inference costs or model drift.

Fixing your own score

We ran it on our own site first and scored a 6 out of 10. Had robots.txt and llms.txt but nothing else. So we added:

01. /sitemap.xml: dynamic, pulls from our database

02. JSON-LD structured data: Schema.org Person + Organization on profile pages

03. /.well-known/ai-plugin.json: tells agents what's available

04. /openapi.json: full API spec so agents know how to call us

05. /mcp.json: MCP manifest with tool definitions

06. OG + meta tags: rich previews and description

Took about 30 minutes. Most of it is boilerplate once you know what to add.
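To show how little is involved, here is a minimal sketch of the JSON-LD piece. Our web app runs on Node.js, so this Python version is illustrative only. It covers just the Organization type, and the function name and example values are placeholders rather than what we actually ship.

```python
import json


def organization_json_ld(name: str, url: str, description: str) -> str:
    """Return a <script> tag with a Schema.org Organization JSON-LD block."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
    }
    # Escape "</" so a stray "</script>" in a field cannot close the tag early.
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


# Placeholder values for illustration.
tag = organization_json_ld(
    "Example Co", "https://example.com", "A made-up company."
)
print(tag)
```

Drop the resulting tag into the page's head and the Machine Readability checks have something to find.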

So what

The web is quietly growing a second audience. Humans still browse, but AI agents are increasingly the ones doing research, comparing options, pulling data, and making recommendations. The businesses that are set up for both audiences will get found. The ones that aren't, won't.

It reminds me of the early SEO days — not in a hype-y "you need to optimize for AI!" way, but in a practical one. In 2005, if you didn't have a sitemap and decent meta tags, Google couldn't index you properly. It wasn't complicated, it just wasn't on anyone's radar yet. That's where we are with agent readiness right now. The fixes are straightforward. Most people just haven't thought about it.

We built a scanner. You can run it on any domain. If you're interested in adding agent readiness as an attribute to your own data, reach out.

How we built this

This whole thing runs on a Mac Studio M4 Max sitting under my desk. We call it Stu. 36GB of unified memory, running Ollama with a handful of open-source models — gemma4, llama3.1, gemma2. Total cloud cost for the enrichment pipeline: $0.

The agent readiness scanner itself is just Python making HTTP requests. No model needed. It checks ~15 endpoints per domain and scores what it finds. We ran 2,681 domains in about 7 hours overnight. The scanner runs in parallel with our LLM benchmarks — so while Stu was classifying 3,197 businesses across buyer persona, regulatory exposure, and four other attributes, the laptop was scanning websites for agent signals.

The whole project — defining the attribute, building the scanner, running 2,681 scans, writing this post — happened in one night. My co-founder Jody and I have been building Product Hacker this way for months: pick a problem, ship something that works, measure it against real data. We've shipped 9 apps this way. The B2B data enrichment pipeline is one of the more interesting ones.

The stack

Scanner: Python, urllib, no dependencies

LLM inference: Ollama on Mac Studio M4 Max

Models: gemma4:e2b, gemma4:e4b, llama3.1:8b, gemma2:2b

Dev tool: Claude Code (Max plan)

Web app: Node.js + Express on Railway

Inference cost: $0 — all local

I spend my days working in B2B data at scale. Product Hacker is where I get to experiment with the stuff that's too new or too weird for a large organization to try yet. The ideas cross-pollinate. That's the whole point.
