Infrastructure · Apr 2026

Serving AI crawlers machine‑optimized markdown at the edge

How Hypotext detects GPTBot, ClaudeBot, and PerplexityBot and serves them a clean, citation-friendly version of your site.

Prasanth SD
Founder · AI Infrastructure

GPTBot, ClaudeBot, and PerplexityBot visit your site daily. They see your React SPA and leave with almost nothing useful. Here's how Hypotext detects these crawlers at the edge and serves them clean, citation-friendly markdown — while humans get the full interactive experience.

The core problem

AI crawlers parse HTML poorly. They need clean, structured text. Your React app renders a beautiful UI for humans but a meaningless bundle of JavaScript for bots. Serving both well requires a dual-layer approach at the edge.

Who's crawling you (and what they need)

Every major AI platform has its own crawler. They visit frequently but have different parsing capabilities:

| Crawler | Platform | User-Agent | JS Rendering |
|---|---|---|---|
| GPTBot | OpenAI / ChatGPT | GPTBot/1.0 | No |
| ClaudeBot | Anthropic / Claude | ClaudeBot/1.0 | No |
| PerplexityBot | Perplexity AI | PerplexityBot | Limited |
| Google-Extended | Google AI / Gemini | Google-Extended | Partial |
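The table above maps directly onto a small lookup structure. A minimal sketch (the crawler names and patterns come from the table; the data structure itself is illustrative, not Hypotext's actual code):

```javascript
// Known AI crawlers and the User-Agent pattern that identifies each.
const AI_CRAWLERS = [
  { name: "GPTBot",          platform: "OpenAI / ChatGPT",   pattern: /GPTBot/i },
  { name: "ClaudeBot",       platform: "Anthropic / Claude", pattern: /ClaudeBot/i },
  { name: "PerplexityBot",   platform: "Perplexity AI",      pattern: /PerplexityBot/i },
  { name: "Google-Extended", platform: "Google AI / Gemini", pattern: /Google-Extended/i },
];

// Return the matching crawler entry, or null for ordinary (human) traffic.
function identifyCrawler(userAgent) {
  return AI_CRAWLERS.find((c) => c.pattern.test(userAgent)) ?? null;
}
```

Keeping the patterns in data rather than hard-coding them in a branch makes the weekly pattern updates a one-line change.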

The architecture: edge detection + content switching

Hypotext runs as middleware on Cloudflare Workers. Every request hits the edge first. In under 1ms, we check the User-Agent, decide what to serve, and respond:

1. Request arrives: edge node (< 50ms from user)
2. User-Agent check: is it GPTBot, ClaudeBot, etc.?
3. Serve markdown: clean, structured, citation-ready

Human visitors bypass this entirely and get the normal React app.
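The edge decision reduces to a small pure function. This is an illustrative sketch, not Hypotext's production code; the `routeFor` name and the `/__markdown` origin route are hypothetical:

```javascript
// Which variant of the page should this request get, based on User-Agent?
const AI_UA = /GPTBot|ClaudeBot|PerplexityBot|Google-Extended/i;

function routeFor(userAgent, pathname) {
  if (AI_UA.test(userAgent || "")) {
    // Hypothetical internal route where the origin serves the markdown rendition.
    return { variant: "markdown", path: "/__markdown" + pathname };
  }
  return { variant: "html", path: pathname }; // humans get the normal React app
}
```

In a Cloudflare Worker this would sit at the top of the `fetch` handler: read `request.headers.get("User-Agent")`, call `routeFor`, then proxy to the origin with the chosen path.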

What "machine-optimized markdown" looks like

It's not just dumping your HTML into a markdown converter. The output is specifically structured for how LLMs consume context during retrieval:

Example output for an AI crawler:

```markdown
---
title: "Edge AI Deployment Guide"
author: "Prasanth SD"
date: "2026-05-01"
source: "https://example.com/edge-ai-guide"
---

# Edge AI Deployment: When and How

## Key Finding
At 50K daily inferences, edge deployment saves 62% versus
cloud ($847/mo vs $2,230/mo). Below 5K daily, cloud wins by 3×.

## Framework Comparison
- ONNX Runtime: 12ms inference, 4-core ARM, INT8 quantized
- TensorFlow Lite: 15ms inference, same hardware, FP16

## Decision Criteria
Choose edge if: p95 latency < 50ms required, >10K req/sec,
data residency constraints.
Choose cloud if: <5K req/sec, model updates >1×/day,
multi-region not needed.
```

Every section is a self-contained, quotable unit. Frontmatter provides attribution. The structure maps directly to how rerankers evaluate relevance.

The detection logic

User-Agent matching is the primary signal, but we also check:

1. User-Agent string matching: primary detection. Regex against known AI crawler patterns, updated weekly.
2. IP range verification: cross-reference against published IP ranges from OpenAI and Anthropic. Prevents spoofing.
3. Accept header analysis: AI crawlers often accept text/plain or text/markdown; humans request text/html.
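Combining the three signals might look like the sketch below. The CIDR range shown is a documentation placeholder, not a real crawler range, and the "UA required, IP or Accept confirms" policy is an assumption, not Hypotext's exact scoring:

```javascript
const UA_PATTERN = /GPTBot|ClaudeBot|PerplexityBot|Google-Extended/i;
// Placeholder CIDR (TEST-NET-3); real deployments would sync the published
// ranges from OpenAI and Anthropic here.
const KNOWN_RANGES = ["203.0.113.0/24"];

// Does an IPv4 address fall inside a CIDR block?
function inCidr(ip, cidr) {
  const [base, bits] = cidr.split("/");
  const toInt = (a) => a.split(".").reduce((n, o) => ((n << 8) + Number(o)) >>> 0, 0);
  const mask = bits === "0" ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return ((toInt(ip) & mask) >>> 0) === ((toInt(base) & mask) >>> 0);
}

function detectAICrawler({ userAgent = "", ip = "", accept = "" }) {
  const uaMatch = UA_PATTERN.test(userAgent);                   // 1. primary signal
  const ipVerified = KNOWN_RANGES.some((r) => inCidr(ip, r));   // 2. anti-spoofing
  const wantsText = /text\/(plain|markdown)/.test(accept);      // 3. supporting signal
  // Assumed policy: UA match is required; IP or Accept raises confidence.
  return uaMatch && (ipVerified || wantsText);
}
```

Requiring a second signal on top of the User-Agent is what defeats spoofing: a scraper can forge `GPTBot/1.0` but not originate from OpenAI's published ranges.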

Results: before and after

For sites running Hypotext, we see measurable improvements in AI citation rates:

- 3.2× more AI citations
- < 1ms detection overhead
- 0 impact on human UX

The dual-layer approach means you never compromise. Humans get your beautiful React app. AI crawlers get structured, citation-optimized markdown. Both from the same URL, decided at the edge in under a millisecond.