GPTBot, ClaudeBot, and PerplexityBot visit your site daily. They see your React SPA and leave with almost nothing useful. Here's how Hypotext detects these crawlers at the edge and serves them clean, citation-friendly markdown — while humans get the full interactive experience.
Who's crawling you (and what they need)
Every major AI platform has its own crawler. They visit frequently but have different parsing capabilities:
| Crawler | Platform | User-Agent | JS Rendering |
|---|---|---|---|
| GPTBot | OpenAI / ChatGPT | GPTBot/1.0 | No |
| ClaudeBot | Anthropic / Claude | ClaudeBot/1.0 | No |
| PerplexityBot | Perplexity AI | PerplexityBot | Limited |
| Google-Extended | Google AI / Gemini | Google-Extended | Partial |
The architecture: edge detection + content switching
Hypotext runs as middleware on Cloudflare Workers. Every request hits the edge first. In under 1ms, we check the User-Agent, decide what to serve, and respond:
AI crawlers get pre-rendered, machine-optimized markdown; human visitors bypass this path entirely and get the normal React app.
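A minimal sketch of what such a Worker might look like. Hypotext's actual implementation isn't public; the crawler regex, `renderMarkdown` helper, and KV comment below are illustrative assumptions, not its real API.

```typescript
// Sketch of the edge middleware flow on Cloudflare Workers.
// Names and helpers here are illustrative, not Hypotext internals.

const AI_CRAWLER_PATTERN = /GPTBot|ClaudeBot|PerplexityBot|Google-Extended/i;

export function isAiCrawler(userAgent: string | null): boolean {
  return userAgent !== null && AI_CRAWLER_PATTERN.test(userAgent);
}

// Cloudflare Worker entry point (module syntax)
export default {
  async fetch(request: Request): Promise<Response> {
    const ua = request.headers.get("User-Agent");
    if (isAiCrawler(ua)) {
      // Serve machine-optimized markdown for this URL
      return new Response(await renderMarkdown(request.url), {
        headers: { "Content-Type": "text/markdown; charset=utf-8" },
      });
    }
    // Humans fall through to the origin React app untouched
    return fetch(request);
  },
};

async function renderMarkdown(url: string): Promise<string> {
  // Placeholder: real middleware would fetch cached markdown (e.g. from KV/R2)
  return `---\nsource: "${url}"\n---\n`;
}
```

The User-Agent check is a single regex test, which is why the decision fits comfortably inside a sub-millisecond budget at the edge.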
What "machine-optimized markdown" looks like
It's not just dumping your HTML into a markdown converter. The output is specifically structured for how LLMs consume context during retrieval:
```markdown
---
title: "Edge AI Deployment Guide"
author: "Prasanth SD"
date: "2026-05-01"
source: "https://example.com/edge-ai-guide"
---

# Edge AI Deployment: When and How

## Key Finding
At 50K daily inferences, edge deployment saves 62% versus cloud ($847/mo vs $2,230/mo). Below 5K daily, cloud wins by 3×.

## Framework Comparison
- ONNX Runtime: 12ms inference, 4-core ARM, INT8 quantized
- TensorFlow Lite: 15ms inference, same hardware, FP16

## Decision Criteria
Choose edge if: p95 latency < 50ms required, >10K req/sec, data residency constraints.
Choose cloud if: <5K req/sec, model updates >1×/day, multi-region not needed.
```
Every section is a self-contained, quotable unit. Frontmatter provides attribution. The structure maps directly to how rerankers evaluate relevance.
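One way to picture the assembly step: build the frontmatter and the quotable sections from structured page data rather than converting rendered HTML wholesale. The `PageData` shape below is an assumption for illustration, not Hypotext's schema.

```typescript
// Illustrative sketch: assembling citation-friendly markdown from
// structured page data. Field names are assumptions, not Hypotext's schema.

interface PageData {
  title: string;
  author: string;
  date: string;    // ISO date
  source: string;  // canonical URL, for attribution in frontmatter
  sections: { heading: string; body: string }[];
}

export function toMachineMarkdown(page: PageData): string {
  const frontmatter = [
    "---",
    `title: "${page.title}"`,
    `author: "${page.author}"`,
    `date: "${page.date}"`,
    `source: "${page.source}"`,
    "---",
  ].join("\n");

  // Each section becomes a self-contained "## " block an LLM can quote alone
  const body = page.sections
    .map((s) => `## ${s.heading}\n${s.body}`)
    .join("\n\n");

  return `${frontmatter}\n\n${body}\n`;
}
```

Keeping each section self-contained matters because retrieval pipelines typically chunk documents; a chunk that carries its own heading and claim survives being quoted out of context.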
The detection logic
User-Agent matching is the primary signal, but we also check:
User-Agent string matching
Primary detection. Regex against known AI crawler patterns. Updated weekly.
IP range verification
Cross-reference against the published IP ranges from OpenAI and Anthropic. Prevents a spoofed User-Agent from receiving the crawler-only response.
Accept header analysis
AI crawlers often send `Accept: text/plain` or `text/markdown`; browsers request `text/html`.
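Putting the three signals together might look like the sketch below. The combination policy (require IP verification or a markdown-leaning Accept header before trusting a User-Agent match) is an assumption for illustration; Hypotext's real weighting isn't documented here.

```typescript
// Sketch of multi-signal detection; the combination rules are
// assumptions for illustration, not Hypotext internals.

const UA_PATTERN = /GPTBot|ClaudeBot|PerplexityBot|Google-Extended/i;

interface DetectionSignals {
  userAgent: string | null;
  accept: string | null;
  ipVerified: boolean; // did the source IP fall in a published crawler range?
}

export function classifyRequest(s: DetectionSignals): "ai-crawler" | "human" {
  const uaMatch = s.userAgent !== null && UA_PATTERN.test(s.userAgent);
  // A UA match alone is spoofable; trust it outright only with IP verification
  if (uaMatch && s.ipVerified) return "ai-crawler";
  // Accept header as a corroborating signal: crawlers ask for text, not HTML
  const wantsText =
    s.accept !== null && /text\/(markdown|plain)/.test(s.accept);
  if (uaMatch && wantsText) return "ai-crawler";
  return "human";
}
```

Under this policy a spoofed `GPTBot/1.0` header from a browser (wrong IP range, `Accept: text/html`) still gets the human response.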
Results: before and after
For sites running Hypotext, we see measurable improvements in AI citation rates.
The dual-layer approach means you never compromise. Humans get your beautiful React app. AI crawlers get structured, citation-optimized markdown. Both from the same URL, decided at the edge in under a millisecond.