Your website ranks #1 on Google. Your SEO is impeccable. But when someone asks ChatGPT or Perplexity about your topic, a competitor's website gets cited instead. Why? Because the reranker — not the search index — decides who gets into the LLM's answer.
ChatGPT cites pages from organic position 21+ approximately 90% of the time. Traditional SEO ranking is a poor predictor of AI citation. The rules have changed — and the reranker is the new gatekeeper.
The new search: how AI engines actually work
Perplexity, ChatGPT Search, and Google AI Overviews all follow the same fundamental pipeline. Understanding this pipeline is the key to understanding why your content gets cited — or doesn't.
Here's what most people miss: SEO gets you into the candidate pool (Step 2), but the reranker decides if you make it into the final answer (Step 3). These are entirely different algorithms optimizing for different signals.
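To make that split concrete, here's a minimal two-stage sketch using the open-source sentence-transformers library. The model checkpoints, the query, and the three-passage corpus are illustrative placeholders, not what any production engine actually runs:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "How much has robot involvement in work increased?"

# Step 2 (retrieval): a bi-encoder (or keyword index) builds the candidate pool.
corpus = [
    "Robots have come not to destroy our lives, but to disrupt our work.",
    "Robotic involvement in work has increased roughly 70% in the last decade.",
    "Our company builds innovative automation solutions for every industry.",
]
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
hits = util.semantic_search(
    bi_encoder.encode(query, convert_to_tensor=True),
    bi_encoder.encode(corpus, convert_to_tensor=True),
    top_k=3,
)[0]

# Step 3 (reranking): a cross-encoder reads query and passage together,
# with full attention across both, and re-scores every candidate.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)

# The reranker's ordering, not the retrieval ranking, decides what the LLM sees first.
for score, (_, passage) in sorted(zip(scores, pairs), key=lambda x: -x[0]):
    print(f"{score:6.2f}  {passage}")
```

The retrieval step only needs your page to be roughly on-topic; the rerank step is where the wording of the individual passage starts to matter.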
Why SEO ranking ≠ AI citation
Traditional search engines rank pages. AI search engines rank passages. This is a fundamental shift that most SEO practitioners haven't internalized yet.
| | Traditional SEO | AI Citation (GEO) |
|---|---|---|
| Unit of ranking | Entire page | Individual passages/claims |
| What matters | Backlinks, domain authority, keywords | Extractability, specificity, verifiability |
| Optimization target | Click-through rate | Citation probability |
| Content style | Engagement, length, multimedia | Concise, factual, self-contained claims |
| Keyword stuffing | Diminishing returns | Actively harmful (-8.2%) |
What the reranker actually looks for
When a cross-encoder reranker processes your page against a user query, it's running full bidirectional attention between query tokens and document tokens. But what features, in practice, make it score one document higher than another?
The reranker doesn't just ask "is this page about the topic?" It asks:
1. Does this passage directly answer the query? (not just relate to it)
2. Is the claim verifiable? (statistics, citations, named sources)
3. Is the information extractable? (standalone sentence vs. context-dependent)
4. Is it authoritative? (expert tone, confident language, third-party validation)
5. Is it specific? (concrete numbers vs. vague claims)
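You can watch these preferences show up in the scores of an off-the-shelf reranker. A minimal sketch follows; the checkpoint is a public MS MARCO cross-encoder, the two passages echo the chocolate example discussed in the next section, and the exact scores will vary, but the specific, self-contained phrasing of the same fact typically scores higher:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How much chocolate do Swiss people consume per year?"
vague = "Swiss people love chocolate and consume a lot of it every year."
specific = ("With per capita consumption averaging 11-12 kilos, Swiss people "
            "rank among the top chocolate lovers.")

# One score per (query, passage) pair; the passage that answers the
# question directly and specifically tends to score higher.
for label, passage in [("vague", vague), ("specific", specific)]:
    score = reranker.predict([(query, passage)])[0]
    print(f"{label:9s} {score:.3f}")
```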
The GEO research: what actually works
The landmark KDD 2024 paper "GEO: Generative Engine Optimization" by Aggarwal et al. tested 9 content optimization strategies across 10,000 queries, and the results are striking.
What each strategy means in practice
"Swiss people love chocolate and consume a lot of it every year."
"With per capita consumption averaging 11-12 kilos, Swiss people rank among the top chocolate lovers (According to The International Chocolate Consumption Research Group [1])"
Why it works: The reranker's cross-attention sees explicit citation markers and weights the passage as more verifiable and trustworthy. The LLM then has a concrete source to cite.
"The Jaguars have never been to the Super Bowl but they did win some division titles."
"It is important to note that The Jaguars have never appeared in the Super Bowl. However, they have achieved an impressive feat by securing 4 divisional titles, a testament to their prowess and determination."
Why it works: Authoritative framing ("it is important to note", "achieved an impressive feat") signals expertise to the cross-encoder. The reranker has learned from training data that expert sources use confident, analytical language.
"Robots have come not to destroy our lives, but to disrupt our work."
"Robots have come not to destroy our lives, but to disrupt our work, with a staggering 70% increase in robotic involvement in the last decade."
Why it works: Concrete numbers give the reranker verifiable anchors. The cross-attention mechanism can match "70%" directly to numerical queries. Statistics make claims extractable and citable.
One of the paper's most notable findings: lower-ranked websites benefit more from GEO than top-ranked ones.
GEO democratizes visibility. You don't need to be #1 in Google to get cited by AI — you need content that the reranker scores highly for the specific query.
How each AI engine selects citations
Perplexity
- Own crawled search index
- Retrieves top 5 documents per sub-query
- Multiple LLM backends (Sonar, Claude, GPT)
- Strongest preference for recency
- Inline citations with numbered references

ChatGPT Search
- Uses Bing index + OAI-SearchBot
- Prioritizes recent over perfect
- Cites from position 21+ 90% of the time
- Heavy weight on freshness signals
- Sources shown as clickable sidebar links

Google AI Overviews
- Google's own index + Gemini model
- "Query fan-out" — parallel sub-searches
- Shows greater website diversity than SERPs
- No special requirements beyond indexing
- Expanding beyond informational queries
The "Lost in the Middle" effect on citations
The reranker's output ordering has a direct, measurable effect on citation probability. Here's why: LLMs have a well-documented bias — they pay disproportionate attention to content at the beginning and end of their context window.
This means the reranker's decision to move your page from position 4 to position 1 isn't a marginal improvement — it's the difference between being cited and being invisible. The GEO paper captures this with "Position-Adjusted Word Count" — visibility decays exponentially by position.
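The paper's exact metric has more moving parts, but a toy version of position-weighted visibility makes the decay easy to see. This sketch assumes a simple exponential decay in sentence position; the decay constant and normalization are illustrative simplifications, not the paper's precise formula:

```python
import math

def position_adjusted_visibility(cited_sentences, total_sentences):
    """Sum of word counts for sentences that cite you, each weighted by an
    exponential decay in its position within the generated answer.

    cited_sentences: list of (position, word_count), positions starting at 0.
    total_sentences: total number of sentences in the answer.
    """
    return sum(
        words * math.exp(-pos / total_sentences)
        for pos, words in cited_sentences
    )

# Same 25-word citation, early vs. late in a 10-sentence answer.
early = position_adjusted_visibility([(0, 25)], total_sentences=10)
late = position_adjusted_visibility([(9, 25)], total_sentences=10)
print(f"early: {early:.1f}, late: {late:.1f}")  # early = 25.0, late ~ 10.2
```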
The 6 signals that make rerankers choose your content
1. Extractability: Can the LLM pull a standalone answer from your content without needing surrounding context?
2. Specificity & verifiability: Concrete numbers, named experts, measurable results beat vague claims every time.
3. Source citation within your content: Pages that cite other authoritative sources score dramatically higher (+132.4%).
4. Freshness: ChatGPT prioritizes recent over perfect. AI engines heavily weight publication date.
5. Technical accessibility: AI crawlers must be able to read your content. Client-side JS rendering is invisible to them.
6. Off-page authority signals: Unlinked brand mentions, Wikipedia presence, third-party references all feed the reranker's authority model.
How the reranker "thinks" about your page
Let's trace through exactly what happens when a user asks Perplexity "What are the best tools for AI search optimization?" and your page is in the candidate set:
"best" "tools" "AI" "search" "optimization"[CLS] query [SEP] "Our platform provides AI-powered..." [SEP]"tools" attends to product names and feature lists
"AI search optimization" attends to your domain-specific terminology
Low score → position 5+ → "lost in the middle" → rarely cited
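Here's a minimal sketch of that joint input, using a public MS MARCO cross-encoder checkpoint via Hugging Face transformers; the page snippet is an illustrative stand-in for your content, not a real page:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

query = "What are the best tools for AI search optimization?"
passage = ("Our platform provides AI-powered search optimization, "
           "including passage-level audits and citation tracking.")

# Query and passage become one sequence: [CLS] query [SEP] passage [SEP].
# Every query token can attend to every passage token, and vice versa.
inputs = tokenizer(query, passage, return_tensors="pt", truncation=True)
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])[:12])

# One forward pass produces a single relevance score for the pair.
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"relevance score: {score:.2f}")
```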
The conversion advantage
When an LLM cites your brand, it's an implicit endorsement. The user arrives pre-sold on your expertise. This is fundamentally different from a blue link on page 1 of Google.
Practical playbook: optimizing for reranker selection
Content-level optimizations
- Put a self-contained, citable statement in your first paragraph. Don't bury the answer — rerankers score based on the full passage, but LLMs cite from the top.
- Attach a number to every major claim. "67% reduction (Anthropic, 2024)" is far more citable than "significant improvement."
- Name papers, authors, organizations. "According to [Author] at [Institution]" triggers the reranker's authority patterns.
- Write like a practitioner, not a marketer. "We deployed this across 50 production systems" beats "Our innovative solution."
- Treat freshness as a dominant signal. Monthly updates with current numbers outperform annual comprehensive guides.
Technical optimizations
- Render content server-side. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) often can't execute JavaScript; if your content is client-rendered, it's invisible.
- Serve crawlers clean content. Detect bot user-agents and serve clean markdown or structured HTML. Remove navigation, ads, sidebars — give them pure content (see the sketch after this list).
- Add structured data. Schema.org markup, proper heading hierarchy, and article metadata all help crawlers extract and understand your content structure.
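As a sketch of the user-agent approach above, here's a minimal Flask handler. The bot list, the route, and the pre-rendered file path are illustrative assumptions; in practice you would keep the snapshot in sync with the live page and verify crawler IP ranges rather than trusting the user-agent string alone:

```python
# pip install flask
from flask import Flask, request, render_template, send_file

app = Flask(__name__)

# Substrings of known AI crawler user-agents (illustrative, not exhaustive).
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot")

def is_ai_crawler(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in AI_BOTS)

@app.route("/guides/ai-search-optimization")
def guide():
    if is_ai_crawler(request.headers.get("User-Agent", "")):
        # Pre-rendered, content-only snapshot: no navigation, ads, or sidebars,
        # and no client-side JavaScript needed to read the article.
        return send_file("prerendered/ai-search-optimization.html")
    # Regular visitors get the full interactive page.
    return render_template("guide.html")
```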
The uncomfortable future
Here's what's coming: AI search is eating traditional search. Google AI Overviews are expanding beyond informational queries. Perplexity's user base is growing rapidly. ChatGPT is becoming many users' first stop for questions.
In this world, being "ranked #1 on Google" means less every month. What matters is whether the reranker — that cross-encoder sitting between retrieval and generation — decides your content is the most citable answer to the user's question.
The bottom line
The reranker is the new gatekeeper. It doesn't care about your domain authority or your backlink profile — not directly. It cares about whether your specific passage, when read alongside the user's query through full bidirectional attention, looks like the most trustworthy, specific, and extractable answer available.
The brands that will dominate AI search are the ones that understand this shift: stop optimizing for page-level ranking signals, and start optimizing for passage-level citability. Add statistics. Cite sources. Write with authority. Publish fresh content. Make it technically accessible to AI crawlers.
In the age of AI search, you don't rank pages. You earn citations.