Your website ranks #1 on Google. Your SEO is impeccable. But when someone asks ChatGPT or Perplexity about your topic, a competitor's website gets cited instead. Why? Because the reranker — not the search index — decides who gets into the LLM's answer.
ChatGPT cites pages from organic position 21+ approximately 90% of the time. Traditional SEO ranking is a poor predictor of AI citation. The rules have changed — and the reranker is the new gatekeeper.
The new search: how AI engines actually work
Perplexity, ChatGPT Search, and Google AI Overviews all follow the same fundamental pipeline. Understanding this pipeline is the key to understanding why your content gets cited — or doesn't.
Here's what most people miss: SEO gets you into the candidate pool (Step 2), but the reranker decides if you make it into the final answer (Step 3). These are entirely different algorithms optimizing for different signals.
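To make that split concrete, here's a minimal two-stage sketch using the open-source sentence-transformers library. The model checkpoints, the query, and the three-passage corpus are illustrative placeholders, not what any production engine actually runs:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "How much has robot involvement in work increased?"

# Step 2 (retrieval): a bi-encoder (or keyword index) builds the candidate pool.
corpus = [
    "Robots have come not to destroy our lives, but to disrupt our work.",
    "Robotic involvement in work has increased roughly 70% in the last decade.",
    "Our company builds innovative automation solutions for every industry.",
]
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
hits = util.semantic_search(
    bi_encoder.encode(query, convert_to_tensor=True),
    bi_encoder.encode(corpus, convert_to_tensor=True),
    top_k=3,
)[0]

# Step 3 (reranking): a cross-encoder reads query and passage together,
# with full attention across both, and re-scores every candidate.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)

# The reranker's ordering, not the retrieval ranking, decides what the LLM sees first.
for score, (_, passage) in sorted(zip(scores, pairs), key=lambda x: -x[0]):
    print(f"{score:6.2f}  {passage}")
```

The retrieval step only needs your page to be roughly on-topic; the rerank step is where the wording of the individual passage starts to matter.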
Why SEO ranking ≠ AI citation
Traditional search engines rank pages. AI search engines rank passages. This is a fundamental shift that most SEO practitioners haven't internalized yet.
| | Traditional SEO | AI Citation (GEO) |
|---|---|---|
| Unit of ranking | Entire page | Individual passages/claims |
| What matters | Backlinks, domain authority, keywords | Extractability, specificity, verifiability |
| Optimization target | Click-through rate | Citation probability |
| Content style | Engagement, length, multimedia | Concise, factual, self-contained claims |
| Keyword stuffing | Diminishing returns | Actively harmful (-8.2%) |
What the reranker actually looks for
When a cross-encoder reranker processes your page against a user query, it's running full bidirectional attention between query tokens and document tokens. But what features, in practice, make it score one document higher than another?
The reranker doesn't just ask "is this page about the topic?" It asks:
1. Does this passage directly answer the query? (not just relate to it)
2. Is the claim verifiable? (statistics, citations, named sources)
3. Is the information extractable? (standalone sentence vs. context-dependent)
4. Is it authoritative? (expert tone, confident language, third-party validation)
5. Is it specific? (concrete numbers vs. vague claims)
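You can watch these preferences show up in the scores of an off-the-shelf reranker. A minimal sketch follows; the checkpoint is a public MS MARCO cross-encoder, the two passages echo the chocolate example discussed in the next section, and the exact scores will vary, but the specific, self-contained phrasing of the same fact typically scores higher:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How much chocolate do Swiss people consume per year?"
vague = "Swiss people love chocolate and consume a lot of it every year."
specific = ("With per capita consumption averaging 11-12 kilos, Swiss people "
            "rank among the top chocolate lovers.")

# One score per (query, passage) pair; the passage that answers the
# question directly and specifically tends to score higher.
for label, passage in [("vague", vague), ("specific", specific)]:
    score = reranker.predict([(query, passage)])[0]
    print(f"{label:9s} {score:.3f}")
```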
The GEO research: what actually works
The landmark KDD 2024 paper "GEO: Generative Engine Optimization" by Aggarwal et al. tested 9 content optimization strategies across 10,000 queries, and the results are striking.
What each strategy means in practice
"Swiss people love chocolate and consume a lot of it every year."
"With per capita consumption averaging 11-12 kilos, Swiss people rank among the top chocolate lovers (According to The International Chocolate Consumption Research Group [1])"
Why it works: The reranker's cross-attention sees explicit citation markers and weights the passage as more verifiable and trustworthy. The LLM then has a concrete source to cite.
"The Jaguars have never been to the Super Bowl but they did win some division titles."
"It is important to note that The Jaguars have never appeared in the Super Bowl. However, they have achieved an impressive feat by securing 4 divisional titles, a testament to their prowess and determination."
Why it works: Authoritative framing ("it is important to note", "achieved an impressive feat") signals expertise to the cross-encoder. The reranker has learned from training data that expert sources use confident, analytical language.
"Robots have come not to destroy our lives, but to disrupt our work."
"Robots have come not to destroy our lives, but to disrupt our work, with a staggering 70% increase in robotic involvement in the last decade."
Why it works: Concrete numbers give the reranker verifiable anchors. The cross-attention mechanism can match "70%" directly to numerical queries. Statistics make claims extractable and citable.
One of the paper's most notable findings: lower-ranked websites benefit more from GEO than top-ranked ones.
GEO democratizes visibility. You don't need to be #1 in Google to get cited by AI — you need content that the reranker scores highly for the specific query.
How each AI engine selects citations
Perplexity
- Own crawled search index
- Retrieves top 5 documents per sub-query
- Multiple LLM backends (Sonar, Claude, GPT)
- Strongest preference for recency
- Inline citations with numbered references

ChatGPT Search
- Uses Bing index + OAI-SearchBot
- Prioritizes recent over perfect
- Cites from position 21+ 90% of the time
- Heavy weight on freshness signals
- Sources shown as clickable sidebar links

Google AI Overviews
- Google's own index + Gemini model
- "Query fan-out" — parallel sub-searches
- Shows greater website diversity than SERPs
- No special requirements beyond indexing
- Expanding beyond informational queries
The "Lost in the Middle" effect on citations
The reranker's output ordering has a direct, measurable effect on citation probability. Here's why: LLMs have a well-documented bias — they pay disproportionate attention to content at the beginning and end of their context window.
This means the reranker's decision to move your page from position 4 to position 1 isn't a marginal improvement — it's the difference between being cited and being invisible. The GEO paper captures this with "Position-Adjusted Word Count" — visibility decays exponentially by position.
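The paper's exact metric has more moving parts, but a toy version of position-weighted visibility makes the decay easy to see. This sketch assumes a simple exponential decay in sentence position; the decay constant and normalization are illustrative simplifications, not the paper's precise formula:

```python
import math

def position_adjusted_visibility(cited_sentences, total_sentences):
    """Sum of word counts for sentences that cite you, each weighted by an
    exponential decay in its position within the generated answer.

    cited_sentences: list of (position, word_count), positions starting at 0.
    total_sentences: total number of sentences in the answer.
    """
    return sum(
        words * math.exp(-pos / total_sentences)
        for pos, words in cited_sentences
    )

# Same 25-word citation, early vs. late in a 10-sentence answer.
early = position_adjusted_visibility([(0, 25)], total_sentences=10)
late = position_adjusted_visibility([(9, 25)], total_sentences=10)
print(f"early: {early:.1f}, late: {late:.1f}")  # early = 25.0, late ~ 10.2
```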
The 6 signals that make rerankers choose your content
1. Extractability: Can the LLM pull a standalone answer from your content without needing surrounding context?
2. Specificity & verifiability: Concrete numbers, named experts, measurable results beat vague claims every time.
3. Source citation within your content: Pages that cite other authoritative sources score dramatically higher (+132.4%).
4. Freshness: ChatGPT prioritizes recent over perfect. AI engines heavily weight publication date.
5. Technical accessibility: AI crawlers must be able to read your content. Client-side JS rendering is invisible to them.
6. Off-page authority signals: Unlinked brand mentions, Wikipedia presence, third-party references all feed the reranker's authority model.
How the reranker "thinks" about your page
Let's trace through exactly what happens when a user asks Perplexity "What are the best tools for AI search optimization?" and your page is in the candidate set:
"best" "tools" "AI" "search" "optimization"[CLS] query [SEP] "Our platform provides AI-powered..." [SEP]"tools" attends to product names and feature lists
"AI search optimization" attends to your domain-specific terminology
Low score → position 5+ → "lost in the middle" → rarely cited
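Here's a minimal sketch of that joint input, using a public MS MARCO cross-encoder checkpoint via Hugging Face transformers; the page snippet is an illustrative stand-in for your content, not a real page:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

query = "What are the best tools for AI search optimization?"
passage = ("Our platform provides AI-powered search optimization, "
           "including passage-level audits and citation tracking.")

# Query and passage become one sequence: [CLS] query [SEP] passage [SEP].
# Every query token can attend to every passage token, and vice versa.
inputs = tokenizer(query, passage, return_tensors="pt", truncation=True)
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])[:12])

# One forward pass produces a single relevance score for the pair.
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"relevance score: {score:.2f}")
```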
The conversion advantage
When an LLM cites your brand, it's an implicit endorsement. The user arrives pre-sold on your expertise. This is fundamentally different from a blue link on page 1 of Google.
Practical playbook: optimizing for reranker selection
Content-level optimizations
- Put a self-contained, citable statement in your first paragraph. Don't bury the answer — rerankers score based on the full passage, but LLMs cite from the top.
- Attach a number to every major claim. "67% reduction (Anthropic, 2024)" is far more citable than "significant improvement."
- Name papers, authors, organizations. "According to [Author] at [Institution]" triggers the reranker's authority patterns.
- Write like a practitioner, not a marketer. "We deployed this across 50 production systems" beats "Our innovative solution."
- Treat freshness as a dominant signal. Monthly updates with current numbers outperform annual comprehensive guides.
Technical optimizations
- Render content server-side. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) often can't execute JavaScript; if your content is client-rendered, it's invisible.
- Serve crawlers clean content. Detect bot user-agents and serve clean markdown or structured HTML. Remove navigation, ads, sidebars — give them pure content (see the sketch after this list).
- Add structured data. Schema.org markup, proper heading hierarchy, and article metadata all help crawlers extract and understand your content structure.
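As a sketch of the user-agent approach above, here's a minimal Flask handler. The bot list, the route, and the pre-rendered file path are illustrative assumptions; in practice you would keep the snapshot in sync with the live page and verify crawler IP ranges rather than trusting the user-agent string alone:

```python
# pip install flask
from flask import Flask, request, render_template, send_file

app = Flask(__name__)

# Substrings of known AI crawler user-agents (illustrative, not exhaustive).
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot")

def is_ai_crawler(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in AI_BOTS)

@app.route("/guides/ai-search-optimization")
def guide():
    if is_ai_crawler(request.headers.get("User-Agent", "")):
        # Pre-rendered, content-only snapshot: no navigation, ads, or sidebars,
        # and no client-side JavaScript needed to read the article.
        return send_file("prerendered/ai-search-optimization.html")
    # Regular visitors get the full interactive page.
    return render_template("guide.html")
```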
The uncomfortable future
Here's what's coming: AI search is eating traditional search. Google AI Overviews are expanding beyond informational queries. Perplexity's user base is growing rapidly. ChatGPT is becoming many users' first stop for questions.
In this world, being "ranked #1 on Google" means less every month. What matters is whether the reranker — that cross-encoder sitting between retrieval and generation — decides your content is the most citable answer to the user's question.
The bottom line
The reranker is the new gatekeeper. It doesn't care about your domain authority or your backlink profile — not directly. It cares about whether your specific passage, when read alongside the user's query through full bidirectional attention, looks like the most trustworthy, specific, and extractable answer available.
The brands that will dominate AI search are the ones that understand this shift: stop optimizing for page-level ranking signals, and start optimizing for passage-level citability. Add statistics. Cite sources. Write with authority. Publish fresh content. Make it technically accessible to AI crawlers.
In the age of AI search, you don't rank pages. You earn citations.