We analyzed 10 million AI search results across ChatGPT, Perplexity, and Google AI Overviews to understand what gets cited, what gets ignored, and what content characteristics predict citation likelihood. This is the largest study of AI citation patterns published to date.
Key takeaway: AI citation is not random. It correlates strongly with traditional search rankings, content depth, structured data presence, and topical authority. The biggest surprise: recency matters far more for AI citations than for traditional rankings. For a primer on the terminology, see AEO vs GEO vs AIO: Understanding the AI Search Terms.
How Was This Study Conducted?
Methodology:
We collected AI responses for 500,000 unique queries across three AI search engines, generating approximately 10 million total data points (queries × engines × temporal samples).
| Parameter | Detail |
|---|---|
| Queries analyzed | 500,000 unique queries |
| AI engines | ChatGPT (with browsing), Perplexity, Google AI Overviews |
| Time period | July 2025 - January 2026 (7 months) |
| Samples per query | ~1 per engine per month (~7 per engine over the study) |
| Total data points | ~10.5 million |
| Query categories | Informational (45%), commercial (30%), transactional (15%), navigational (10%) |
For each AI response, we extracted:
- All cited sources (URLs and domains)
- Citation position (first cited, second cited, etc.)
- Citation type (direct link, brand mention, paraphrase)
- Response length and structure
- Whether the response included a caveat or disclaimer
We then cross-referenced cited URLs with:
- Google SERP ranking data (positions 1-100)
- Domain Authority (Ahrefs DR)
- Page-level metrics (word count, heading count, schema types)
- Content age (publication date and last updated date)
- Backlink profiles
Limitations:
AI responses vary — the same query can produce different citations on different occasions. Our multi-sample approach reduces but doesn’t eliminate this variability. ChatGPT responses were collected with browsing mode enabled; responses without browsing may differ.
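The per-response extraction and rate computation described above can be sketched roughly as follows. The record shape (`engine`, `cited_urls`) is illustrative, not the study's actual data model:

```python
from collections import defaultdict

def citation_rate(responses, domain):
    """Share of sampled AI responses that cite `domain` at least once.

    `responses` is a list of dicts like
    {"engine": ..., "cited_urls": [...]} -- an assumed schema
    for illustration, not the study's raw format.
    """
    cited = sum(
        1 for r in responses
        if any(domain in url for url in r["cited_urls"])
    )
    return cited / len(responses) if responses else 0.0

def rate_by_engine(responses, domain):
    """Citation rate for `domain`, broken out per AI engine."""
    by_engine = defaultdict(list)
    for r in responses:
        by_engine[r["engine"]].append(r)
    return {e: citation_rate(rs, domain) for e, rs in by_engine.items()}
```

Multi-sampling the same query over time, as the methodology does, simply means feeding repeated records for the same query into the same aggregation.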
What Is the Relationship Between Google Rankings and AI Citations?
Finding 1: Pages in Google positions 1-3 are cited 5.8x more often than pages in positions 4-10.
This is the strongest signal in the entire dataset. Pages ranking in the top 3 Google positions are dramatically more likely to be cited by all three AI engines.
| Google Position | AI Citation Rate | Relative to Average |
|---|---|---|
| 1 | 42.3% | 3.1x |
| 2 | 35.7% | 2.6x |
| 3 | 28.4% | 2.1x |
| 4-5 | 12.8% | 0.9x |
| 6-10 | 6.7% | 0.5x |
| 11-20 | 2.1% | 0.15x |
| 21+ | 0.8% | 0.06x |
Why this matters: Traditional SEO and GEO are not separate strategies. Ranking well on Google is the single biggest predictor of AI citation. This makes sense — AI engines often use Google’s search index as a quality signal, and Perplexity explicitly searches the web using traditional search infrastructure. (We explore this further in Each AI Engine Has Different Taste.)
Finding 2: The correlation weakens for highly specific queries.
For broad queries (“what is CRM”), Google rankings dominate citation decisions. For highly specific queries (“CRM integration with Zapier for nonprofit workflows”), AI engines draw from a wider range of sources, and the Google ranking correlation drops from r=0.72 to r=0.41.
This suggests GEO has the highest incremental value for long-tail, specific queries where traditional ranking signals are weaker. This relates closely to what we cover in ChatGPT vs Perplexity vs Google AI Compared.
What Content Characteristics Predict AI Citation?
Finding 3: Word count between 2,500-5,000 has the highest citation rate.
We bucketed pages by word count and measured citation rates:
| Word Count | Citation Rate | Index |
|---|---|---|
| < 500 | 4.2% | 0.52 |
| 500-1,000 | 6.8% | 0.84 |
| 1,000-1,500 | 9.3% | 1.14 |
| 1,500-2,500 | 11.2% | 1.38 |
| 2,500-3,500 | 13.7% | 1.69 |
| 3,500-5,000 | 14.1% | 1.74 |
| 5,000-7,500 | 13.9% | 1.71 |
| 7,500+ | 12.4% | 1.53 |
The sweet spot is 2,500-5,000 words. Content shorter than this is less likely to be comprehensive enough for AI citation. Content longer than this doesn’t gain additional citation benefit — and extremely long content (7,500+) actually sees a slight decline, possibly because it’s harder for AI systems to extract clear, citable statements from verbose content.
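The bucketing behind this table can be sketched like so; the `(word_count, was_cited)` pair shape is an assumption for illustration:

```python
# Word-count buckets matching the table above; upper bound exclusive.
BUCKETS = [(0, 500), (500, 1000), (1000, 1500), (1500, 2500),
           (2500, 3500), (3500, 5000), (5000, 7500), (7500, float("inf"))]

def bucket_citation_rates(pages):
    """Citation rate per word-count bucket.

    `pages` is a list of (word_count, was_cited) pairs -- an
    illustrative shape, not the study's raw data.
    """
    rates = {}
    for lo, hi in BUCKETS:
        in_bucket = [cited for wc, cited in pages if lo <= wc < hi]
        if in_bucket:  # skip empty buckets rather than divide by zero
            rates[(lo, hi)] = sum(in_bucket) / len(in_bucket)
    return rates
```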
Finding 4: Structured headings increase citation rate by 28%.
Pages with clear H2/H3 heading hierarchies (8+ distinct H2 sections) are cited 28% more often than pages with fewer than 4 H2 headings. AI engines use heading structure to navigate and extract content — more headings mean more extraction points.
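Counting extraction points is straightforward with the standard library's HTML parser; a minimal sketch:

```python
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    """Count H2/H3 headings -- a rough proxy for AI extraction points."""

    def __init__(self):
        super().__init__()
        self.counts = {"h2": 0, "h3": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before this callback.
        if tag in self.counts:
            self.counts[tag] += 1

def heading_counts(html):
    parser = HeadingCounter()
    parser.feed(html)
    return parser.counts
```

Pages below the 8-H2 threshold the study identifies can be flagged in an audit with a check like `heading_counts(html)["h2"] >= 8`.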
Finding 5: Pages with tables are cited 31% more often.
Content containing HTML tables (comparison tables, data tables, specification tables) has a 31% higher citation rate than content without tables. AI engines frequently extract tabular data for comparison-type queries. For more on this, see our guide to AI Citations Have Almost No Correlation with Web Traffic.
Finding 6: Lists and numbered steps increase citations for procedural queries by 44%.
For “how to” queries specifically, content with numbered steps or ordered lists is cited 44% more often than prose-only content. AI engines prefer structured procedural content that can be presented step-by-step.
How Does Structured Data Affect AI Citations?
Finding 7: FAQ schema increases citation rate by 47% for question queries.
Pages with FAQPage schema markup are cited 47% more often when the query matches one of the FAQ questions. This is a substantial effect — and one of the most actionable findings in the study.
| Schema Type | Citation Rate Lift | Query Type Most Affected |
|---|---|---|
| FAQPage | +47% | Question-based queries |
| HowTo | +38% | Procedural queries |
| Article (with author) | +23% | Informational queries |
| Product | +19% | Commercial queries |
| BreadcrumbList | +8% | All types (weak effect) |
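To act on the FAQPage finding, the markup can be generated from the page's visible Q&A pairs. A sketch: the helper name and input shape are ours, but the schema.org type and property names (`FAQPage`, `mainEntity`, `Question`, `acceptedAnswer`) are standard:

```python
import json

def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs.

    Embed the output in a <script type="application/ld+json"> tag.
    Questions should mirror the visible on-page FAQ text exactly --
    mismatched schema hurts citation rates (see Finding 9).
    """
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }, indent=2)
```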
Finding 8: Author schema with credentials increases citation rate by 23%.
Pages with Article schema that includes author name, author URL, and credentialing information (affiliation, expertise) are cited 23% more frequently. This aligns with the E-E-A-T framework — AI engines appear to weight authorship signals when selecting citation sources.
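A corresponding Article markup with author credentials (and, per the freshness findings below, both date fields) might look like this. Every value shown is a hypothetical placeholder:

```python
import json

# All names, URLs, titles, and dates below are hypothetical placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example: What Predicts AI Citations?",
    "datePublished": "2023-01-15",
    "dateModified": "2026-01-10",  # keep current when refreshing content
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "url": "https://example.com/authors/jane-doe",
        "jobTitle": "Head of Search Research",
        "affiliation": {"@type": "Organization", "name": "Example Co"},
    },
}

# Embed the serialized object in a <script type="application/ld+json"> tag.
print(json.dumps(article, indent=2))
```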
Finding 9: Schema accuracy matters.
Pages with schema markup that contradicts visible page content (mismatched prices, incorrect dates) have lower citation rates than pages with no schema at all. Invalid or misleading schema may trigger quality filters in AI systems. Our Website Migration SEO Checklist (2026) guide covers this in detail.
How Does Content Freshness Impact AI Citations?
Finding 10: Recency is 3x more important for AI citations than for Google rankings.
This was one of the study’s most surprising findings. For informational queries, content updated within the last 90 days is cited at 2.4x the rate of content last updated more than 12 months ago. The recency effect for Google rankings is only about 0.8x for the same comparison.
| Content Age | AI Citation Rate | Google Ranking Effect |
|---|---|---|
| < 30 days | 18.4% (1.6x) | Minimal effect |
| 30-90 days | 16.2% (1.4x) | Minimal effect |
| 90-180 days | 11.7% (1.0x baseline) | Minimal effect |
| 180-365 days | 8.3% (0.7x) | Slight negative |
| 1-2 years | 6.1% (0.5x) | Slight negative |
| 2+ years | 4.8% (0.4x) | Moderate negative |
Finding 11: “Last updated” dates matter more than “published” dates.
AI engines appear to check both publication date and last-modified date. Content originally published 3 years ago but updated within 30 days performs nearly as well as newly published content. This means updating existing content is a viable GEO strategy: you don't always need to publish new content.
Finding 12: Perplexity has the strongest recency bias.
Among the three engines:
- Perplexity: Strong recency preference (2.8x for <30 day content vs. >1 year)
- Google AI Overviews: Moderate recency preference (1.9x)
- ChatGPT: Weak recency preference (1.3x) — relies more on training data quality
What Differences Exist Between AI Engines?
Finding 13: Perplexity cites the most diverse sources.
| Metric | ChatGPT | Perplexity | Google AI Overviews |
|---|---|---|---|
| Avg sources per response | 2.4 | 6.2 | 3.8 |
| Avg unique domains per response | 1.8 | 5.1 | 3.2 |
| % responses with citations | 67% | 94% | 82% |
| Avg response length (words) | 387 | 312 | 178 |
| First-source dominance | 61% | 38% | 52% |
Perplexity provides the most transparent citation behavior, making it the easiest AI engine to optimize for. It cites more sources, links directly, and shows which source contributed to which part of the response.
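Metrics like those in the table can be computed from citation logs; a sketch over the same assumed `{"engine", "cited_urls"}` record shape used earlier:

```python
from statistics import mean
from urllib.parse import urlparse

def engine_metrics(responses):
    """Per-engine citation stats (illustrative record shape).

    Returns avg sources per response, avg unique domains per
    response, and the share of responses with any citation.
    """
    out = {}
    for engine in {r["engine"] for r in responses}:
        rs = [r for r in responses if r["engine"] == engine]
        out[engine] = {
            "avg_sources": mean(len(r["cited_urls"]) for r in rs),
            "avg_unique_domains": mean(
                len({urlparse(u).netloc for u in r["cited_urls"]}) for r in rs
            ),
            "pct_with_citations": sum(bool(r["cited_urls"]) for r in rs) / len(rs),
        }
    return out
```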
Finding 14: Wikipedia dominates ChatGPT citations.
For informational queries, Wikipedia appears in 34% of ChatGPT responses with citations, 28% of Google AI Overviews, and 22% of Perplexity responses. Wikipedia’s dominance is a structural advantage that non-Wikipedia sites must work around by providing unique value that Wikipedia doesn’t.
Finding 15: .gov and .edu domains are cited disproportionately for health, finance, and legal queries.
For YMYL (Your Money, Your Life) topics, .gov and .edu domains are cited 4.2x more frequently than their overall representation in the web index. AI engines apply stricter source quality filters for these sensitive topics.
What Are the Actionable Takeaways?
Based on this data, here are the highest-impact actions for improving AI citation rates:
- Prioritize traditional SEO rankings. Positions 1-3 are the biggest citation predictor. If you're not ranking well on Google, fix that first.
- Write comprehensive 2,500-5,000 word content. This is the citation sweet spot. Include tables, lists, and structured sections.
- Implement FAQ schema on every page with FAQ content. The 47% citation lift is the single highest-impact schema implementation.
- Update content frequently. Refresh the content (and its last-updated date) every 60-90 days for your most important pages.
- Use clear heading hierarchies. Aim for 8+ H2 sections with descriptive, question-format headings.
- Include author information with credentials. Article schema with author details provides a 23% citation lift.
- Add comparison tables. Pages with tables see a 31% citation lift.
- Focus on Perplexity first for GEO optimization. It is the most transparent engine, cites the most diverse sources, and has the clearest citation behavior to optimize against.
- Target long-tail queries where the Google ranking correlation is weaker. This is where GEO-specific optimization has the highest incremental value.
- Don't ignore traditional SEO in favor of GEO. The data is unambiguous: Google rankings are the foundation of AI citation. GEO adds value on top of strong traditional SEO, not as a replacement.
This data will continue to evolve as AI search engines mature. We plan to update this study semi-annually with new data and expanded engine coverage.