If you've been trying to figure out how to get mentioned by AI assistants, you've probably noticed a problem: most of the advice out there is speculation. "Make your content authoritative." "Build trust signals." These are vibes, not strategies.
But there is actual research. Four academic studies published between 2024 and 2025 tested specific generative engine optimization (GEO) tactics with measurable results. We went through all of them so you don't have to. Here is what the data says.
The Study That Started It All (KDD 2024)
The foundational GEO paper by Aggarwal et al. was published at KDD 2024, the top data mining conference. The researchers created GEO-bench, a benchmark of 10,000 queries, and tested which content optimization methods actually improved source visibility in AI-generated responses.
The results were specific enough to act on:
- Adding citations from reputable sources improved visibility by 132.4%
- Adding statistics improved visibility by 65.5%
- Using authoritative expression (confident, expert tone) improved visibility by 89.1%
And here's what surprised the SEO community: keyword stuffing, the bread and butter of early SEO, actually decreased visibility in generative engine responses. The models penalize it.
This tells us something fundamental. AI models don't rank content the way search engines do. They evaluate trustworthiness, specificity, and citation quality. If your content reads like it was written to game an algorithm, AI models will skip it.
Not All AI Platforms Behave the Same
The AutoGEO study (Wu et al., 2025) tested something most practitioners hadn't considered: do the same optimization techniques work equally well across different AI platforms?
The answer is no. Only about 80% of GEO optimization rules overlap across Gemini, GPT, and Claude. The remaining 20% are platform-specific.
In practice, this means a piece of content you optimized for ChatGPT might perform differently on Gemini or Claude. If you're only testing against one platform, you're working with incomplete information.
This also explains why cross-platform monitoring matters. A brand that appears consistently in ChatGPT responses might be completely absent from Claude or Perplexity. Without visibility data across multiple platforms, you can't know where your blind spots are.
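A quick spot check is easy to script. The sketch below (Python, assuming the official openai, anthropic, and google-generativeai SDKs, with placeholder model names and a placeholder brand) sends the same prompt to ChatGPT, Claude, and Gemini and checks whether the brand shows up in each answer. It's a rough illustration, not a monitoring pipeline; real tracking needs entity matching and citation parsing, not a substring check.

```python
# Minimal cross-platform spot check: ask three generative engines the same
# question and see whether a brand is mentioned in each answer.
# Model names are assumptions for illustration; swap in whatever models you
# actually want to monitor. Requires API keys in the environment.
import os

from openai import OpenAI
import anthropic
import google.generativeai as genai

PROMPT = "What is the best CRM for startups?"
BRAND = "Acme CRM"  # placeholder brand name


def ask_openai(prompt: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def ask_anthropic(prompt: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text


def ask_gemini(prompt: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")
    return model.generate_content(prompt).text


for name, ask in [("ChatGPT", ask_openai), ("Claude", ask_anthropic), ("Gemini", ask_gemini)]:
    answer = ask(PROMPT)
    mentioned = BRAND.lower() in answer.lower()
    print(f"{name}: brand mentioned = {mentioned}")
```

Even a crude check like this surfaces the pattern the AutoGEO study describes: the same content, the same prompt, and three different answers about whether you exist.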
The Multi-Query Optimization Trap
IF-GEO (2025) introduced a finding that makes GEO more complicated than most people realize. When you optimize content for one specific query, it can reduce your visibility for related queries.
The researchers framed this as a "constrained optimization problem." Imagine you optimize your product page to perfectly answer "What is the best CRM for startups?" and it works. ChatGPT starts citing you. But your visibility for "best CRM for small businesses" or "affordable CRM tools" drops, because the optimization shifted the semantic framing of your content.
This is one reason manual, page-by-page GEO is so difficult. You can't optimize for individual queries in isolation without understanding how those changes interact with related queries.
The IF-GEO study reported a +11.07 overall improvement using their multi-query aware approach compared to naive single-query optimization. Not a massive number in isolation, but it's the difference between losing ground on related queries and gaining across the board.
The practical takeaway: think about query clusters, not individual questions. When you optimize content, test it against the full range of related prompts your prospects might use.
The 12-Pillar Threshold
GEO-16 (2025) established a 16-pillar auditing framework and found a clear threshold: pages that scored 0.70 or above across 12 or more of the 16 pillars achieved a 78% cross-engine citation rate. Pages below that threshold were cited at significantly lower rates.
This is useful because it gives you a concrete target. You don't need perfection across every dimension. You need to clear a quality bar across most of them. A page with strong citations, solid structured data, and authoritative tone but poor keyword density can still get cited. A page that excels at two pillars but is weak everywhere else probably won't get picked up by any AI platform.
The implication: breadth of quality matters more than depth in any single area. A GEO audit should check many dimensions, not just your favorite two or three.
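The threshold itself is simple to operationalize once you have per-pillar scores. Here is a minimal sketch of the pass/fail check, with invented pillar names and scores; GEO-16 defines its own sixteen pillars, and scoring each one is the real work this snippet skips.

```python
# Minimal sketch of a GEO-16 style threshold check. The pillar names and
# scores below are invented placeholders; the framework's actual sixteen
# pillars and scoring rubric come from the GEO-16 paper.

PILLAR_PASS_SCORE = 0.70   # a pillar "clears the bar" at 0.70 or above
PILLARS_REQUIRED = 12      # of the 16 pillars


def clears_threshold(pillar_scores: dict) -> bool:
    """True if at least 12 of the 16 pillar scores are 0.70 or higher."""
    passing = sum(1 for s in pillar_scores.values() if s >= PILLAR_PASS_SCORE)
    return passing >= PILLARS_REQUIRED


# Hypothetical audit of a single page (placeholder pillar names).
page = {
    "citations": 0.85, "statistics": 0.78, "authoritative_tone": 0.91,
    "structured_data": 0.74, "freshness": 0.66, "readability": 0.72,
    "headings": 0.80, "internal_links": 0.71, "external_links": 0.75,
    "schema_markup": 0.73, "author_bylines": 0.60, "originality": 0.82,
    "depth": 0.77, "multimedia": 0.55, "faq_coverage": 0.74, "page_speed": 0.70,
}

print(clears_threshold(page))  # True: 13 of 16 pillars are at or above 0.70
```

Note that the check counts how many pillars clear the bar rather than averaging them, which mirrors the finding: breadth of quality across pillars matters more than an exceptional score on any single one.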
Putting the Research into Practice
Here is what you can do with these findings right now:
Add citations to your existing content. This is the highest-impact tactic from the KDD 2024 study (132.4% improvement). Link to research papers, industry reports, and authoritative sources. If you're making a claim, back it up with a specific reference.
Include specific numbers. Adding statistics improved visibility by 65.5% in the same KDD 2024 study. Replace vague claims like "significantly faster" with "47% faster in our Q3 benchmark." AI models treat specific data as a trust signal.
Drop the keyword stuffing. It actively hurts you in AI responses, even if it still works on Google. Write for the reader, not the crawler.
Test across platforms. Don't assume ChatGPT results represent what happens on Claude or Gemini. At minimum, test your high-priority queries across three platforms.
Think in query clusters. Before optimizing a page, map out the 5 to 10 related queries you also want to appear for. Optimize for the cluster, not a single question.
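To tie the last two steps together, here is a rough sketch of cluster-level testing: run every query in a cluster through each platform and record where the brand is and isn't mentioned. The brand, the query cluster, and the ask_platform() stub are all placeholders; replace the stub with real API calls like the ones in the earlier cross-platform sketch.

```python
# Sketch of cluster-level visibility testing. The brand, the query cluster,
# and the ask_platform() stub are placeholders; wire the stub to real API
# calls per platform (see the cross-platform sketch above).

BRAND = "Acme CRM"  # placeholder brand

QUERY_CLUSTER = [
    "What is the best CRM for startups?",
    "best CRM for small businesses",
    "affordable CRM tools",
    "CRM with the easiest onboarding",
    "top CRM software for a 10-person team",
]

PLATFORMS = ["chatgpt", "claude", "gemini"]


def ask_platform(platform: str, query: str) -> str:
    """Stub: return the platform's answer text for a query.

    The canned answer below just keeps the sketch runnable; replace it
    with a real API call per platform.
    """
    return f"[{platform}] Popular options include Acme CRM and several alternatives."


def cluster_visibility(brand: str) -> dict:
    """Map (platform, query) -> True/False for whether the brand appears."""
    results = {}
    for platform in PLATFORMS:
        for query in QUERY_CLUSTER:
            answer = ask_platform(platform, query)
            results[(platform, query)] = brand.lower() in answer.lower()
    return results


if __name__ == "__main__":
    for (platform, query), mentioned in cluster_visibility(BRAND).items():
        status = "MENTIONED" if mentioned else "absent"
        print(f"{platform:8} | {status:9} | {query}")
```

Optimizing against a matrix like this, rather than a single query, is how you catch the failure mode IF-GEO describes: a change that wins the target query but loses the rest of the cluster.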
The Bottom Line
GEO is no longer guesswork. Four published studies give us specific, tested tactics with measurable results. Citations, statistics, and authoritative framing work. Keyword stuffing doesn't. Platform differences are real. Multi-query effects are real.
The brands that apply this research systematically will have a measurable advantage over those still relying on instinct.
Measure your content's GEO readiness with Presenc AI.
See how your content scores across AI visibility factors. Get specific recommendations based on the same research cited here. Test across ChatGPT, Claude, Gemini, and Perplexity.


