How We Score LLM Visibility

The science behind your GEO score

Every score is the result of 40 curated prompts, run across 4 LLM providers over 3 days, aggregated into a single composite signal. Here's exactly how.

40 prompts per vertical × market
3 independent runs per cycle
4 LLM providers weighted by market share
01 — Prompt Structure

Three prompt categories, one composite picture

We don't ask LLMs "do you know Brand X?" — that's too direct and primes the model. Instead, we simulate the actual questions real consumers ask when researching, comparing, and buying.

Discovery
~15
prompts per cycle

Organic Recommendation

Open-ended queries that ask the LLM to recommend brands in your vertical — without naming any brand. Measures unprompted recall and category authority. The closest proxy to organic search share.

Comparison
~15
prompts per cycle

Head-to-Head Matchups

Prompts that set up direct brand comparisons within your category. Tests whether LLMs position your brand favorably, neutrally, or negatively relative to competitors. Critical for competitive intelligence.

Purchase Intent
~10
prompts per cycle

Buy-Intent Queries

High-intent prompts that simulate a consumer ready to purchase — "where should I buy," "what's the best option for my budget," "what do experts recommend." These carry the most commercial weight.

Total prompts per vertical × market combination
All prompts are written in the target market's native language. Prompt text is proprietary and not disclosed.
40 prompts total
Why vertical-specific prompts matter. A generic "recommend a skincare brand" prompt doesn't capture D2C nuance. Our prompts are curated for each vertical — Wine & Spirits queries use sommelier-style language, Beauty queries reference ingredient transparency, Nutrition queries frame around clinical claims. Generic prompts produce generic scores.
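The per-cycle mix above (15 discovery, 15 comparison, 10 purchase-intent prompts) can be sketched as a simple config. This is an illustrative shape only: the `PromptSet` class, its field names, and the Beauty/France example are hypothetical, not our production schema.

```python
from dataclasses import dataclass

@dataclass
class PromptSet:
    """One curated prompt set for a vertical x market combination."""
    vertical: str
    market: str   # ISO country code, e.g. "FR"
    mix: dict     # prompt category -> prompts per cycle

    def total(self) -> int:
        return sum(self.mix.values())

# Example set using the published category counts.
beauty_fr = PromptSet(
    vertical="Beauty & Cosmetics",
    market="FR",
    mix={"discovery": 15, "comparison": 15, "purchase_intent": 10},
)
assert beauty_fr.total() == 40  # matches the 40-prompt total above
```

Each vertical × market combination gets its own `PromptSet`, which is why French Beauty and German Beauty scores are built from different prompt text.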
02 — Multi-Run Methodology

Three runs over 3 days, not one snapshot

LLMs are probabilistic. Ask the same question twice and you may get two different answers. A single-run score is mostly noise. We run 3 independent scan cycles at 24-hour intervals, then average the results.

Friday
🔍
Run 1
Full 40-prompt scan across all 4 providers
Saturday
🔍
Run 2
Independent scan — same prompts, fresh LLM sessions
Sunday
🔍
Run 3
Third independent scan for statistical stability
Monday
📊
Digest Delivery
Weighted average computed, digest sent at 08:00 Paris time
Why averaging 3 runs cuts noise by roughly two-thirds. LLM response variance is roughly normally distributed, and the variance of a mean of n independent samples is 1/n of the single-run variance. With n = 3, score variance drops by about 67% and the confidence interval narrows by about 42%, giving you a score that reflects the model's learned association with your brand, not a random moment.
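The variance-reduction claim is easy to check with a toy simulation. The "true" score of 60 and noise level sigma = 10 below are made-up inputs; only the averaging logic mirrors the 3-run methodology.

```python
import random
import statistics

random.seed(42)
TRUE_SCORE, SIGMA, TRIALS = 60.0, 10.0, 10_000

# Spread of a single noisy run vs. the average of 3 independent runs.
single = [random.gauss(TRUE_SCORE, SIGMA) for _ in range(TRIALS)]
avg3 = [
    statistics.mean(random.gauss(TRUE_SCORE, SIGMA) for _ in range(3))
    for _ in range(TRIALS)
]

sd1 = statistics.stdev(single)   # ~10: the raw run-to-run noise
sd3 = statistics.stdev(avg3)     # ~10/sqrt(3) ~ 5.8: the averaged noise
print(f"single-run sd: {sd1:.2f}, 3-run-average sd: {sd3:.2f}")
```

The averaged score's spread lands near 1/√3 of the single-run spread, i.e. variance falls to about one-third of its single-run value.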
🎯

Within-run consistency

Each run uses a clean session with no memory of prior queries. Prompts are sent in randomized order to prevent position bias — LLMs tend to favor brands mentioned earlier in a session.
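Per-run randomized ordering takes only a few lines. This is a sketch of the idea, not our scan code; the function name and seeding scheme are illustrative.

```python
import random

def ordered_for_run(prompts: list[str], run_seed: int) -> list[str]:
    """Return the prompt list in a fresh random order for one run.

    A distinct seed per run gives each clean session its own order,
    so no prompt systematically appears early across all runs.
    """
    rng = random.Random(run_seed)  # isolated RNG; doesn't touch global state
    shuffled = prompts[:]          # copy, so the master list is untouched
    rng.shuffle(shuffled)
    return shuffled
```

Seeding per run keeps each cycle reproducible for auditing while still varying order between runs.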

📅

Benchmark period

When we say "April 2026 benchmark," we mean the weighted average of 3 runs conducted in the last week of April. Scores represent the LLM knowledge state at that point in time — not a real-time signal.

03 — Provider Weighting

One composite score, four signals

Your GEO score isn't just ChatGPT. It's a weighted composite across the 4 LLMs that European D2C consumers actually use — weighted by market share and D2C relevance.

ChatGPT
Dominant market share
40%
Perplexity
High D2C purchase intent
25%
Gemini
Google integration + Android
20%
Claude
Growing enterprise adoption
15%
Composite Score Formula
GEO Score = (ChatGPT × 0.40) + (Perplexity × 0.25) + (Gemini × 0.20) + (Claude × 0.15)
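The formula translates directly into code. The provider weights are the published Q1 2026 values above; the per-provider scores in the example are made-up inputs.

```python
# Published provider weights (Q1 2026). Must sum to 1.
WEIGHTS = {"chatgpt": 0.40, "perplexity": 0.25, "gemini": 0.20, "claude": 0.15}

def geo_score(provider_scores: dict) -> float:
    """Weighted composite of per-provider visibility scores."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(provider_scores[p] * w for p, w in WEIGHTS.items())

# Illustrative per-provider scores on a 0-100 scale.
example = {"chatgpt": 72, "perplexity": 65, "gemini": 58, "claude": 61}
print(round(geo_score(example), 2))  # prints 65.8 for these inputs
```

Because the weights sum to 1, the composite stays on the same 0–100 scale as the per-provider scores.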
📊

Why these weights?

Weights reflect European consumer AI usage patterns as of Q1 2026 — not global averages. ChatGPT's outsized weight reflects both volume and the fact that its recommendations carry the highest purchase-intent conversion rate in our vertical research.

🔄

Weights are reviewed quarterly

As AI market share shifts, so do the weights. We review provider weighting every quarter. Any change is announced in the digest before it takes effect — so you can compare scores on a consistent basis.

04 — Scan Period

Monthly benchmarks, weekly digests

Two distinct scan types serve different purposes — one for cross-brand benchmarking, one for tracking your own brand week-over-week.

Weekly Digest
Every Monday
Your brand's score across all providers, computed fresh each week. Tracks momentum and catches drops early.
Monthly Benchmark
Last week of month
"April 2026 Benchmark" = scans run Fri–Sun in the final week of April. Cross-brand comparison on a consistent calendar basis.
Score Delivery
08:00 Paris
Digest arrives Monday morning. Browse before your first meeting. Each digest includes delta from prior week.
What "April 2026 benchmark" means precisely. Scans ran Friday April 24 – Sunday April 26, 2026, the Fri–Sun window of the final week of April. Results reflect what each LLM "knew" and recalled during those 3 days. A brand that launched a campaign after April 26 won't see that impact until the May benchmark.
05 — Markets & Languages

Prompts in the language your customers use

LLMs have language-specific knowledge. A French consumer asking about skincare in French gets different recommendations than the same question asked in English. We run prompts in each market's native language.

🇫🇷
France (FR)
Prompts in French · Paris timezone · Largest D2C market in scope
🇩🇪
Germany (DE)
Prompts in German · Highest avg. order value in EU D2C
🇪🇸
Spain (ES)
Prompts in Spanish · Fast-growing D2C beauty & wellness segment
🇮🇹
Italy (IT)
Prompts in Italian · Premium positioning market · Coming Q3 2026
Market scores are independent. Your French score and German score can diverge significantly. A brand well-positioned in French LLM knowledge may be invisible in German. Each market is its own GEO landscape.
06 — Verticals

Curated prompts, not generic queries

Every vertical has its own prompt set, written by people who understand the category. Beauty prompts use ingredient-first language. Wine & Spirits prompts invoke sommelier expertise. Pet care prompts reflect vet-recommended framing.

Active Verticals
💄 Beauty & Cosmetics
🍷 Wine & Spirits
🥗 Food & Nutrition
🐾 Pet Care
👗 Fashion
🧘 Wellness
⚙️ Custom (AI-generated)
🔒

Prompts are proprietary

We describe the structure and categories of prompts — not the text itself. The specific wording is our competitive moat. Knowing the questions is the product.

🤖

Custom verticals

For brands outside our predefined categories, we generate a bespoke prompt set using an AI pipeline trained on our vertical curation methodology. Each generated set is reviewed for quality before deployment.

See how your brand scores right now

Free weekly digest. 40 prompts. 4 providers. Delivered every Monday morning.