How We Score LLM Visibility

The science behind your GEO score

Every score is the result of 40 curated prompts, run across 4 LLM providers over 3 days, aggregated into a single composite signal. Here's exactly how.

40 prompts per vertical × market
3 independent runs per cycle
4 LLM providers weighted by market share
01 — Prompt Structure

Three prompt categories, one composite picture

We don't ask LLMs "do you know Brand X?" — that's too direct and primes the model. Instead, we simulate the actual questions real consumers ask when researching, comparing, and buying.

Discovery
~15
prompts per cycle

Organic Recommendation

Open-ended queries that ask the LLM to recommend brands in your vertical — without naming any brand. Measures unprompted recall and category authority. The closest proxy to organic search share.

Comparison
~15
prompts per cycle

Head-to-Head Matchups

Prompts that set up direct brand comparisons within your category. Tests whether LLMs position your brand favorably, neutrally, or negatively relative to competitors. Critical for competitive intelligence.

Purchase Intent
~10
prompts per cycle

Buy-Intent Queries

High-intent prompts that simulate a consumer ready to purchase — "where should I buy," "what's the best option for my budget," "what do experts recommend." These carry the most commercial weight.

Total prompts per vertical × market combination
All prompts are written in the target market's native language. Prompt text is proprietary and not disclosed.
40 prompts total
Why vertical-specific prompts matter. A generic "recommend a skincare brand" prompt doesn't capture D2C nuance. Our prompts are curated for each vertical — Wine & Spirits queries use sommelier-style language, Beauty queries reference ingredient transparency, Nutrition queries frame around clinical claims. Generic prompts produce generic scores.
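The per-cycle mix above (15 discovery, 15 comparison, 10 purchase-intent prompts) can be sketched as a simple config. This is an illustrative shape only: the `PromptSet` class, its field names, and the Beauty/France example are hypothetical, not our production schema.

```python
from dataclasses import dataclass

@dataclass
class PromptSet:
    """One curated prompt set for a vertical x market combination."""
    vertical: str
    market: str   # ISO country code, e.g. "FR"
    mix: dict     # prompt category -> prompts per cycle

    def total(self) -> int:
        return sum(self.mix.values())

# Example set using the published category counts.
beauty_fr = PromptSet(
    vertical="Beauty & Cosmetics",
    market="FR",
    mix={"discovery": 15, "comparison": 15, "purchase_intent": 10},
)
assert beauty_fr.total() == 40  # matches the 40-prompt total above
```

Each vertical × market combination gets its own `PromptSet`, which is why French Beauty and German Beauty scores are built from different prompt text.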
02 — Multi-Run Methodology

Three runs over 3 days, not one snapshot

LLMs are probabilistic. Ask the same question twice and you may get two different answers. A single-run score is mostly noise. We run 3 independent scan cycles at 24-hour intervals, then average the results.

Friday
🔍
Run 1
Full 40-prompt scan across all 4 providers
Saturday
🔍
Run 2
Independent scan — same prompts, fresh LLM sessions
Sunday
🔍
Run 3
Third independent scan for statistical stability
Monday
📊
Digest Delivery
Weighted average computed, digest sent at 08:00 Paris time
Why averaging 3 runs cuts noise by roughly two-thirds. LLM response variance is roughly normally distributed, and the variance of a mean of n independent samples is 1/n of the single-run variance. With n = 3, score variance drops by about 67% and the confidence interval narrows by about 42%, giving you a score that reflects the model's learned association with your brand, not a random moment.
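The variance-reduction claim is easy to check with a toy simulation. The "true" score of 60 and noise level sigma = 10 below are made-up inputs; only the averaging logic mirrors the 3-run methodology.

```python
import random
import statistics

random.seed(42)
TRUE_SCORE, SIGMA, TRIALS = 60.0, 10.0, 10_000

# Spread of a single noisy run vs. the average of 3 independent runs.
single = [random.gauss(TRUE_SCORE, SIGMA) for _ in range(TRIALS)]
avg3 = [
    statistics.mean(random.gauss(TRUE_SCORE, SIGMA) for _ in range(3))
    for _ in range(TRIALS)
]

sd1 = statistics.stdev(single)   # ~10: the raw run-to-run noise
sd3 = statistics.stdev(avg3)     # ~10/sqrt(3) ~ 5.8: the averaged noise
print(f"single-run sd: {sd1:.2f}, 3-run-average sd: {sd3:.2f}")
```

The averaged score's spread lands near 1/√3 of the single-run spread, i.e. variance falls to about one-third of its single-run value.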
🎯

Within-run consistency

Each run uses a clean session with no memory of prior queries. Prompts are sent in randomized order to prevent position bias — LLMs tend to favor brands mentioned earlier in a session.
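Per-run randomized ordering takes only a few lines. This is a sketch of the idea, not our scan code; the function name and seeding scheme are illustrative.

```python
import random

def ordered_for_run(prompts: list[str], run_seed: int) -> list[str]:
    """Return the prompt list in a fresh random order for one run.

    A distinct seed per run gives each clean session its own order,
    so no prompt systematically appears early across all runs.
    """
    rng = random.Random(run_seed)  # isolated RNG; doesn't touch global state
    shuffled = prompts[:]          # copy, so the master list is untouched
    rng.shuffle(shuffled)
    return shuffled
```

Seeding per run keeps each cycle reproducible for auditing while still varying order between runs.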

📅

Benchmark period

When we say "April 2026 benchmark," we mean the weighted average of 3 runs conducted in the last week of April. Scores represent the LLM knowledge state at that point in time — not a real-time signal.

03 — Provider Weighting

One composite score, four signals

Your GEO score isn't just ChatGPT. It's a weighted composite across the 4 LLMs that European D2C consumers actually use — weighted by market share and D2C relevance.

ChatGPT
Dominant market share
40%
Perplexity
High D2C purchase intent
25%
Gemini
Google integration + Android
20%
Claude
Growing enterprise adoption
15%
Composite Score Formula
GEO Score = (ChatGPT × 0.40) + (Perplexity × 0.25) + (Gemini × 0.20) + (Claude × 0.15)
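The formula translates directly into code. The provider weights are the published Q1 2026 values above; the per-provider scores in the example are made-up inputs.

```python
# Published provider weights (Q1 2026). Must sum to 1.
WEIGHTS = {"chatgpt": 0.40, "perplexity": 0.25, "gemini": 0.20, "claude": 0.15}

def geo_score(provider_scores: dict) -> float:
    """Weighted composite of per-provider visibility scores."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(provider_scores[p] * w for p, w in WEIGHTS.items())

# Illustrative per-provider scores on a 0-100 scale.
example = {"chatgpt": 72, "perplexity": 65, "gemini": 58, "claude": 61}
print(round(geo_score(example), 2))  # prints 65.8 for these inputs
```

Because the weights sum to 1, the composite stays on the same 0–100 scale as the per-provider scores.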
📊

Why these weights?

Weights reflect European consumer AI usage patterns as of Q1 2026 — not global averages. ChatGPT's outsized weight reflects both volume and the fact that its recommendations carry the highest purchase-intent conversion rate in our vertical research.

🔄

Weights are reviewed quarterly

As AI market share shifts, so do the weights. We review provider weighting every quarter. Any change is announced in the digest before it takes effect — so you can compare scores on a consistent basis.

04 — Scan Period

Monthly benchmarks, weekly digests

Two distinct scan types serve different purposes — one for cross-brand benchmarking, one for tracking your own brand week-over-week.

Weekly Digest
Every Monday
Your brand's score across all providers, computed fresh each week. Tracks momentum and catches drops early.
Monthly Benchmark
Last week of month
"April 2026 Benchmark" = scans run Fri–Sun in the final week of April. Cross-brand comparison on a consistent calendar basis.
Score Delivery
08:00 Paris
Digest arrives Monday morning. Browse before your first meeting. Each digest includes delta from prior week.
What "April 2026 benchmark" means precisely. Scans ran Friday April 24 – Sunday April 26, 2026, the Fri–Sun window of the final week of April. Results reflect what each LLM "knew" and recalled during those 3 days. A brand that launched a campaign after April 26 won't see that impact until the May benchmark.
05 — Markets & Languages

Prompts in the language your customers use

LLMs have language-specific knowledge. A French consumer asking about skincare in French gets different recommendations than the same question asked in English. We run prompts in each market's native language.

🇫🇷
France (FR)
Prompts in French · Paris timezone · Largest D2C market in scope
🇩🇪
Germany (DE)
Prompts in German · Highest avg. order value in EU D2C
🇪🇸
Spain (ES)
Prompts in Spanish · Fast-growing D2C beauty & wellness segment
🇮🇹
Italy (IT)
Prompts in Italian · Premium positioning market · Coming Q3 2026
Market scores are independent. Your French score and German score can diverge significantly. A brand well-positioned in French LLM knowledge may be invisible in German. Each market is its own GEO landscape.
06 — Verticals

Curated prompts, not generic queries

Every vertical has its own prompt set, written by people who understand the category. Beauty prompts use ingredient-first language. Wine & Spirits prompts invoke sommelier expertise. Pet care prompts reflect vet-recommended framing.

Active Verticals
💄 Beauty & Cosmetics
🍷 Wine & Spirits
🥗 Food & Nutrition
🐾 Pet Care
👗 Fashion
🧘 Wellness
⚙️ Custom (AI-generated)
🔒

Prompts are proprietary

We describe the structure and categories of prompts — not the text itself. The specific wording is our competitive moat. Knowing the questions is the product.

🤖

Custom verticals

For brands outside our predefined categories, we generate a bespoke prompt set using an AI pipeline trained on our vertical curation methodology. Each generated set is reviewed for quality before deployment.

See how your brand scores right now

Free weekly digest. 40 prompts. 4 providers. Delivered every Monday morning.