How many prompts should be in the SOV measurement set?

50 to 200 for most categories. Very fragmented or competitive categories need 300+. Very narrow B2B niches can be measured with 30-50 prompts. The right number is dictated by the variance of mention rates across prompts: if a 50-prompt set produces stable weekly SOV, adding prompts adds cost without adding signal. If it does not, more prompts are needed.

Should LLM SOV be computed per platform or cross-platform?

Both. Per-platform SOV is the tactical view (where am I winning and losing); cross-platform weighted SOV is the strategic summary metric (what is my overall AI visibility position). Most board reporting uses cross-platform weighted; most operational decisions use per-platform.

How does LLM SOV change week to week?

Stable categories with mature brand presence show SOV variance of 2 to 5 percentage points week to week. Fast-moving categories or emerging brands show wider variance. Sharp single-week jumps usually correspond to identifiable events: a Wikipedia article going live, a major PR placement, a competitor outage. Track these events alongside the SOV series to interpret movements.
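To make the event-tracking advice concrete, here is a minimal sketch that flags week-over-week moves larger than a chosen threshold so they can be matched against a log of known events. The function name and the 5-point default are assumptions for illustration (the default simply mirrors the upper end of the range quoted above), and the sample series is made up.

```python
def flag_jumps(weekly_sov, threshold_pp=5.0):
    """Return (week_index, change) for week-over-week moves larger than threshold_pp.

    weekly_sov is a list of SOV values in percentage points, oldest first.
    """
    flagged = []
    for i in range(1, len(weekly_sov)):
        change = weekly_sov[i] - weekly_sov[i - 1]
        if abs(change) > threshold_pp:
            flagged.append((i, change))
    return flagged

# Made-up series: week 3 jumps by 8 points and is worth checking against the event log.
jumps = flag_jumps([20.0, 22.0, 21.0, 29.0, 28.0])
```

Each flagged week is a prompt to look for a cause, not proof that one exists.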

Can I measure my own SOV without a tool?

Yes, manually, for small prompt sets. Run each prompt across each platform, record mentions in a spreadsheet, compute weekly. The work scales linearly with prompts times platforms times sampling depth, which is why most brands automate at the 50-prompt mark. Manual measurement is fine for proof of concept and for very narrow brand monitoring.
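The scaling is easy to see with a quick calculation. The sketch below multiplies prompts by platforms by runs per prompt; the one-minute handling time per response is a purely hypothetical figure used to turn the count into hours.

```python
def weekly_response_count(prompts, platforms, runs_per_prompt):
    """Number of responses to collect and review in one weekly cycle."""
    return prompts * platforms * runs_per_prompt

# 50 prompts on 4 platforms, 3 runs each.
responses = weekly_response_count(50, 4, 3)

# Hypothetical handling time per response (run it, read it, log it in the spreadsheet).
MINUTES_PER_RESPONSE = 1.0
hours_per_week = responses * MINUTES_PER_RESPONSE / 60
```

Under these assumptions a 50-prompt set already means several hundred responses a week, which is roughly where manual tracking stops being practical.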

Share of Voice in LLMs: How to Measure It

Step 1: Define the Prompt Set

The prompt set is the single most consequential decision. It defines the universe of queries that LLM SOV will be measured across, and the universe must be representative of how real buyers actually query AI assistants in your category.

Aim for 50 to 200 prompts covering four buckets: category queries ("what is the best X"), use-case queries ("X for [specific scenario]"), comparison queries ("X vs Y"), and decision queries ("should I use X for [need]"). Source the prompts from customer interviews, sales-team Slack channels, and existing search query data. Lock the prompt set before measurement starts; changing it mid-stream destroys the time series.
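One way to enforce the lock is to fingerprint the prompt set and store the fingerprint with every measurement run. The sketch below is an assumption about how that could be done, not a prescribed method: the `bucket` and `text` field names, the sample prompts, and the function name are all hypothetical.

```python
import hashlib
import json

def prompt_set_fingerprint(prompts):
    """Stable SHA-256 fingerprint of a prompt set, independent of list order."""
    ordered = sorted(prompts, key=lambda p: (p["bucket"], p["text"]))
    canonical = json.dumps(ordered, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical prompts, one per bucket.
PROMPTS = [
    {"bucket": "category", "text": "what is the best project management tool"},
    {"bucket": "use_case", "text": "project management tool for remote agencies"},
    {"bucket": "comparison", "text": "Tool A vs Tool B"},
    {"bucket": "decision", "text": "should I use Tool A for client reporting"},
]

locked = prompt_set_fingerprint(PROMPTS)
```

If a later run produces a different fingerprint, the prompt set has changed and the new data points should not be appended to the existing time series.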

Step 2: Pick the Platforms

Include ChatGPT, Claude, Perplexity, and Gemini as the minimum core. Add platform-specific assistants based on audience: Copilot for enterprise buyers, Grok for Twitter-adjacent audiences, Qwen for the Chinese market, Comet for emerging segments. Weight each platform by its relevance to the brand's audience; equal-weighted SOV is rarely the right summary metric.
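As an illustration of platform weighting, the following sketch combines per-platform SOV into a single weighted figure. The weights and SOV values are invented, and normalizing the weights so they sum to 1 is a design choice assumed here rather than something the method requires.

```python
def cross_platform_sov(per_platform_sov, platform_weights):
    """Weighted average of per-platform SOV; weights are normalized to sum to 1."""
    total_weight = sum(platform_weights[p] for p in per_platform_sov)
    if total_weight <= 0:
        raise ValueError("platform weights must sum to a positive number")
    return sum(
        sov * platform_weights[p] / total_weight
        for p, sov in per_platform_sov.items()
    )

# Invented figures for one brand in one week (SOV as a fraction).
per_platform = {"chatgpt": 0.30, "claude": 0.20, "perplexity": 0.10, "gemini": 0.20}
weights = {"chatgpt": 0.4, "claude": 0.2, "perplexity": 0.2, "gemini": 0.2}
overall = cross_platform_sov(per_platform, weights)
```

With these numbers the weighted figure sits above the equal-weighted average because the heaviest platform is also the brand's strongest one.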

Step 3: Choose the Sampling Approach

AI responses are non-deterministic, so running each prompt once produces noise. The standard approach is to run each prompt three to five times per platform per measurement cycle and aggregate at the prompt level: was the brand mentioned in any run, in most runs, or in all runs? Aggregated mention rates are much more stable than single-run results.
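A minimal sketch of the prompt-level aggregation, assuming each run has already been reduced to a yes/no mention flag. The function name and the returned keys are illustrative.

```python
def aggregate_runs(mentions):
    """Summarize repeated runs of one prompt on one platform.

    mentions is a list of booleans, one per run (True = brand was mentioned).
    """
    if not mentions:
        raise ValueError("need at least one run")
    hits = sum(mentions)
    runs = len(mentions)
    return {
        "rate": hits / runs,       # fraction of runs with a mention
        "any": hits > 0,           # mentioned in at least one run
        "most": hits > runs / 2,   # mentioned in a strict majority of runs
        "all": hits == runs,       # mentioned in every run
    }

# Five runs of the same prompt: mentioned in three of them.
summary = aggregate_runs([True, False, True, True, False])
```

Whichever of the four views is used for reporting, it should stay fixed across measurement cycles.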

Use depersonalized sessions. Logged-in or cookied sessions return responses biased by the user's history, which is fine for a single human but destroys cross-brand comparability.

Step 4: Parse the Responses

For each prompt-response, record: was the brand mentioned (yes/no), position in the list if applicable, accuracy of the description (1-5 scale), and which competitors were mentioned. The competitor side is critical because SOV is a relative metric.

Parsing can be partially automated with regular expressions and NLP. Brand name variants (Coca-Cola vs Coke vs Coca Cola) need to be enumerated to avoid undercounting. Edge cases (the brand mentioned as a negative example, the brand mentioned generically) require human review of sampled responses to validate the parser.
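The variant handling can be sketched with Python's standard `re` module. The variant lists below are illustrative, not exhaustive, and the helper names are hypothetical. This kind of matching only detects that a name appears; it cannot tell a recommendation from a negative example, which is why the human review sample is still needed.

```python
import re

# Illustrative variant lists; a real set would be enumerated per brand.
BRAND_VARIANTS = {
    "Coca-Cola": ["Coca-Cola", "Coca Cola", "Coke"],
    "Pepsi": ["Pepsi", "Pepsi-Cola"],
}

def compile_brand_patterns(variants):
    """Build one case-insensitive pattern per canonical brand name."""
    patterns = {}
    for brand, names in variants.items():
        # Longest variants first so the full name is tried before a shorter one.
        ordered = sorted(names, key=len, reverse=True)
        alternation = "|".join(re.escape(name) for name in ordered)
        # The lookarounds stop matches inside longer words (e.g. "Cokes").
        patterns[brand] = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
    return patterns

def find_mentioned_brands(text, patterns):
    """Return the set of canonical brand names that appear in the text."""
    return {brand for brand, pattern in patterns.items() if pattern.search(text)}

patterns = compile_brand_patterns(BRAND_VARIANTS)
found = find_mentioned_brands("Most people pick coke, though Pepsi has its fans.", patterns)
```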

Step 5: Compute Share of Voice

For each prompt, count the distinct brands mentioned in the response. Each mentioned brand gets a 1/n share for that prompt, where n is the number of brands mentioned in that response. Aggregate across all prompts in the set, weighted by platform if running cross-platform SOV. The result is each brand's LLM SOV as a percentage of total category mentions across the prompt set.
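The 1/n rule can be sketched for a single platform as follows. The input is assumed to be one set of mentioned brands per prompt; skipping prompts where no brand is mentioned is an assumption, since the text does not say how to treat them. Brand names are placeholders.

```python
from collections import defaultdict

def share_of_voice(responses):
    """Compute SOV from a list of sets, each holding the brands mentioned for one prompt."""
    credit = defaultdict(float)
    for brands in responses:
        if not brands:
            continue  # assumption: a prompt with no brand mentions contributes nothing
        for brand in brands:
            credit[brand] += 1 / len(brands)  # each brand gets a 1/n share
    total = sum(credit.values())
    if total == 0:
        return {}
    return {brand: value / total for brand, value in credit.items()}

# Four prompts; the last response mentions no brand at all.
sov = share_of_voice([{"A", "B"}, {"A"}, {"A", "B", "C"}, set()])
```

Here brand A earns 1/2 + 1 + 1/3 of the three prompts' worth of credit, so its share is about 61%, with B at about 28% and C at about 11%.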

Alternative weightings (position-weighted, where first-mentioned gets more credit; recommendation-weighted, where explicit recommendation outweighs mere mention) are useful for tactical analysis but make cross-period comparisons harder. Pick one method and stick with it.
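For comparison, here is one possible position-weighted scheme. The reciprocal-rank weights (1, 1/2, 1/3, ...) are an assumption chosen for illustration; the text does not specify how much extra credit the first-mentioned brand should get.

```python
def position_weighted_shares(ordered_brands):
    """Per-prompt shares where earlier mentions earn more credit.

    ordered_brands lists each brand once, in order of first mention.
    Uses reciprocal-rank weights, normalized to sum to 1 within the prompt.
    """
    raw = {brand: 1 / rank for rank, brand in enumerate(ordered_brands, start=1)}
    total = sum(raw.values())
    return {brand: weight / total for brand, weight in raw.items()}

shares = position_weighted_shares(["A", "B", "C"])
```

Under the plain 1/n rule each of the three brands would get one third; under this scheme the first-mentioned brand gets a bit over half.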

Step 6: Establish the Cadence

Weekly is the standard. AI platforms refresh quickly, especially RAG-based ones like Perplexity that incorporate live web changes. A monthly cadence is too slow to catch competitive shifts. A daily cadence is expensive and mostly adds noise. Weekly also aligns with the standard data cadence of marketing mix modeling (MMM), which is the most important downstream consumer of the SOV series.

Step 7: Track Trends and Benchmarks

Absolute SOV varies by category and prompt set. The meaningful metrics are trends (how is your SOV changing) and competitive position (how does your SOV compare to the leader and the relevant peer set). Report trend as the rolling four-week SOV; report competitive position as your SOV percentile within the defined peer set.
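Both reporting figures are simple to compute. In the sketch below, the trailing four-week mean leaves the first three weeks empty, and the percentile is defined as the share of the peer set at or below your SOV; both conventions, along with the function names and sample numbers, are assumptions, since other definitions of a rolling window or a percentile are equally defensible.

```python
def rolling_mean(series, window=4):
    """Trailing mean over `window` weeks; weeks without a full window yield None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def percentile_in_peer_set(own_sov, peer_sovs):
    """Percentage of the peer set (own brand included) with SOV at or below own_sov."""
    if not peer_sovs:
        raise ValueError("peer set is empty")
    at_or_below = sum(1 for s in peer_sovs if s <= own_sov)
    return 100 * at_or_below / len(peer_sovs)

trend = rolling_mean([10, 12, 14, 16, 18])
position = percentile_in_peer_set(20, [5, 10, 20, 40])
```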

Step 8: Feed the Output Into MMM and Attribution

The weekly SOV series is the proxy that lets MMM value the AI search channel. Export weekly SOV as a CSV with prompt-set governance metadata, and supply it to the MMM team as a media-equivalent variable. The full integration of SOV into the marketing measurement stack is what turns the metric from a reporting curiosity into a budget-allocation input.
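A minimal export sketch using Python's standard `csv` module. The column names and the idea of carrying a `prompt_set_version` field on every row are assumptions about what "governance metadata" might look like; the actual schema should be agreed with the MMM team.

```python
import csv
import io

FIELDNAMES = ["week_start", "brand", "platform", "sov", "prompt_set_version"]

def export_weekly_sov(rows, prompt_set_version, out):
    """Write weekly SOV rows as CSV, stamping each row with the prompt-set version."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "prompt_set_version": prompt_set_version})

# Invented example row; "all" marks the cross-platform weighted figure.
rows = [{"week_start": "2025-01-06", "brand": "ExampleBrand", "platform": "all", "sov": 0.22}]
buffer = io.StringIO()
export_weekly_sov(rows, "v1", buffer)
csv_text = buffer.getvalue()
```

Writing to a file instead only requires passing a file opened with `newline=""` in place of the `StringIO` buffer.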

How Presenc AI Helps

Presenc AI runs the above process at scale. The platform handles prompt-set governance, multi-platform sampling with depersonalization, response parsing with brand variant handling, weekly aggregation, and direct export into MMM tools. The methodology choices that determine whether SOV is signal or noise are made consistently across customers, which is what allows cross-brand benchmarking to work.