Step 1: Define the Prompt Set
The prompt set is the single most consequential decision. It defines the universe of queries that LLM SOV will be measured across, and the universe must be representative of how real buyers actually query AI assistants in your category.
Aim for 50 to 200 prompts covering four buckets: category queries ("what is the best X"), use-case queries ("X for [specific scenario]"), comparison queries ("X vs Y"), and decision queries ("should I use X for [need]"). Source the prompts from customer interviews, sales-team Slack channels, and existing search query data. Lock the prompt set before measurement starts; changing it mid-stream destroys the time series.
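One way to enforce the lock is to fingerprint the prompt set and store the fingerprint with every measurement run. The sketch below is illustrative only: the bucket names, the `fingerprint` helper, and the example prompts are assumptions, not part of any specific tool.

```python
import hashlib
import json

# Hypothetical bucket labels for the four query types described above.
BUCKETS = {"category", "use_case", "comparison", "decision"}

def fingerprint(prompts):
    """Return a stable SHA-256 fingerprint for a list of {'bucket', 'text'} dicts."""
    for p in prompts:
        if p["bucket"] not in BUCKETS:
            raise ValueError(f"unknown bucket: {p['bucket']}")
    # Sort so the fingerprint does not depend on the order prompts were entered.
    ordered = sorted(prompts, key=lambda p: (p["bucket"], p["text"]))
    canonical = json.dumps(ordered, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Placeholder prompts; a real set would hold 50 to 200 entries.
prompts = [
    {"bucket": "category", "text": "what is the best CRM"},
    {"bucket": "comparison", "text": "CRM A vs CRM B"},
]
locked = fingerprint(prompts)
```

If a later run computes a different fingerprint, the prompt set has changed and the new data points should not be appended to the existing time series.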
Step 2: Pick the Platforms
Include ChatGPT, Claude, Perplexity, and Gemini as the minimum core. Add platform-specific assistants based on audience: Copilot for enterprise, Grok for Twitter-adjacent audiences, Qwen for the Chinese market, Comet for emerging segments. Weight each platform by its relevance to the brand's audience; equal-weighted SOV is rarely the right summary metric.
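To show how much the platform weighting can move the headline number, here is a minimal sketch. The `weighted_sov` helper and all SOV and weight values are made-up illustrations; real weights would come from your own audience data.

```python
def weighted_sov(sov_by_platform, weights):
    """Combine per-platform SOV values using audience-relevance weights."""
    total_weight = sum(weights[p] for p in sov_by_platform)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(sov_by_platform[p] * weights[p] for p in sov_by_platform)
    return weighted_sum / total_weight

# Illustrative numbers only.
sov = {"chatgpt": 0.30, "claude": 0.20, "perplexity": 0.10, "gemini": 0.20}
weights = {"chatgpt": 0.5, "claude": 0.2, "perplexity": 0.2, "gemini": 0.1}

audience_weighted = weighted_sov(sov, weights)
equal_weighted = sum(sov.values()) / len(sov)
```

With these example inputs the audience-weighted figure is about 0.23 while the equal-weighted figure is about 0.20, so the choice of weights changes the reported number.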
Step 3: Choose the Sampling Approach
AI responses are non-deterministic, so a single run per prompt gives a noisy reading. The standard approach is to run each prompt three to five times per platform per measurement cycle and aggregate at the prompt level (was the brand mentioned in any run, in most runs, in all runs). Aggregated mention rates are much more stable than single-run results.
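The prompt-level aggregation can be sketched as follows. The `aggregate_runs` helper is hypothetical, and "most" is assumed here to mean a strict majority of runs.

```python
def aggregate_runs(mentions):
    """Summarize repeated runs of one prompt on one platform.

    mentions: list of booleans, one per run (True if the brand was mentioned).
    """
    if not mentions:
        raise ValueError("at least one run is required")
    hits = sum(mentions)
    runs = len(mentions)
    return {
        "mention_rate": hits / runs,
        "any": hits > 0,
        "most": hits > runs / 2,  # assumption: strict majority
        "all": hits == runs,
    }

# Five runs of the same prompt, brand mentioned in three of them.
summary = aggregate_runs([True, False, True, True, False])
```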
Use depersonalized sessions. Logged-in or cookied sessions return responses biased by the user's history, which is fine for a single human but destroys cross-brand comparability.
Step 4: Parse the Responses
For each prompt-response, record: was the brand mentioned (yes/no), position in the list if applicable, accuracy of the description (1-5 scale), and which competitors were mentioned. The competitor side is critical because SOV is a relative metric.
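One possible shape for that record is sketched below. The class name and field names are illustrative assumptions; only the four recorded items come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ParsedResponse:
    prompt_id: str
    platform: str
    brand_mentioned: bool
    position: Optional[int] = None   # 1-based rank in a list, None if not applicable
    accuracy: Optional[int] = None   # 1-5 scale, None if the brand was not described
    competitors: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject accuracy scores outside the 1-5 scale.
        if self.accuracy is not None and not 1 <= self.accuracy <= 5:
            raise ValueError("accuracy must be between 1 and 5")

record = ParsedResponse("p-001", "chatgpt", True, position=2, accuracy=4,
                        competitors=["Rival Co"])
```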
Parsing can be partially automated with regex and NLP. Brand name variants (Coca-Cola vs Coke vs Coca Cola) need to be enumerated to avoid undercounting. Edge cases (the brand mentioned as a negative example, the brand mentioned generically) require sampled human review to validate the parser.
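A minimal regex sketch of variant matching, using Python's standard `re` module, might look like this. The variant lists and helper names are assumptions for illustration. The sketch only detects mentions; it does not judge whether a mention is negative or generic, which is where human review still applies.

```python
import re

# Hypothetical variant lists; each brand maps to the spellings to count.
VARIANTS = {
    "Coca-Cola": ["Coca-Cola", "Coca Cola", "Coke"],
    "Pepsi": ["Pepsi", "Pepsi-Cola"],
}

def compile_patterns(variants):
    """Build one case-insensitive pattern per brand."""
    patterns = {}
    for brand, names in variants.items():
        # Longest variant first so "Pepsi-Cola" is tried before "Pepsi".
        ordered = sorted(names, key=len, reverse=True)
        alternatives = "|".join(re.escape(n) for n in ordered)
        # Lookarounds prevent matches inside longer words (e.g. "Cokeworks").
        patterns[brand] = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    return patterns

def find_brands(text, patterns):
    """Return the brands found in text, ordered by first mention."""
    hits = []
    for brand, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            hits.append((match.start(), brand))
    return [brand for _, brand in sorted(hits)]

PATTERNS = compile_patterns(VARIANTS)
found = find_brands("Many people pick Pepsi, but coke is more widely sold.", PATTERNS)
```

Note that the word-boundary guards also skip plurals such as "Cokes"; add those as explicit variants if they matter in your category.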
Step 5: Compute Share of Voice
For each prompt, count the brands mentioned in the response. Each mentioned brand gets a 1/n share for that prompt, where n is the number of brands mentioned in that response. Aggregate across all prompts in the set, weighted by platform if running cross-platform SOV. The result is each brand's LLM SOV as a percentage of total category mentions across the prompt set.
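The 1/n calculation for a single platform can be sketched as below. The `share_of_voice` helper and the brand names are hypothetical, and the sketch assumes that responses mentioning no brand are skipped rather than counted against anyone.

```python
from collections import defaultdict

def share_of_voice(responses):
    """Compute SOV from a list of sets, each holding the brands one response mentioned."""
    credit = defaultdict(float)
    for brands in responses:
        if not brands:
            continue  # assumption: responses with no brand mention are skipped
        for brand in brands:
            credit[brand] += 1 / len(brands)
    total = sum(credit.values())
    if total == 0:
        return {}
    # Normalize so the shares across all brands sum to 1.
    return {brand: value / total for brand, value in credit.items()}

# Four responses: A and B; A alone; A, B, and C; no brands.
responses = [{"A", "B"}, {"A"}, {"A", "B", "C"}, set()]
sov = share_of_voice(responses)
```

Here brand A collects 1/2 + 1 + 1/3 of the three available prompt credits, which works out to roughly 61% SOV, with B at roughly 28% and C at roughly 11%.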
Alternative weightings (position-weighted, where first-mentioned gets more credit; recommendation-weighted, where explicit recommendation outweighs mere mention) are useful for tactical analysis but make cross-period comparisons harder. Pick one method and stick with it.
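As an illustration of position weighting, the sketch below gives each brand credit proportional to 1/rank within a response. The 1/rank scheme is an assumption chosen for simplicity, not a standard, and the helper assumes each brand appears once in the ranked list.

```python
def position_weighted_shares(ranked_brands):
    """Split one prompt's credit by mention order, using 1/rank weights.

    ranked_brands: brands in order of first mention, each listed once.
    """
    raw = {brand: 1 / rank for rank, brand in enumerate(ranked_brands, start=1)}
    total = sum(raw.values())
    # Normalize so the prompt still contributes exactly one unit of credit.
    return {brand: weight / total for brand, weight in raw.items()}

shares = position_weighted_shares(["A", "B", "C"])
```

With three brands the first-mentioned brand receives 6/11 of the credit instead of the 1/3 it would get under equal splitting.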
Step 6: Establish the Cadence
Weekly is the standard. AI platforms refresh quickly, especially RAG-based ones like Perplexity that incorporate live web changes. A monthly cadence is too slow to catch competitive shifts, and a daily cadence adds noise and cost without adding much signal. Weekly also aligns with the standard MMM data cadence, and MMM is the most important downstream consumer of the SOV series.
Step 7: Track Trends and Benchmarks
Absolute SOV varies by category and prompt set. The meaningful metrics are trends (how is your SOV changing) and competitive position (how does your SOV compare to the leader and the relevant peer set). Report trend as the rolling four-week SOV; report competitive position as your SOV percentile within the defined peer set.
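Both reporting metrics are simple to compute from the weekly series. In the sketch below the helper names and sample values are illustrative, the rolling figure is assumed to be a trailing mean, and the percentile is assumed to count peers at or below the brand's SOV (other percentile definitions exist).

```python
def rolling_mean(series, window=4):
    """Trailing mean over `window` weeks; None until a full window is available."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def percentile_rank(brand, sov_by_brand):
    """Percent of brands in the peer set with SOV at or below this brand's SOV."""
    mine = sov_by_brand[brand]
    at_or_below = sum(1 for value in sov_by_brand.values() if value <= mine)
    return 100 * at_or_below / len(sov_by_brand)

# Made-up weekly SOV values for one brand and a four-brand peer set.
weekly = [0.10, 0.12, 0.14, 0.16, 0.18]
trend = rolling_mean(weekly)
peers = {"A": 0.50, "B": 0.30, "C": 0.15, "D": 0.05}
rank_b = percentile_rank("B", peers)
```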
Step 8: Feed the Output Into MMM and Attribution
The weekly SOV series is the proxy that lets MMM value the AI search channel. Export weekly SOV as a CSV with prompt-set governance metadata, and supply it to the MMM team as a media-equivalent variable. The full integration of SOV into the marketing measurement stack is what turns the metric from a reporting curiosity into a budget-allocation input.
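A weekly export with governance metadata could look like the following sketch, which uses Python's standard `csv` module. The column names, the `export_weekly_sov` helper, and the sample values are assumptions; the MMM team's required schema takes precedence.

```python
import csv
import io

def export_weekly_sov(rows, prompt_set_id, prompt_set_hash):
    """Render (week_start, brand, sov) tuples as CSV text with prompt-set metadata."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["week_start", "brand", "sov", "prompt_set_id", "prompt_set_hash"])
    for week_start, brand, sov in rows:
        # Repeat the metadata on every row so each row is self-describing.
        writer.writerow([week_start, brand, f"{sov:.4f}", prompt_set_id, prompt_set_hash])
    return buffer.getvalue()

# Placeholder values for illustration.
csv_text = export_weekly_sov([("2025-01-06", "Acme", 0.2312)], "v1", "abc123")
```

Carrying the prompt-set identifier on every row lets the MMM team detect a prompt-set change before it contaminates the series.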
How Presenc AI Helps
Presenc AI runs the above process at scale. The platform handles prompt-set governance, multi-platform sampling with depersonalization, response parsing with brand variant handling, weekly aggregation, and direct export into MMM tools. The methodology choices that determine whether SOV is signal or noise are made consistently across customers, which is what allows cross-brand benchmarking to work.