You can't improve what you don't measure. This is the most basic principle in marketing analytics — and it applies with particular force to AI visibility, where the measurement problem is genuinely hard. Unlike traditional search, AI assistants don't publish impressions data, don't show you where your brand ranks, and don't tell you when they mention you. The only way to measure AI visibility is to go looking for it. This guide shows you exactly how.
Why AI visibility measurement is different from rank tracking
Traditional search rank tracking is relatively straightforward: you input a set of keywords, a rank tracker checks where you appear in Google results, and it reports your position. The metric is simple, consistent, and machine-readable. AI visibility measurement is fundamentally different for three reasons.
First, there is no "position" — AI responses are synthesised prose, not ranked lists. You're either in the response or you're not, and when you are, the context of that mention matters as much as the fact of it. Second, AI responses are non-deterministic — the same prompt asked twice will often produce different responses, sometimes mentioning your brand and sometimes not. This means you need multiple runs, not a single data point. Third, different AI models have very different perceptions of your brand based on their distinct training data and architectures. ChatGPT's representation of your brand may differ substantially from Gemini's or Claude's.
Manual testing: which prompts to use
The foundation of an AI visibility audit is a well-designed set of test prompts. These should span three categories:
- Category discovery prompts: "What are the best tools for [your category]?" / "Which companies do you recommend for [your use case]?" / "What are the leading providers of [your service type]?"
- Problem-solution prompts: "How do I [solve the problem your product addresses]?" / "What's the best way to [achieve the outcome you enable]?" / "I'm struggling with [pain point] — what should I use?"
- Brand-specific prompts: "What do you know about [your brand name]?" / "Tell me about [your brand] — what do they do?" / "Is [your brand] trustworthy / worth using?"
For each category, design at least 5-10 variants to account for the non-deterministic nature of AI responses. A brand that appears in 3 out of 10 category discovery prompts has very different visibility than one that appears in 9 out of 10.
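If maintaining all those variants by hand gets tedious, a small script can expand a few templates into a full prompt set. Here's a minimal sketch in Python; the template wording and filler values (category, pain point, brand) are illustrative placeholders, not a prescribed set:

```python
from itertools import product

# Illustrative templates only -- swap in your own category, use case, and pain points.
TEMPLATES = {
    "category_discovery": [
        "What are the best tools for {category}?",
        "Which companies do you recommend for {use_case}?",
        "What are the leading providers of {category}?",
    ],
    "problem_solution": [
        "How do I {problem}?",
        "What's the best way to {outcome}?",
        "I'm struggling with {pain_point}, what should I use?",
    ],
    "brand_specific": [
        "What do you know about {brand}?",
        "Tell me about {brand}. What do they do?",
        "Is {brand} trustworthy and worth using?",
    ],
}

FILLERS = {  # hypothetical example values
    "category": ["email marketing software", "email automation platforms"],
    "use_case": ["automating drip campaigns"],
    "problem": ["improve my email open rates"],
    "outcome": ["segment an email list"],
    "pain_point": ["low newsletter engagement"],
    "brand": ["AcmeMail"],
}

def expand(template: str) -> list[str]:
    """Fill a template with every combination of matching filler values."""
    keys = [k for k in FILLERS if "{" + k + "}" in template]
    combos = product(*(FILLERS[k] for k in keys))
    return [template.format(**dict(zip(keys, c))) for c in combos]

prompts = {cat: [p for t in ts for p in expand(t)] for cat, ts in TEMPLATES.items()}
for cat, ps in prompts.items():
    print(cat, len(ps))
```

Expanding a handful of templates against two or three filler values per slot quickly yields the 5-10 variants per category recommended above.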
Testing across ChatGPT, Perplexity, Gemini, Claude, and Grok
Each major AI assistant should be tested separately, because their outputs can differ dramatically for the same prompt. Here's what to note for each (a repeatable testing loop is sketched below):
- ChatGPT (GPT-4o): Test both with and without browsing enabled. Note whether your brand appears, in what position relative to competitors, and with what framing.
- Perplexity: This assistant is heavily retrieval-based: it pulls sources and cites them inline. Note not just whether you're mentioned but whether your site or your press coverage appears as a cited source.
- Google Gemini: Test both the conversational interface and, where AI Overviews appear in Google Search, the AI Overview shown above organic results.
- Claude: Claude tends to be more conservative than other assistants about making specific recommendations, so a mention in Claude typically carries a strong authority signal.
- Grok (xAI): Grok has access to X (Twitter) data in real time, making it particularly useful for testing brand perception around current events and recent product launches.
"Test from a neutral account with no chat history — LLMs personalise responses, and your own query history can bias results."
What to record and how to benchmark
For each prompt-model combination, record the following (a simple record structure is sketched after the list):
- Mention (yes/no): Did your brand appear in the response?
- Position: If multiple brands were listed, where did yours appear? First, second, third, or not at all?
- Sentiment: Is the framing positive, neutral, or negative?
- Accuracy: Does the AI describe your brand correctly? Wrong descriptions can be as harmful as no mentions.
- Competitor mentions: Which competitors are mentioned, and how frequently compared to you?
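Fixing a record structure up front keeps entries consistent across testers and sessions. A minimal sketch, where the field names and the example values are just one reasonable choice rather than a standard:

```python
from dataclasses import dataclass, asdict

@dataclass
class AuditRecord:
    model: str              # e.g. "chatgpt"
    prompt: str             # the exact prompt text tested
    mentioned: bool         # did your brand appear in the response?
    position: int | None    # 1-based position among listed brands, None if absent
    sentiment: str          # "positive" | "neutral" | "negative"
    accurate: bool          # does the description match reality?
    competitors: list[str]  # competitor brands named in the same response

# Hypothetical example entry
record = AuditRecord(
    model="chatgpt",
    prompt="What are the best tools for email marketing?",
    mentioned=True,
    position=2,
    sentiment="neutral",
    accurate=True,
    competitors=["Mailchimp", "Klaviyo"],
)
print(asdict(record))
```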
From this data, calculate your baseline mention rate (mentions / total prompts tested, expressed as a percentage) and your share of voice (your mentions / all brand mentions in the same responses). These become the benchmarks you track against over time as you execute your GEO strategy.
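Both metrics are simple ratios. A worked sketch using the 3-out-of-10 example from earlier; the 12 total brand mentions in the share-of-voice line is an assumed figure for illustration:

```python
def mention_rate(mentioned_flags: list[bool]) -> float:
    """Baseline mention rate: mentions / total prompts tested, as a percentage."""
    return 100 * sum(mentioned_flags) / len(mentioned_flags)

def share_of_voice(your_mentions: int, all_brand_mentions: int) -> float:
    """Share of voice: your mentions / all brand mentions in the same responses."""
    return 100 * your_mentions / all_brand_mentions if all_brand_mentions else 0.0

# 3 mentions across 10 category discovery prompts
flags = [True, False, True, False, False, False, True, False, False, False]
print(f"mention rate: {mention_rate(flags):.0f}%")      # 30%
print(f"share of voice: {share_of_voice(3, 12):.0f}%")  # 25%, assuming 12 brand mentions in total
```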
The limitations of manual testing
Manual testing gives you valuable qualitative insights and can surface major visibility gaps quickly. But it has significant limitations at scale. You can realistically test perhaps 50-100 prompt variants across 5 models — a total of 250-500 data points. That's enough for a directional baseline but not enough for statistical confidence, ongoing monitoring, or deep competitive analysis.
Prompt variance means any individual result could be an outlier. If you test a prompt once per model and your brand happens to appear — or not appear — that single data point may not reflect typical model behaviour. You need multiple runs per prompt, which multiplies the effort dramatically. For deeper context on what you're measuring, see our article on the 7 factors that determine AI visibility and the full GEO audit methodology.
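To put numbers on that uncertainty, a Wilson score interval (a standard confidence interval for binomial proportions) shows how little a handful of runs actually pins down: 3 mentions in 10 runs is consistent with a true mention rate anywhere from roughly 11% to 60%.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - margin, centre + margin)

# 3 mentions in 10 runs of one prompt
lo, hi = wilson_interval(3, 10)
print(f"{lo:.0%} - {hi:.0%}")  # roughly 11% - 60%
```

An interval that wide is the statistical face of prompt variance: narrowing it enough to detect real week-over-week movement takes far more runs than manual testing can sustain.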
Using Sight to automate AI visibility measurement
Sight was built specifically to solve the manual testing scalability problem. The platform runs hundreds of prompt variants across all major AI models — ChatGPT, Perplexity, Gemini, Claude, and Grok — and aggregates the results into a single visibility score and competitive dashboard.
Crucially, Sight runs tests repeatedly over time, so you can track your visibility score week-over-week as you execute your GEO strategy. You'll see which categories of prompts you're winning and which you're losing. You'll see competitor share of voice trending up or down. And you'll get specific content and technical recommendations based on where the gaps are largest.
Rather than spending hours manually testing and recording results in a spreadsheet, you get automated, statistically robust AI visibility data — the same kind of systematic measurement that rank trackers brought to traditional SEO. Start your free Sight audit →