How Brand Recommendations Change on ChatGPT When a Model Is Replaced
When you ask ChatGPT a question, you assume you are talking to one model. In practice, ChatGPT routes your query to one of several underlying models behind the scenes. This is not new information. OpenAI has never guaranteed a single model per session, and anyone inspecting the API response metadata can see the model identifier change from one request to the next.
What has not been quantified until now is how much this routing actually matters for brand recommendations. If the model behind a ChatGPT response changes from one request to the next, does it make a meaningful difference in which brands get recommended? Or do the models broadly agree?
We ran 1,000 responses per keyword across a 10-prompt set spanning local and non-local AI search queries to find out. The answer is that it matters a great deal:
- 2.4x more brands per response on one model versus the other, mechanically diluting every brand's share of voice
- Coverage swings of up to 89 percentage points for individual brands. Some appear in 92% of one model's responses and only 3% of the other's
- Near-zero correlation in cited sources (Spearman ρ = 0.037 for local search). The two models draw from almost entirely different pools of websites
- Newer models lean harder on niche and branded sources. gpt-5-3 draws 80% of its citations from branded blogs and specialty guides, versus 62% for gpt-5-mini, which relies more on mainstream press and Wikipedia. The brands that win on gpt-5-3 are the ones featured in these niche sources
- Invisible to the end user. The routing happens silently, so your brand's AI visibility on any given day depends partly on a coin flip you cannot see
The implication goes beyond day-to-day variance. OpenAI updates its models frequently. ChatGPT has gone through multiple model replacements in the past year alone, and each time, the underlying model that handles the majority of queries changes. If the shift between two models that coexist today is this large, every model update is effectively a reset of your brand's AI visibility. To maintain an accurate picture, you need to run visibility audits at least as often as these models change.
This post quantifies the shift across brands, share of voice, and cited sources, and lays out what it means for anyone tracking brand visibility in AI search.
ChatGPT Routes to Multiple Models
We submitted identical prompts to ChatGPT repeatedly and examined the raw response metadata. Each response includes a model identifier that shows which underlying model handled the query. Across our dataset, ChatGPT routed queries to two models:
- gpt-5-mini: handled 65-75% of queries
- gpt-5-3: handled 25-35% of queries
The split varied by query type. Local search queries were routed to gpt-5-mini 75% of the time. Non-local commercial queries had a more balanced 61/39 split.
| Model | Local AI Search | Non-Local AI Search |
|---|---|---|
| gpt-5-mini | 75% of queries | 61% of queries |
| gpt-5-3 | 25% of queries | 39% of queries |
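The tally above can be reproduced from raw response metadata. A minimal sketch, assuming each response is a dict with a `"model"` field (the field name is an assumption; adapt it to the actual payload shape):

```python
from collections import Counter

def routing_split(responses):
    """Tally which underlying model handled each response.

    `responses` is a list of response-metadata dicts; the "model"
    key is an assumed field name, not a documented schema.
    """
    counts = Counter(r["model"] for r in responses)
    total = sum(counts.values())
    return {model: n / total for model, n in counts.items()}

# Made-up metadata mirroring the local-search split reported above
sample = [{"model": "gpt-5-mini"}] * 75 + [{"model": "gpt-5-3"}] * 25
print(routing_split(sample))  # {'gpt-5-mini': 0.75, 'gpt-5-3': 0.25}
```

With enough replicates per prompt, the same tally also lets you segment every downstream metric (brand coverage, share of voice, citations) by model.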
This alone is interesting. But the real finding is what happens to brand recommendations.
The Models Produce Fundamentally Different Outputs
The two models do not just phrase things differently. They recommend different brands, in different quantities, with different levels of conviction.
Brands per response
gpt-5-mini is verbose. It lists significantly more brands per response than gpt-5-3:
| Query type | gpt-5-3 | gpt-5-mini | Ratio |
|---|---|---|---|
| Local AI Search | 7.0 brands/response | 17.1 brands/response | 2.4x |
| Non-Local AI Search | 5.2 brands/response | 9.8 brands/response | 1.9x |
This has a direct mechanical effect on Share of Voice (the percentage of total brand mentions that belong to a specific brand). When a model mentions 17 brands instead of 7, each brand's share of voice is diluted. A brand with 100% coverage would have roughly 14% share of voice on gpt-5-3 but only 6% on gpt-5-mini, purely due to list length.
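The dilution arithmetic is simple enough to write down directly. A sketch of the upper bound: a brand mentioned in every response can claim at most one slot out of the average list length.

```python
def diluted_share_of_voice(coverage, avg_brands_per_response):
    """Upper-bound share of voice for a brand mentioned in `coverage`
    fraction of responses, when each response lists
    `avg_brands_per_response` brands on average."""
    return coverage / avg_brands_per_response

# A brand with 100% coverage, using the per-model list lengths above
print(round(diluted_share_of_voice(1.0, 7.0), 3))   # ~0.143 on gpt-5-3
print(round(diluted_share_of_voice(1.0, 17.1), 3))  # ~0.058 on gpt-5-mini
```

The same brand, with identical coverage, loses more than half its share of voice purely because one model writes longer lists.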
How Correlated Are the Two Models?
Before looking at individual brands, we measured whether the two models at least agree on the overall ranking. We computed the Spearman correlation between each model's brand coverage:
| Query type | Spearman ρ | Interpretation |
|---|---|---|
| Local AI Search | 0.43 | Moderate agreement |
| Non-Local AI Search | 0.31 | Weak agreement |
A Spearman ρ of 0.31 means the two models disagree on the rank order of brands almost as often as they agree. The top 3 brands tend to be stable across both models, but beyond that, which model handles your query determines whether your brand appears at all.
The correlation is even lower for cited sources. In local search, the Spearman correlation for domain citations is 0.037, which is effectively zero. The two models draw from completely different source pools.
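For readers who want to compute this themselves, Spearman ρ is just Pearson correlation on ranks. A minimal no-ties implementation (scipy's `spearmanr` handles ties properly; this sketch assumes distinct coverage values):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation between two equal-length coverage
    vectors, one entry per brand. No-ties shortcut formula."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Identical rankings give rho = 1; fully reversed rankings give -1
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```

Values near zero, like the 0.037 for cited domains, mean knowing one model's ranking tells you almost nothing about the other's.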
This overall picture sets the stage for the specific shifts we see at the individual brand and domain level.
Brand Coverage Shifts
For each brand in our dataset, we measured how its coverage (the percentage of responses where it appeared) changed between the two models. The distribution tells the story:
The bulk of the histogram sits left of zero. Most brands are mentioned more by gpt-5-mini (negative shift values) because it lists more options. But the long right tail reveals brands that gpt-5-3 strongly prefers.
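The per-brand shift metric behind the histogram can be sketched in a few lines. The brand names and coverage numbers below are illustrative, echoing the extremes reported in this post, not the actual dataset:

```python
def coverage_shift(cov_a, cov_b):
    """Percentage-point coverage shift per brand between two models.

    Each argument maps brand -> fraction of that model's responses
    mentioning the brand. Positive means model B mentions it more.
    """
    brands = set(cov_a) | set(cov_b)
    return {b: 100 * (cov_b.get(b, 0.0) - cov_a.get(b, 0.0))
            for b in brands}

# Hypothetical brands mirroring the extremes described below
mini = {"BrandX": 0.92, "BrandY": 0.05}
five3 = {"BrandX": 0.03, "BrandY": 0.25}
print(coverage_shift(mini, five3))  # BrandX ~ -89 pts, BrandY ~ +20 pts
```

Brands missing from one model's output entirely get a coverage of zero there, which is exactly how the extreme tails arise.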
What these shifts mean in practice
Some of these shifts are extreme. In our non-local dataset, one brand appeared in 92% of gpt-5-mini responses but only 3% of gpt-5-3 responses. That is a brand that users saw recommended 9 out of 10 times on one model, and then essentially never on the other.
The reverse also happened. Several brands had +20 percentage point shifts, meaning gpt-5-3 consistently recommended them while gpt-5-mini barely mentioned them. These are brands that were invisible on the model that handles the majority of queries, but suddenly became top recommendations on the alternative model.
This is not a small accuracy issue. It is a complete inversion of visibility for some brands. If ChatGPT shifts its routing ratio (say, gpt-5-3 handling 40% of queries instead of 25%), brands in these tails would see their recommendation rates change dramatically overnight, with no underlying change in brand quality or content strategy.
Share of Voice Shifts
Share of voice shifts are smaller in absolute terms (because share of voice is distributed across many brands) but reveal a different dynamic:
Most brands cluster near zero, but a handful show dramatic shifts. In the local search dataset, three brands gained more than 10 percentage points of share of voice on gpt-5-3, capturing that much more of the recommendation pie when the newer model was answering. On the non-local side, the top three brands each gained roughly 9 percentage points of share of voice on gpt-5-3 because it mentioned fewer competitors.
The combination of coverage and share of voice shifts creates a compounding effect. A brand that is both mentioned more often and gets a larger share of mentions on one model has its visibility amplified. A brand on the losing side of both shifts effectively disappears.
The Models Cite Different Sources
The divergence extends beyond brands to the websites ChatGPT cites in its responses.
Citation volume
gpt-5-mini cites dramatically more sources per response:
| Query type | gpt-5-3 | gpt-5-mini | Ratio |
|---|---|---|---|
| Local AI Search | 7.1 domains/response | 20.7 domains/response | 2.9x |
| Non-Local AI Search | 17.3 domains/response | 22.0 domains/response | 1.3x |
Citation source distribution
The domain coverage shift distribution is nearly symmetric for local search: each model leans on its own set of sources, and the shifts are of similar magnitude in both directions.
Non-local search shows more spread, with some domains having +50-70 percentage point shifts. These are sources that one model relies on heavily while the other ignores entirely.
Source categories tell a story
We categorized every cited domain into three types: press (mainstream media, review sites), Wikipedia, and branded (company blogs, niche guides, specialty sites).
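A minimal sketch of that categorization step. The press list here is illustrative; the study's actual domain lists are not published, so treat these sets as placeholders:

```python
# Illustrative press domains -- placeholder, not the study's real list
PRESS = {"nytimes.com", "theguardian.com", "timeout.com"}

def categorize(domain):
    """Bucket a cited domain into press / wikipedia / branded."""
    if "wikipedia.org" in domain:
        return "wikipedia"
    if domain in PRESS:
        return "press"
    return "branded"  # company blogs, niche guides, specialty sites

cited = ["nytimes.com", "en.wikipedia.org", "blog.salesflare.com", "crm.org"]
print([categorize(d) for d in cited])
# ['press', 'wikipedia', 'branded', 'branded']
```

Averaging these labels over every citation in a model's responses yields the per-category cites/response figures in the tables that follow.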
Local AI Search:
| Category | gpt-5-3 cites/response | gpt-5-mini cites/response | Shift |
|---|---|---|---|
| Press | 1.5 | 6.9 | -5.4 |
| Wikipedia | 0.0 | 0.9 | -0.9 |
| Branded/niche | 5.6 | 12.8 | -7.2 |
gpt-5-mini cites 4.6x more press sources in local search (6.9 versus 1.5 per response) and is the only model that cites Wikipedia at all. gpt-5-3 relies more heavily on branded and niche sources as a proportion of its total citations.
Non-Local AI Search:
| Category | gpt-5-3 cites/response | gpt-5-mini cites/response | Shift |
|---|---|---|---|
| Press | 1.0 | 1.6 | -0.6 |
| Wikipedia | 0.0 | 0.1 | -0.1 |
| Branded/niche | 16.3 | 20.3 | -4.0 |
Non-local commercial search is dominated by branded and niche sources on both models (over 90% of citations). The category split barely changes between gpt-5-3 and gpt-5-mini. But this masks the real divergence: the two models draw from completely different branded sources. gpt-5-3 favors one set of review blogs and comparison sites (pipermind.com, saascrmreview.com, saasly.online) while gpt-5-mini favors an entirely different set (blog.salesflare.com, crm.org, start.streak.com). The brand shift in non-local search is not driven by a category shift but by which specific niche sites each model trusts.
This means your content strategy needs to account for both models. In local search, the split is between press and niche. In non-local search, it is between different niche sources. Either way, being present in only one model's preferred source pool leaves you invisible on the other.
Why Brand Shifts Follow Source Shifts
The brand coverage differences are not random. When we looked at which sources co-occur with brands that each model favors, a clear mechanism emerged: the models recommend different brands because they read different websites, and those websites feature different brands.
Local search: niche guides vs mainstream press
In our local search dataset, the brands most favored by gpt-5-3 appear alongside niche specialty blogs and guides (sites like specialty coffee magazines, local neighborhood blogs, and curated city guides). These sources mention a specific set of "insider" picks that mainstream outlets do not cover. When gpt-5-3 reads these sources, it recommends the brands it finds there.
The brands most favored by gpt-5-mini, on the other hand, co-occur with mainstream press: large media outlets, national newspapers, and Wikipedia. These sources feature a different, more mainstream set of brands. Several brands that gpt-5-mini recommends in over 60% of responses do not appear in a single gpt-5-3 response, because the niche sources that gpt-5-3 relies on simply do not mention them.
When we look at the source category split, brands favored by gpt-5-3 appear alongside responses that are roughly 80% branded/niche sources and 20% press, with zero Wikipedia citations. Brands favored by gpt-5-mini appear in responses with a much higher proportion of press (34%) and Wikipedia references.
Non-local search: different branded sources, same mechanism
In non-local commercial search, both models draw primarily from branded and niche sources (over 90% in both cases). But they draw from completely different branded sources. Each model has its own set of review blogs, comparison sites, and industry guides that it favors.
The most extreme example in our dataset is a brand whose own blog was cited in nearly 100% of gpt-5-mini responses. That same brand had 92% coverage on gpt-5-mini but only 3% on gpt-5-3. The model that reads the brand's blog recommends the brand. The model that does not read it does not recommend it. The correlation is almost mechanical.
The takeaway
The models do not have different "opinions" about brands. They have different reading lists. If your brand appears in the sources that one model favors, you will be visible on that model. If it does not, no amount of brand quality will compensate. This is why understanding which sources each model draws from is just as important as tracking your brand coverage directly.
Social and community sources remain equally relevant
One thing that does not shift between models is the role of social and community platforms. Reddit and LinkedIn appear at similar rates in both gpt-5-3 and gpt-5-mini responses, with Reddit cited in roughly 13-15% of non-local search responses across both models. While many source categories diverge dramatically between models, social and community channels remain a consistent signal. This makes them a stable piece of your optimization strategy. Regardless of which model handles the query, having your brand discussed on Reddit threads, LinkedIn posts, and community forums continues to contribute to your AI visibility.
What This Means for Your AI Visibility Strategy
Single-query monitoring is unreliable
If you check ChatGPT once to see whether it recommends your brand, you are getting a sample of one from a probabilistic mixture of models. You need multiple queries (replicates) to average out the model routing effect and get a statistically meaningful picture.
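A quick simulation shows why a single check is misleading. All numbers below are illustrative assumptions, not figures from the study: a brand with 60% coverage on gpt-5-mini and 5% on gpt-5-3, under a 75/25 routing split.

```python
import random

def simulate_coverage(n_replicates, p_mini=0.75,
                      cov_mini=0.60, cov_5_3=0.05, seed=0):
    """Simulate observed brand coverage under probabilistic routing.

    Each replicate is routed to gpt-5-mini with probability p_mini,
    then the brand appears with that model's coverage rate. All
    parameters are illustrative, not measured values.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        cov = cov_mini if rng.random() < p_mini else cov_5_3
        hits += rng.random() < cov
    return hits / n_replicates

# One query is a coin flip; many replicates approach the blended rate
print(simulate_coverage(1))       # either 0.0 or 1.0 -- useless alone
print(simulate_coverage(10_000))  # near 0.75*0.60 + 0.25*0.05 = 0.4625
```

A single query returns all-or-nothing; only repeated sampling recovers the blended rate that your customers actually experience.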
Your share of voice benchmark has a hidden confounder
If your brand tracking shows share of voice dropping from 15% to 10%, the cause might not be a change in ChatGPT's opinion of your brand. It might be that the model routing ratio shifted: more queries going to gpt-5-mini (which dilutes share of voice by mentioning more brands) and fewer to gpt-5-3.
Content strategy needs to cover both source types
gpt-5-3 favors branded and niche content. gpt-5-mini favors mainstream press and breadth. If your visibility strategy only targets one source type, you are optimizing for one model while being invisible on the other.
The model your customer gets is random
You cannot control which model handles a user's query. This makes replicates essential. You need enough samples to capture the weighted average across both models, not a snapshot from one.
What Helps You Rank on ChatGPT's Latest Model
Based on our data, gpt-5-3 (the newer model in our dataset) has distinct source preferences that differ from its predecessor. If you are optimizing for where ChatGPT is heading rather than where it has been, these patterns matter:
gpt-5-3 relies more heavily on niche and branded sources. In local search, 80% of the domains cited alongside gpt-5-3's favored brands are branded blogs, specialty guides, and curated lists. For non-local commercial search, it is over 90%. The brands that appear most consistently on gpt-5-3 are the ones featured in these niche sources that mainstream press does not cover.
Mainstream press coverage still matters, but its relative weight is lower. gpt-5-mini draws 34% of its local search citations from press outlets, compared to 20% for gpt-5-3. Press coverage gives you broad baseline visibility on the model that handles the majority of queries today, but it is less decisive on the newer model.
Wikipedia presence helps on gpt-5-mini but is absent from gpt-5-3. In our local search data, gpt-5-3 cited Wikipedia zero times across all replicates, while gpt-5-mini cited it roughly once per response. This suggests newer models may be de-emphasizing Wikipedia as a source.
Your own content can be a direct ranking factor. In our non-local dataset, a brand whose blog was cited in nearly 100% of gpt-5-mini responses had 92% coverage on that model. Getting your own content indexed and cited by ChatGPT's web search is one of the most direct paths to visibility. The key is being present in the specific review sites, comparison articles, and industry guides that each model draws from.
Social channels remain a constant. Reddit and LinkedIn citation rates are stable across both models, making community presence a reliable investment regardless of model updates.
How to Measure This for Your Brand
Tracking how ChatGPT's internal model routing affects your brand does not require building infrastructure from scratch. The Sellm API automates the entire process. Submit your prompts, configure your replicate count, and get back structured data including which model handled each response, at less than $0.01 per prompt.
Run 50 prompts with 10 replicates each for under $5 and you will have statistically robust visibility data across both models, with the model identifier included in every response so you can segment and compare.