Why this matters
This is an AI-native dashboard vs. BI evaluation framework — eight questions for separating products that were designed AI-native from products that bolted AI onto an existing BI tool. In the last 18 months, every CPG analytics vendor that wasn't already on the AI story has shipped an AI feature. Trade-promotion-management tools, syndicated-data portals, BI dashboards built five years ago — they all have a natural-language chat box, a summary sidebar, or a suggested-question prompt. In a 30-minute demo, they all look roughly the same.
They are not the same. The difference between AI-native (the analytical product was designed assuming AI from the start) and AI bolted onto BI (an existing BI or TPM product had an LLM added as a feature) shows up after the demo, in the daily work — and by then the contract is signed.
This page is the evaluation framework. Eight questions to put to any vendor pitching AI-for-CPG analytics, with what a good answer sounds like, what a bad answer sounds like, and why the question separates the two architectures. The questions are designed to be asked in a real working session with a real data set, not in a discovery call.
AI-native dashboard vs. BI evaluation: the two architectures
AI-bolted-on-BI: the underlying product is a dashboarding or data-portal product that assumes the analyst picks a report, picks filters, and reads a chart. The AI layer is a chat or summary feature added on top of that flow. The data model, the report catalog, and the analyst workflow were designed before AI was possible. The AI feature works with the existing flow, not instead of it.
AI-native: the underlying product was designed assuming an AI layer would own the analysis-selection step from day one. The data model is shaped for the AI to reason over it; the analyst's primary interaction is reviewing the system's reasoning, not driving the filter sidebar. There is no separate "AI feature" — the AI is the product. For the longer definition of what owning the analysis-selection step means, see What is agentic AI for CPG analysts?.
In a demo, both architectures can answer "what's our share at Sprouts." In production, they diverge on the questions where the methodology matters — which is most of them.
The eight questions at a glance
| # | Question | What separates the two architectures |
|---|---|---|
| 1 | Show me the system picking which analyses to run | Owns analysis selection vs. filter-autocomplete |
| 2 | What happens when two analyses disagree? | Surfaces reconciliation vs. lets analyst find it |
| 3 | Can I cite an answer in a buyer deck? | Permalinked methodology-pinned URL vs. screenshot |
| 4 | Can the system reproduce a result month-over-month? | Pins methodology versions vs. silent drift |
| 5 | How does it handle SPINS attribute hierarchy? | CPG-native model vs. flat-column generic |
| 6 | What does it do for out-of-scope questions? | Says so explicitly vs. silently hallucinates |
| 7 | Can the analyst correct the system's reasoning? | Updates the analysis vs. adds an inert note |
| 8 | Where does the vendor live in the four-layer stack? | Honest about layers 2–3 vs. claims end-to-end |
The eight questions in depth
1. "Show me the system picking which analyses to run on a question I bring."
The question you bring should be a decision question the vendor hasn't seen — "are we losing share in adaptogenic refrigerated at Sprouts and should I move it to a Whole Foods deck?"
Good answer: the system runs three to five analyses without being told which ones (velocity by SKU, share-of-segment, ACV trend, competitor SKU launches, Whole Foods Circana cross-check). The vendor talks through what the system picked and why.
Bad answer: the vendor types a series of filter queries and narrates each one as "you can ask it to do X." That's a chat interface on top of a fixed report catalog — Level 1 or 2 on the spectrum in What is agentic AI for CPG analysts?. Useful, but not what the AI-native pitch is selling.
2. "What does the system do when two of its analyses disagree?"
The case that separates the two architectures most cleanly: the ACV trend looks like distribution is up; the velocity trend looks like the move is driven by store reclassification, not real distribution gains.
Good answer: the system surfaces the disagreement up-front — "the +3.2pt ACV move at Sprouts is likely a store-cluster reclassification effective March 14; the comparable real-distribution change is +0.4pts." The vendor shows where in the UI this surfaces and how the analyst can override.
Bad answer: the system runs the analyses but presents them as two separate dashboards. "You can see the ACV chart here, and if you click into the store-cluster view here, you'll see the reclassification." That puts the reconciliation back on the analyst — which is the work the AI-native pitch is supposed to remove.
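To make "surfaces the disagreement" concrete: the headline ACV change has to be decomposed into the part explained by the reclassification and the comparable real change, and the system should do that decomposition before the number reaches the analyst. Here is a minimal sketch of that arithmetic in Python, with made-up prior and current values chosen to match the +3.2pt and +0.4pt figures above; none of this is a specific vendor's implementation.

```python
from dataclasses import dataclass

@dataclass
class AcvReading:
    """One period's ACV for a retailer, plus the store-cluster
    definition it was computed against (hypothetical fields)."""
    acv_pct: float
    cluster_version: str

def reconcile_acv_move(prior: AcvReading, current: AcvReading,
                       restated_prior_acv: float) -> dict:
    """Split a headline ACV change into the part explained by a
    store-cluster reclassification and the comparable real change.

    restated_prior_acv is the prior period recomputed under the
    current cluster definition, so the two periods compare like-for-like.
    """
    headline_change = current.acv_pct - prior.acv_pct
    reclass_effect = restated_prior_acv - prior.acv_pct
    real_change = current.acv_pct - restated_prior_acv
    return {
        "headline_change_pts": round(headline_change, 1),
        "reclassification_pts": round(reclass_effect, 1),
        "comparable_real_change_pts": round(real_change, 1),
        # Surface the disagreement up-front when reclassification dominates.
        "needs_reconciliation_callout": abs(reclass_effect) > abs(real_change),
    }

# Made-up values consistent with the example: +3.2pt headline, +0.4pt real.
prior = AcvReading(acv_pct=54.0, cluster_version="2024-Q4")
current = AcvReading(acv_pct=57.2, cluster_version="2025-Q1")
print(reconcile_acv_move(prior, current, restated_prior_acv=56.8))
```

The point isn't the arithmetic; it's that the decomposition and the callout happen inside the system, rather than across two dashboards the analyst has to reconcile by hand.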
3. "Can I cite the system's answer in a buyer-facing deck?"
A buyer at Sprouts pushes back: "that's not what I see on our side." The analyst needs to defend the number.
Good answer: every analytical claim the system makes has a permalinked URL that loads the same view, with the same filters, the same source data version, and the same methodology choices. The analyst can paste that URL into a buyer email or a deck footer.
Bad answer: "You can screenshot the chart and the dashboard remembers the state." That's not a citation; it's a screenshot. For why this matters in practice, see Why "ask your data" is the wrong frame for AI in CPG analytics.
4. "Can the system reproduce a result month over month when the underlying data refreshes?"
SPINS refreshes its attribute hierarchy quarterly. Store-cluster definitions shift. Retailer reclassifications happen mid-period. A result that was true on May 1 may not be reproducible on June 1 if the system doesn't keep the methodology version pinned to it.
Good answer: the system pins the methodology version to the result. "This share-of-segment number was computed on attribute hierarchy v2.3; the current production version is v2.4, and the same query against v2.4 returns this slightly-different number. Both are queryable; the difference is auditable."
Bad answer: "The data refreshes every week." That answers a different question (latency) and dodges the reproducibility question. AI-bolted-on-BI tools often can't pin methodology versions because the underlying data model wasn't designed to preserve them.
5. "How does the system handle SPINS attribute hierarchy depth?"
SPINS attributes are several levels deep — top-level category, subcategory, segment, attribute cluster (organic / non-GMO / adaptogenic / etc.). A question like "show me share in adaptogenic refrigerated" requires the system to know which level of the hierarchy "adaptogenic" lives at.
Good answer: the system uses the SPINS attribute hierarchy natively. The filter sidebar reflects the actual hierarchy levels. The chat input understands attribute terms without the analyst having to map them to category codes.
Bad answer: the system treats SPINS data as a generic transaction table with category as a single flat column. "You can filter on the 'adaptogenic' tag here." That works for a demo but breaks down when an analyst asks a cross-attribute question — "share among organic adaptogenic refrigerated" — that the flat-column model can't represent. This is a tell that the underlying product wasn't built CPG-native.
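The flat-column failure is easiest to see in miniature. If "adaptogenic" is just a value in a single category column, a cross-attribute ask has nothing to intersect; if each row carries its actual hierarchy levels and attribute clusters, the intersection is a straightforward filter. A toy illustration follows (the hierarchy levels mirror the description above; the rows and dollar values are invented):

```python
# Flat-column model: one category tag per row. A cross-attribute question
# ("organic AND adaptogenic AND refrigerated") has nothing to intersect,
# because each row carries only a single flat tag.
flat_rows = [
    {"upc": "0001", "category": "adaptogenic", "dollars": 1200.0},
    {"upc": "0002", "category": "organic", "dollars": 800.0},
]

# Hierarchy-aware model: each row carries its actual levels (category >
# subcategory > segment) plus the attribute clusters it belongs to.
# Rows and dollar values are invented for illustration.
hier_rows = [
    {"upc": "0001", "category": "Refrigerated", "subcategory": "Functional Beverages",
     "segment": "Adaptogenic Drinks", "attributes": {"adaptogenic", "organic"},
     "dollars": 1200.0},
    {"upc": "0002", "category": "Refrigerated", "subcategory": "Functional Beverages",
     "segment": "Adaptogenic Drinks", "attributes": {"adaptogenic"},
     "dollars": 800.0},
]

def cross_attribute_dollars(rows, category, required_attributes):
    """Sum dollars for rows in a category carrying ALL requested attributes."""
    return sum(
        r["dollars"] for r in rows
        if r["category"] == category and required_attributes <= r["attributes"]
    )

# "organic adaptogenic refrigerated" is answerable against the hierarchy-aware
# model; the flat model above cannot even express the question.
print(cross_attribute_dollars(hier_rows, "Refrigerated", {"organic", "adaptogenic"}))
```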
6. "What does the system do when I ask something outside its data scope?"
Ask the system: "how is our DTC business performing this month?" The system doesn't have DTC data (it has SPINS, which is brick-and-mortar scanner data).
Good answer: the system says so explicitly. "DTC data isn't in the data you've loaded. I can show you total brick-and-mortar SPINS-tracked revenue for the period, which is $2.4M. If you have DTC data you'd like to add, here's how to load it."
Bad answer: the system answers anyway, hallucinating a DTC number or quietly returning a SPINS number labeled "total business." This is the most dangerous failure mode in CPG AI tools — silent out-of-scope answers that the analyst doesn't know to question. AI-native systems generally know their scope; AI-bolted-on-BI systems inherit the LLM's tendency to answer anyway.
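Mechanically, the difference is whether anything checks the question against the loaded data's scope before an answer gets generated. A minimal sketch of that guard, using hypothetical scope metadata rather than any real product's internals:

```python
# Hypothetical metadata describing what the analyst has actually loaded.
LOADED_SOURCES = {
    "spins": {
        "channel": "brick-and-mortar scanner",
        "covers": {"retail_sales", "share", "velocity", "acv", "distribution"},
    },
}

def answer_or_decline(question_topic: str) -> str:
    """Refuse out-of-scope questions explicitly instead of answering anyway."""
    in_scope = any(question_topic in src["covers"] for src in LOADED_SOURCES.values())
    if not in_scope:
        return (
            f"'{question_topic}' isn't covered by the loaded data "
            f"({', '.join(LOADED_SOURCES)}). I can show brick-and-mortar "
            "SPINS-tracked figures for the period, or you can load the missing source."
        )
    return "in scope: run the analysis (stub for the real path)"

print(answer_or_decline("dtc_sales"))   # declines explicitly and names the gap
print(answer_or_decline("velocity"))    # proceeds
```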
7. "Can the analyst correct the system's reasoning in-place?"
In the worked example from What is agentic AI for CPG analysts?, the analyst disagrees with the system's framing — "the Andronicos dip isn't a promo overlap, it's a Q1 reset issue."
Good answer: the analyst types or selects the correction; the system updates the downstream analysis to reflect it. The correction is captured (so the next time a similar situation arises, the system can use the analyst's prior framing).
Bad answer: "You can add a note." A note doesn't update the analysis; it lives next to it. AI-bolted-on-BI tools usually can add notes but can't propagate corrections, because the underlying report catalog is fixed.
8. "Where in the workflow does Scout/the vendor live — and where does it not?"
The honest answer to this question separates vendors who understand the four-layer CPG analyst stack (source, modeling, analysis, distribution) from vendors who think they're a four-in-one tool.
Good answer: "We own the modeling and analysis layers — layers 2 and 3 in that framing. We don't replace the syndicators (SPINS/Circana/Stratum), and we don't replace Google Slides. We make the parts between those two faster."
Bad answer: "We're an end-to-end solution." No CPG analytics tool is end-to-end — the source layer is the syndicators, and the distribution layer is whatever the buyer reads (decks, emails, broker spreadsheets). A vendor that claims end-to-end either misunderstands the workflow or is overselling.
How to actually run the evaluation
The eight questions above don't work in a discovery call. They work in a 60-minute working session where the vendor demos against either (a) a sanitized sample of the brand's own SPINS extract, or (b) the vendor's reference data set, with the brand bringing two specific decision questions and one specific cross-source reconciliation question.
Structure for the session:
- 15 min — the brand explains its category review process. What data sources, what analyses, what's slow about Tuesday. The vendor listens.
- 30 min — the brand walks the vendor through three questions against the data. One decision question. One methodology-edge-case question (the store-cluster reclassification, the SPINS attribute refresh). One out-of-scope question (the DTC question). The vendor answers in the product.
- 15 min — the brand asks 4–6 of the eight questions above that surfaced during the demo. The vendor answers candidly. Questions the vendor dodges are the questions to follow up on after the session.
Two things make this work:
- Bring questions the vendor hasn't seen. Any vendor demo can answer the vendor's own canned questions perfectly. Real evaluation happens on the brand's own data, with the brand's own edge cases. Most vendors that look strong in a generic demo weaken on the brand's data; the few that hold up are the ones worth a second meeting.
- Insist on the working session before the contract. Vendors who can't or won't run a working session against the brand's data — even a sanitized sample — are telling you something important about the product's depth.
Red flags
A few signals — independent of the eight questions — that the product is AI-bolted-on rather than AI-native:
- "Chat with your dashboard" is the headline feature on the marketing site. AI-native products tend to lead with the output (a defended monthly category read) rather than the input (a chat box).
- The product roadmap is "we're adding AI to..." rather than "the product is AI-first." Roadmap framing reveals the architectural inheritance.
- Sales decks compare against legacy BI feature-by-feature. The comparison frame itself betrays the architecture — AI-native products are usually framed against the analyst's day, not against Tableau's feature list.
- No live demo on your data. A vendor that will demo only their reference data, never the prospect's, almost always has fragility on real edge cases the reference data doesn't expose.
None of these are dealbreakers alone. Together, they're a pattern.
Doing this in Scout
Scout was built AI-native from day one — the analysis-selection step is the system's job, methodology versions are pinned to results, and the modeling layer (unification, reconciliation, persistence) handles SPINS + Stratum + Circana extracts together rather than forcing the analyst to stitch them. The eight questions in this framework are exactly the questions Scout's customer demos are structured to answer, on the customer's own data. If you're running this framework against multiple vendors, Scout will hold up on questions 1–7; question 8 is the honest answer — Scout owns layers 2 and 3, not 1 or 4.
Summary + further reading
- The marketing-page difference between AI-native and AI-bolted-on is small; the difference in daily analyst work is the load-bearing one, and it surfaces only on methodology edge cases.
- The eight questions in this framework are designed to expose that difference in a 60-minute working session, on the brand's own data.
- Red flags — chat-with-your-dashboard headlines, "we're adding AI to" roadmaps, no live demo on the prospect's data — are individually weak signals but collectively reveal the architecture.
Related: What is agentic AI for CPG analysts? · The AI-native CPG analyst stack