AI in Market Research: Predicting Purchase Intent with 90% Human Accuracy

If you work with customers, you already know the drill. You spend weeks fielding a survey, you pay a panel, then you wrestle with a stack of numbers that all look comfortingly average. By the time a decision lands on the roadmap, your competitor is already shipping. This is the moment AI stops being a side project and becomes an operator. AI in market research is no longer about cute slide decks. It is about simulating human judgment with enough fidelity that teams can test more ideas in less time, with less money, and with fewer blind spots. Done right, AI in market research turns the lights on before you take a step.

A new research method, Semantic Similarity Rating, shows what that looks like when you stop asking models for numbers and start asking for reasoning. The idea is simple. Let the model think in words, then turn that text into a score in a principled way. The results are not just faster. They are closer to what people actually say and do, and they come bundled with explanations you can act on.

1. The Billion-Dollar Problem: Why Traditional Market Research Is Slow And Broken

Traditional survey panels are costly, slow, and noisy. Companies spend billions every year on concept tests that often produce ratings distorted by satisficing, acquiescence, and positivity biases. You feel this when five-point Likert data clusters around the middle while the comments are vague. The paper behind this article opens with that reality. It describes how classic panels still guide product decisions, yet they frequently yield muddy signals despite heavy spend.

This is the first reason AI in market research has momentum. Speed is obvious, but the real win is signal quality. If a method can match the shape of human responses and retain the nuance of why people would or would not buy, you get leverage across the pipeline. You make fewer assumptions. You iterate faster. You decrease the need to field a full study for every small idea. That is the compounding effect teams are after when they start using AI in market research.

2. The AI Solution: An Introduction To Synthetic Market Research

[Image: Network of synthetic consumers linked to an LLM node, illustrating AI in market research for faster concept testing.]

Enter synthetic market research, a straightforward notion with serious impact. Instead of asking a human panel every time, you instruct a large language model to impersonate respondents with demographic attributes, show it the same product concept, and capture its response. In other words, you create synthetic consumers and put them through the same instrument.

There are multiple ways to elicit a response. You can force a direct Likert rating. You can collect a short text and use a second pass to map that text to a number. Or you can do something smarter: take that text and compute its proximity to a small set of anchor statements that represent the five Likert options, then translate the similarities into a probability distribution over 1 through 5. That last approach is Semantic Similarity Rating, or SSR for short. It treats language like data, and it plays nicely with how modern models think.

This is not hand-wavy. The research validated the method on 57 real product surveys in personal care, with 9,300 unique participants. That is a meaningful benchmark for anyone serious about AI in market research.

3. The Likert Scale Flaw: Why Early AI In Market Research Failed

Early attempts asked models to pick a number. That seems tidy. It is also where things go sideways. When you constrain a model to output 1 through 5, it tends to regress to the middle. The distributions become too narrow, the shape diverges from human data, and you overfit to “3.” The paper quantifies this failure. Direct Likert ratings hit roughly 80 percent of human test–retest correlation, yet their distributional similarity to real panels is poor, with mean KS similarity around 0.26 for GPT-4o and 0.39 for Gemini-2.0-flash.

This is the core reason many teams lost faith early on. They tried AI in market research by forcing a number, saw the averages bunch up, and moved on. The issue was not the model. It was the elicitation.

4. The Breakthrough: How Semantic Similarity Rating Accurately Predicts AI Purchase Intent

[Image: Diagram of semantic similarity rating mapping text to Likert bars, visualizing AI in market research accuracy gains.]

Semantic similarity rating fixes the prompt, not the model. The workflow is elegant.

4.1 How SSR Works

  1. Prompt a synthetic consumer for a free-text purchase intent statement in response to a concept.
  2. Embed that text using a sentence embedding model.
  3. Compare it to five short reference statements, one per Likert category, each written to express a distinct level of intent.
  4. Convert the cosine similarities into a probability mass function over 1 through 5, and take the expected value if you need a single number. A minimal code sketch of steps 2 through 4 follows this list.
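
To make the mapping concrete, here is a minimal Python sketch of steps 2 through 4, assuming you already have embedding vectors. The anchor wordings, the softmax normalization, and the helper names are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

# Illustrative anchor statements, one per Likert level (1-5).
# The paper's actual anchor wording differs; treat these as placeholders.
ANCHORS = [
    "I would definitely not buy this product.",   # 1
    "I would probably not buy this product.",     # 2
    "I might or might not buy this product.",     # 3
    "I would probably buy this product.",         # 4
    "I would definitely buy this product.",       # 5
]

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ssr_pmf(response_vec: np.ndarray, anchor_vecs: list,
            temperature: float = 0.05) -> np.ndarray:
    """Map cosine similarities to a probability mass function over 1..5.

    A softmax over similarities is one plausible normalization; the paper
    may use a different mapping.
    """
    sims = np.array([cosine_similarity(response_vec, v) for v in anchor_vecs])
    logits = sims / temperature
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def expected_likert(pmf: np.ndarray) -> float:
    """Collapse the pmf to a single expected Likert score in [1, 5]."""
    return float(np.dot(pmf, np.arange(1, 6)))
```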

The paper uses six sets of anchors to stabilize the mapping, then averages across the sets. It retrieves embeddings with OpenAI's text-embedding-3-small and tests two LLMs, GPT-4o and Gemini-2.0-flash, across a range of temperatures; results are reported at a temperature of 0.5.
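
Here is a short sketch of that averaging step, assuming the OpenAI embeddings API and the ssr_pmf helper from the sketch above. The idea of averaging over paraphrased anchor sets comes from the paper, but the batching and function names are illustrative.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list) -> list:
    """Embed a batch of texts with text-embedding-3-small."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [np.array(d.embedding) for d in resp.data]

def ssr_score(response_text: str, anchor_sets: list) -> np.ndarray:
    """Average the Likert pmf across anchor sets to stabilize the mapping."""
    (response_vec,) = embed([response_text])
    pmfs = []
    for anchors in anchor_sets:  # e.g. six paraphrased sets, as in the paper
        pmfs.append(ssr_pmf(response_vec, embed(anchors)))  # ssr_pmf from above
    return np.mean(pmfs, axis=0)
```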

4.2 What Improves And By How Much

On the same dataset of 57 surveys and 9,300 human respondents, SSR reaches about 90 percent of human test–retest reliability while maintaining human-like distributions. For GPT-4o, the headline number is plotted at roughly 90 percent correlation attainment in the main comparison figure, and when you drill into the metrics, SSR's distributional similarity rises to mean KS similarity around 0.88 for GPT-4o and 0.80 for Gemini-2.0-flash, a sizable jump over direct ratings.

As a baseline, the authors also trained LightGBM on demographics and product features. Even with in-sample training, it reaches only about 65 percent correlation attainment, far below zero-shot SSR.

This is not a lab toy. It is a method you can put into production in an AI in market research workflow without retraining a model. It uses text, the model's native currency, and maps it to the same Likert metrics your stakeholders already understand.

4.3 Summary Metrics, Side By Side

AI in Market Research: Method Comparison on Reliability and KS Similarity

Method | Model | Correlation Attainment (ρ) | Mean KS Similarity | Notes
Direct Likert Rating | GPT-4o | ~80% | ~0.26 | Narrow, middle-heavy distributions.
Direct Likert Rating | Gemini-2.0-flash | ~80% | ~0.39 | Slightly better shape than GPT-4o, still off human.
Follow-Up Likert (Text Then Number) | GPT-4o | ~83% | ~0.72 | Better reliability and shape than direct numbers.
Follow-Up Likert (Text Then Number) | Gemini-2.0-flash | ~83% | ~0.59 | Gains on reliability, distributions still compressed.
Semantic Similarity Rating | GPT-4o | ~90% | ~0.88 | Best overall fidelity to human data.
Semantic Similarity Rating | Gemini-2.0-flash | ~90% | ~0.80 | Strong match to real distributions.

5. Beyond Numbers: Using AI For Deeper Consumer Insights

The real value of AI in market research is not only that SSR predicts AI purchase intent with human-like reliability. It also generates rich rationales that read like the comments you wish your panel had written. The study compares human open-text to synthetic open-text and finds the latter provides deeper reasons for and against purchase, with fewer platitudes and fewer “It’s good” throwaways. Examples include specific concerns about price, effectiveness, and side effects, or brand trust and use cases.

That matters for AI for consumer insights. You do not only get a number on a 1 to 5 scale. You get structured, editable language that explains the number. Analysts can cluster those rationales, link them to concept attributes, and spot failure modes early. Ask yourself how many times a project would have avoided a dead end if you had simply seen more honest reasons to say no. Now imagine that clarity on every iteration because AI in market research can run that analysis on day one.

6. The Practical Application: How Businesses Can Use AI Product Testing Today

[Image: Over-shoulder laptop view of iterative testing workflow, highlighting AI in market research for fast, reliable product decisions.]

You can adopt this method without changing your entire stack. Here is a pragmatic, engineer-friendly workflow that gets AI product testing running in a day.

6.1 A Simple SSR Pipeline You Can Ship

AI in Market Research: SSR Quick-Start Workflow for Product Testing

Step | What You Do | Why It Matters | Implementation Tips
1 | Create five reference anchors, one per Likert option | Defines your intent gradient | Write short, domain-agnostic anchors. Build 6 alternative sets to average for stability.
2 | Prompt the LLM as a synthetic consumer with demographics and the concept stimulus | Elicits realistic free-text opinions | Use image or text concept slides as you would in a real study (a prompting sketch follows this table).
3 | Embed the response and each anchor | Converts language into vectors | The paper uses text-embedding-3-small, which worked well.
4 | Compute cosine similarities and map to a Likert pmf | Produces a distribution, not a guess | Average over the 6 anchor sets to reduce variance.
5 | Report the expected value, plus the full pmf and rationale | Preserves interpretability | Your stakeholders keep their Likert metrics and gain explanations.
6 | Compare distributions to real benchmarks when available | Validates your pipeline | KS similarity is a good first check of shape match.
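
For step 2, here is a minimal elicitation sketch using the OpenAI chat completions API. The persona fields and the prompt wording are illustrative assumptions, not the paper's exact prompt.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def elicit_opinion(persona: dict, concept: str, model: str = "gpt-4o") -> str:
    """Ask a synthetic consumer for a free-text purchase intent statement."""
    system = (
        f"You are a {persona['age']}-year-old {persona['occupation']} with a "
        f"household income of {persona['income']}. Answer as this consumer, "
        "in one or two sentences, in your own voice."
    )
    user = (
        "Here is a new product concept:\n"
        f"{concept}\n\n"
        "How likely would you be to buy this product, and why?"
    )
    resp = client.chat.completions.create(
        model=model,
        temperature=0.5,  # the paper reports results at this temperature
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content
```

Pipe the returned rationale into the embedding and pmf steps sketched in section 4.1, and store both the text and the distribution.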

With this in place, AI product testing shifts from calendar time to compute time. Concept tweaks that used to wait for a fielding window can be screened by synthetic panels first, then escalated to humans only when the signal warrants it. That is the kind of loop you want if you believe AI in market research should accelerate learning, not just cut costs.

6.2 Guardrails And Domain Fit

Every method has edges. SSR’s usefulness depends on two things. First, that the model has seen enough domain language to reason sensibly. Oral care worked in the study because the models were exposed to plenty of public chatter. Niche categories with thin public data will be harder. Second, persona conditioning helps, yet it is not a perfect proxy for every subpopulation, so treat subgroup slices with care.

The team also notes that embedding choice and similarity metric matter. Cosine worked well, but domain encoders might do better in specialized markets. That is a sensible upgrade path once your first pass is paying for itself.

7. What Changes For Teams: From Dashboard To Daily Practice

Let’s translate this into a week on a product team using AI in market research.

  • Monday morning: Marketing drops a new body-wash concept slide with a price point and a short claim. You run synthetic panels conditioned on age, income, and a handful of values-based personas. You get distributions and rationales within the hour.
  • Monday afternoon: You cut a variant that fixes the most common objections. You re-run SSR across the same personas. The mean intent moves from 2.9 to 3.3, and the distribution widens in a human-like way. That shape matters to the brand team because it mirrors real heterogeneity.
  • Tuesday: You run AI product testing on copy alternatives for the two best claims and pull AI for consumer insights to summarize why each subgroup prefers each line.
  • Wednesday: You screen three packaging options. The synthetic comments reveal a consistent “looks clinical” complaint among value-focused personas. You adjust the design before committing.
  • Thursday: You reserve a small human panel for the best concept, using the synthetic distributions as a prior.
  • Friday: You recap findings with confidence because the Likert numbers track human-like ranges instead of collapsing to the middle.

This is the operational promise of AI in market research. Faster concept cycles. Fewer debates built on guesswork. More explanation alongside every score.

8. Why This Works: Language First, Numbers Second

The turning point is psychological as much as technical. People do not think in numbers. They think in reasons, then translate those reasons into a rating when forced. SSR mirrors that mental path. It elicits the reason, then computes the rating from it. That is why it outperforms not only direct numeric ratings, but also a tabular model trained on demographics and product features, which cannot capture the nuance of language. The LightGBM comparison shows exactly that: it underperforms SSR on reliability despite being trained directly on the same domain.

The bonus is that those reasons are assets. They expose real objections and desires you can design against, and they travel beyond the research team. Product, design, and growth can read them without a stats translator. That is where AI in market research earns trust.

9. Common Questions From Stakeholders, Answered Straight

  • Does this replace human research? No. Think of SSR as a screening tool and an explainer. It reduces the number of full human studies you must commission and makes each one more focused. The paper is explicit on augmentation, not replacement.
  • Will this work outside personal care? It depends on domain language. If consumers talk about it online, the model probably has the background to simulate responses sensibly. If the web is silent, validate heavily.
  • Is persona conditioning reliable for subgroups? Use it, but do not over-index on tiny slices. Treat subgroup gaps as hypotheses to test with humans.
  • What about other metrics like relevance or trust? The authors demonstrate an example extension to concept relevance and report strong alignment, which hints at broader applicability.

This is the kind of clarity executives need before greenlighting AI in market research at scale.

10. Implementation Notes For Practitioners

You do not need a PhD to put this into production. You need a clean prompt, well-written anchor sets, and a simple vector math block. The paper even publishes a reference implementation and anchor design guidance.

A few practical guidelines for AI in market research teams:

  • Keep anchors short, generic, and domain-neutral. You are defining intensity, not copywriting a concept.
  • Average across multiple anchor sets. This reduces sensitivity to phrasing quirks.
  • Always store the full pmf and the raw rationale. The single score is for dashboards. The distribution and text are for decisions.
  • Track KS similarity against historical human studies when you have them. It is an easy sanity check that your synthetic panel is not drifting; a minimal version is sketched below.
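
A minimal version of that check, assuming scipy and defining KS similarity as one minus the two-sample KS statistic; the paper's exact metric definition may differ, so treat this as a shape sanity check rather than a reproduction of its measure.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_similarity(synthetic_scores, human_scores) -> float:
    """One plausible convention: 1 minus the two-sample KS statistic.

    Higher is better; 1.0 means the empirical distributions coincide.
    """
    result = ks_2samp(synthetic_scores, human_scores)
    return 1.0 - result.statistic

# Example: expected-Likert scores from a synthetic panel vs. a past human study.
synthetic = np.array([3.1, 3.4, 2.8, 4.0, 3.3, 3.6])
human = np.array([3.0, 3.5, 2.5, 4.2, 3.1, 3.8])
print(f"KS similarity: {ks_similarity(synthetic, human):.2f}")
```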

These are small habits with big payoffs when you scale AI in market research across teams.

11. The Bottom Line: Augmentation, Not Replacement

The cleanest way to summarize the work is this. Ask models for explanations first. Turn those explanations into numbers with a stable mapping. Compare the resulting distributions to real ones and keep the words for context. Do that, and AI in market research shifts from novelty to infrastructure.

The paper’s conclusion is measured. Synthetic panels can’t capture every contingency of a real purchase, like budgets, culture, or ad exposure. Yet the mix of interpretability, reliability near human test–retest, and rich qualitative context is enough to change early product decisions. Screen synthetically. Spend humans wisely. Move faster with more insight. That is why AI in market research is best viewed as augmentation, not replacement.

Call to action: Spin up a pilot this week. Pick a live concept, run SSR across three personas, then compare the distributions and rationales to your last human panel. If the fit looks good, fold it into your standard playbook. Treat this as your first step toward operationalizing AI in market research.

If the fit is off, use the gap to tune prompts, anchors, or personas. Keep iterations short and focused. Either way, you will learn faster than the team that waits, and you will harden your AI in market research engine with real feedback.

Primary source: “LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings.” Key results, dataset details, method, and limitations are drawn from the paper’s abstract, methods, figures, and appendix.

AI in market research
The use of language models and related techniques to simulate consumer feedback, predict purchase intent, and speed up research cycles.
Synthetic market research
Research that uses AI-generated respondents to test concepts and messaging before or alongside human panels.
Synthetic consumers
AI respondents prompted to mirror specific demographics or mindsets, used for rapid concept screening.
Semantic similarity rating (SSR)
A method that converts free-text opinions into Likert scores by comparing embeddings to anchor statements.
AI purchase intent
A model’s estimate of how likely a consumer is to buy, expressed on a familiar 1–5 or 0–100 scale.
Embeddings
Numeric vector representations of text that place similar meanings near each other in space.
Cosine similarity
A measure of how close two vectors are in direction, used to compare response text to anchors.
Likert scale
A standard 1–5 agreement or intent scale used in surveys.
Anchor statements
Short reference sentences that define each Likert level for SSR mapping.
Test–retest reliability
The consistency of results when the same instrument is used again under similar conditions.
Distributional similarity
How closely a set of model outputs matches the shape of human survey results.
Persona conditioning
Prompting a model with demographic or psychographic traits to shape responses.
Prompt design
The structure and wording that elicit useful, stable answers from a model.
Probability mass function
A discrete distribution over ratings, often produced by SSR before summarizing to a single score.
AI for consumer insights
The use of AI to extract reasons, themes, and drivers from open-ended feedback at speed.

1) How is AI used in market research?

AI in market research simulates realistic feedback fast. Teams create synthetic consumers, show them product concepts, capture free-text opinions, then map that language to a 1–5 score with semantic similarity rating. The result is human-like purchase intent with rationales you can act on.

2) What is the best AI for market research?

For AI in market research, there isn’t a single best tool. The best approach is a method. Use semantic similarity rating with strong LLMs, such as GPT-class or Gemini-class models, to turn free-text opinions into reliable purchase intent and deeper AI for consumer insights. Method first, vendor second.

3) Will AI take over market research?

No. AI in market research augments humans. It automates early screening, compresses timelines, and surfaces patterns in language. Researchers still set hypotheses, design instruments, interpret edge cases, and make strategic calls. Think human judgment supported by fast, credible simulations.

4) What is a synthetic consumer or synthetic respondent?

In AI in market research, a synthetic consumer is an AI respondent conditioned to represent a specific profile, for example a 35-year-old parent in Texas with value-focused shopping habits. You show a concept, elicit natural language feedback, and evaluate purchase intent. It is a controlled, repeatable stand-in, not a replacement for people.

5) What is Semantic Similarity Rating (SSR)?

Within AI in market research, semantic similarity rating asks for opinions in words, not numbers, then uses embeddings to compare that text to five anchor statements that represent the Likert scale. The similarity scores map to a 1–5 rating, producing realistic distributions and clear explanations of why.