How the Bandit Works

JustAI uses a multi-armed bandit algorithm to decide which variant to serve each user. If you’ve ever wondered why the platform doesn’t just run a standard A/B test and declare a winner, this page explains why.

Imagine you’re in a casino with a row of slot machines. Each machine has a different (unknown) payout rate. Your goal: maximize total winnings.

A naive approach: pull each machine the same number of times, then commit to the best one. That’s a traditional A/B test.

A smarter approach: start by trying each machine a few times, then gradually shift your pulls toward the ones that seem to pay better — while still occasionally trying the others in case you got unlucky early on. That’s a multi-armed bandit.

JustAI does this with your content variants. Each variant is a “machine,” and the “payout” is your selected metric (click rate, conversion rate, etc.).

|                   | Traditional A/B Test                               | Multi-Armed Bandit                                     |
|-------------------|----------------------------------------------------|--------------------------------------------------------|
| **Traffic split** | Fixed 50/50 (or equal) for the entire test         | Dynamic — shifts toward winners over time              |
| **Learning**      | Happens at the end                                 | Continuous                                             |
| **Wasted traffic**| High — losers get equal traffic until the test ends| Low — losers get deprioritized quickly                 |
| **When to use**   | When you need a clean "winner" declaration         | When you want to maximize performance during the test  |

The key advantage: less traffic is wasted on underperforming variants. Instead of sending 50% of your audience a losing variant for weeks, the bandit reduces that allocation as soon as it detects underperformance.

The bandit constantly balances two goals:

  • Exploit — Send more traffic to the variant that’s currently winning.
  • Explore — Send some traffic to other variants to confirm the winner (or discover a new one).

This balance is controlled by an epsilon parameter. A higher epsilon means more exploration; a lower one means more exploitation.
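The explore/exploit trade-off can be sketched with a simple epsilon-greedy rule. This is an illustrative stand-in, not JustAI's actual implementation; the variant names and counts are made up:

```python
import random

def choose_variant(stats, epsilon=0.1):
    """Epsilon-greedy: with probability epsilon, explore a random
    variant; otherwise exploit the best-known one."""
    if random.random() < epsilon:
        return random.choice(list(stats))  # explore
    # exploit: pick the variant with the highest observed conversion rate
    # (variants with no views yet count as 0.0)
    return max(
        stats,
        key=lambda v: stats[v]["conversions"] / stats[v]["views"]
        if stats[v]["views"]
        else 0.0,
    )

stats = {
    "A": {"views": 1000, "conversions": 50},  # 5.0% observed
    "B": {"views": 1000, "conversions": 80},  # 8.0% observed
}
print(choose_variant(stats, epsilon=0.1))  # usually "B", sometimes "A"
```

With `epsilon=0.1`, roughly 90% of requests go to the current leader and 10% keep checking the alternatives, which is exactly the balance described above.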

When an experiment launches (or new variants are added), the bandit doesn’t have enough data to make informed decisions. During this cold start phase:

  1. Traffic is distributed roughly equally across all variants.
  2. As conversion data comes in, the bandit starts forming preferences.
  3. Allocation gradually shifts toward better-performing variants.

There’s no fixed “exploration phase” with a hard cutoff. The transition from equal distribution to optimized allocation is gradual and data-driven.

Traditional A/B tests require large sample sizes per variant because they need to measure each variant precisely. The bandit doesn’t need that — it only needs to be confident enough to shift traffic directionally.

This means:

  • You see performance improvements during the experiment, not just after.
  • You can test more variants simultaneously without proportionally increasing the required traffic.
  • Underperformers are naturally deprioritized, reducing their negative impact.

Even though the bandit dynamically allocates traffic among variants, the control/treatment split remains a true randomized A/B test. A fixed percentage of traffic always goes to the control (your original copy), and the rest goes to the treatment group (where the bandit operates).
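One way to picture the split is a deterministic hash-based bucketing step that runs before the bandit. The 10% control share here is an assumption for illustration; the real percentage and internals may differ:

```python
import hashlib

CONTROL_SHARE = 0.10  # assumed fixed control percentage (illustrative)

def assign(user_id: str) -> str:
    """Deterministically bucket a user: a fixed slice always sees the
    original copy; everyone else enters the bandit-managed group."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    if bucket < CONTROL_SHARE * 100:
        return "control"    # always the original copy
    return "treatment"      # the bandit picks among variants here

print(assign("user-123"))
```

Because assignment is a pure function of the user ID, each user consistently sees either the control or the treatment experience, which is what makes the control/treatment comparison a valid randomized test.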

This design gives you the best of both worlds:

  • Statistical rigor from the control/treatment comparison (you can always answer “is JustAI beating the original?”)
  • Optimization within the treatment group (the bandit maximizes performance among variants)