AI/ML Infrastructure

The Intuition Database

Adaptive decision memory for AI agents and applications

Rust · ~11MB · ~10K pred/s · 4 algorithms · WAL · RBAC
$ docker run -d -p 8080:8080 simeonlukov/banditdb:latest
$ pip install banditdb-python

Every agent in your fleet makes the same mistake. Twice.

Four moving parts. Four failure modes. Weeks to build. Months to maintain.

✗ The old way
📨 Kafka: event streaming
⚡ Redis: state management
🐍 Python worker: matrix math
🐘 Postgres: interaction logs
✓ The BanditDB way
BanditDB: everything, in one binary
  • ✓ In-memory matrix updates (microseconds)
  • ✓ Built-in exploration algorithms, tunable per campaign
  • ✓ Write-Ahead Log for crash recovery
  • ✓ TTL cache for delayed rewards
  • ✓ Propensity-logged Parquet export
  • ✓ Entropy alerting: detects silent exploration collapse
  • ✓ Multi-tenancy, Python & TypeScript SDKs
  • ✓ Native MCP tools for AI agents
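To make the entropy alerting bullet concrete, here is a minimal sketch of how a silent exploration collapse could be detected from recent arm choices. BanditDB's actual metric and threshold are not described on this page, so the normalised Shannon entropy, the function names, and the 0.2 threshold are all assumptions for illustration.

```python
import math
from collections import Counter


def normalized_entropy(arm_choices, n_arms):
    """Shannon entropy of the observed arm distribution, scaled to [0, 1]."""
    if n_arms < 2 or not arm_choices:
        return 0.0
    counts = Counter(arm_choices)
    total = len(arm_choices)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Dividing by log(n_arms) maps a uniform distribution to 1.0.
    return entropy / math.log(n_arms)


def exploration_collapsed(arm_choices, n_arms, threshold=0.2):
    """Flag a campaign whose recent choices are concentrated on one arm.

    The threshold is a hypothetical default, not a BanditDB setting.
    """
    return normalized_entropy(arm_choices, n_arms) < threshold


healthy = ["decrease_temperature", "decrease_light", "decrease_noise"] * 10
collapsed = ["decrease_temperature"] * 99 + ["decrease_light"]
```

With these inputs, `healthy` scores close to 1.0 (all arms still being tried) while `collapsed` falls well under the threshold, which is the situation an alert would be meant to catch.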

How it works

Four steps: one to define, three to learn. Create, Predict, and Reward are API calls; Act runs in your own code.

BanditDB keeps weight matrices in memory. Every outcome you report updates those weights in microseconds, gradually building intuition about which choice wins for which context.

1

Create

run once at startup

Name the campaign, list the arms, and set the context dimension. BanditDB initialises the weight matrices and is immediately ready to serve predictions.

curl -X POST http://localhost:8080/campaign \
  -H "Content-Type: application/json" \
  -d '{
    "campaign_id": "sleep",
    "arms": [
      "decrease_temperature",
      "decrease_light",
      "decrease_noise"
    ],
    "feature_dim": 5,
    "metadata": {
      "owner": "wellness-team",
      "features": ["sex","age_norm","weight_norm","activity","bedtime_norm"]
    }
  }'
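For readers who prefer Python to curl, the same request can be sketched with the standard library alone. The endpoint and JSON fields are copied from the curl call above; the helper name `build_campaign_request` is hypothetical, and the `banditdb-python` SDK is deliberately not used because its API is not shown on this page. Deriving `feature_dim` from the feature list is a design choice of this sketch so the two values cannot drift apart.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # port from the docker run example


def build_campaign_request(campaign_id, arms, feature_names, owner):
    """Build (but do not send) the POST /campaign request."""
    payload = {
        "campaign_id": campaign_id,
        "arms": list(arms),
        # Derived from the feature list so it always matches the context length.
        "feature_dim": len(feature_names),
        "metadata": {"owner": owner, "features": list(feature_names)},
    }
    return urllib.request.Request(
        f"{BASE_URL}/campaign",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_campaign_request(
    "sleep",
    ["decrease_temperature", "decrease_light", "decrease_noise"],
    ["sex", "age_norm", "weight_norm", "activity", "bedtime_norm"],
    owner="wellness-team",
)
# With a server running: urllib.request.urlopen(request)
```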
↻ loops with every participant
2

Predict

Pass the participant's context vector. Get back the recommended intervention and an interaction ID to track it.

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "campaign_id": "sleep",
    "context": [1.0, 0.35, 0.50, 0.60, 0.96]
  }'

# → {"arm_id": "decrease_temperature",
#    "interaction_id": "a1b2c3..."}
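The context is a bare list of numbers, so the order of values matters. The sketch below assumes the positions follow the `features` list stored in the campaign metadata; that mapping is an assumption, not something this page states, and `to_context` is a hypothetical helper.

```python
FEATURE_ORDER = ["sex", "age_norm", "weight_norm", "activity", "bedtime_norm"]


def to_context(participant, feature_order=FEATURE_ORDER):
    """Turn a dict of named features into an ordered context vector."""
    missing = [name for name in feature_order if name not in participant]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(participant[name]) for name in feature_order]


# Keys may arrive in any order; the vector always follows FEATURE_ORDER.
participant = {
    "age_norm": 0.35,
    "sex": 1.0,
    "bedtime_norm": 0.96,
    "activity": 0.60,
    "weight_norm": 0.50,
}
context = to_context(participant)
```

Building the vector from named fields keeps a reordered or missing feature from silently shifting every value by one position.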
3

Act

Apply the chosen intervention. BanditDB holds the context in its TTL cache, ready to receive the reward when the outcome is known.

# arm = "decrease_temperature"
apply_intervention(user_id, arm)

# lower bedroom temperature to 17°C
# BanditDB waits for the outcome...
4

Reward

Report the outcome the next morning. Matrices update in microseconds. Every subsequent participant gets a smarter recommendation.

curl -X POST http://localhost:8080/reward \
  -H "Content-Type: application/json" \
  -d '{
    "interaction_id": "a1b2c3...",
    "reward": 0.27
  }'

# → "OK"
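The page credits the microsecond updates to the Sherman-Morrison rank-1 formula. A minimal pure-Python sketch of that idea follows, assuming a LinUCB-style arm that keeps an inverse matrix `a_inv` and a reward vector `b`; the state layout and function names are assumptions, not BanditDB internals.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def sherman_morrison(a_inv, x):
    """Inverse of (A + x x^T) from a symmetric A^{-1}, without re-inverting."""
    ax = mat_vec(a_inv, x)  # A^{-1} x
    denom = 1.0 + sum(x_i * ax_i for x_i, ax_i in zip(x, ax))  # 1 + x^T A^{-1} x
    n = len(x)
    return [[a_inv[i][j] - ax[i] * ax[j] / denom for j in range(n)]
            for i in range(n)]


def apply_reward(a_inv, b, x, reward):
    """Fold one (context, reward) observation into an arm's state."""
    new_a_inv = sherman_morrison(a_inv, x)
    new_b = [b_i + reward * x_i for b_i, x_i in zip(b, x)]
    theta = mat_vec(new_a_inv, new_b)  # updated weight estimate
    return new_a_inv, new_b, theta


identity = [[1.0, 0.0], [0.0, 1.0]]
a_inv, b, theta = apply_reward(identity, [0.0, 0.0], [1.0, 1.0], 0.27)
```

The update costs O(d²) per reward instead of the O(d³) of a fresh matrix inversion, which is why it stays fast for small context dimensions.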
For AI Agents

Your agent swarm builds shared intuition

Standard LLM agents are stateless: if they make a bad decision, they repeat it tomorrow. BanditDB's MCP server gives your entire fleet shared, persistent decision memory.

banditdb-swarm
$ banditdb-swarm --agents 3 --campaign prompt_strategy

  ┌─────────────────────┐  get_intuition(context)    ┌──────────────────────┐
  │                     │ ─────────────────────────► │                      │
  │   agent swarm       │                            │   BanditDB           │
  │   N agents · one    │◄── "chain_of_thought" ─────│                      │
  │   shared model      │                            │   prompt_strategy    │
  └──────────┬──────────┘                            │   LinUCB · 4 arms    │
             │                                       └──────────────────────┘
             │  execute · score · reward: 0.84  ✓                ▲
             │                                                   │
             └────────────  record_outcome(iid, 0.84)  ──────────┘

  every reward updates the shared model for the entire swarm
register with claude
$ claude mcp add banditdb banditdb-mcp --env BANDITDB_URL=http://localhost:8080
  • create_campaign: Define a new decision
  • get_intuition: Ask which action to take
  • record_outcome: Report success or failure
  • campaign_diagnostics: Inspect learning state

Two commands and BanditDB is a native tool in Claude, Cursor, or any MCP-compatible host. Agents can get intuition, record outcomes, and inspect learning state, with no config file editing required.

Every decision made by any agent in the swarm improves the routing for all future agents. The network accumulates judgment that no single agent could build alone.

BanditDB is not a chat memory or a vector store. It is a decision layer that learns which choices work for which context, and gets better with every outcome.

Causal Analysis

Understand why the model learned what it learned

BanditDB logs propensity scores at prediction time. Export to Parquet, run a Causal Forest, and go beyond correlation to genuine treatment effect estimates, with confidence intervals and user-segment breakdowns.

scripts/causal_analysis.py --parquet exports/sleep.parquet
Loading exports/sleep.parquet ...
  1,847 interactions ยท 3 arms

Fitting causal forests (one per arm) ...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. AVERAGE TREATMENT EFFECT  (90% CI)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  decrease_temperature  +0.1842  [+0.098, +0.270]  ✓
  decrease_light        +0.0421  [-0.018, +0.103]  ~
  decrease_noise        -0.0297  [-0.104, +0.045]  ~

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
2. CAUSAL ARM ASSIGNMENT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  decrease_temperature  61.3%  ██████████████████████████████
  decrease_light        24.7%  ████████████
  decrease_noise        14.0%  ███████

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
3. FEATURE IMPORTANCE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  decrease_temperature:
    activity      0.341  █████████████
    age_norm      0.287  ████████████
    weight_norm   0.198  ████████
    sex           0.104  ████
    bedtime_norm  0.070  ███

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
4. WINNING SEGMENTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  decrease_temperature:
    activity (high, Δ=+0.31), age_norm (high, Δ=+0.19)

  decrease_light:
    sex (female, Δ=+0.24), bedtime_norm (late, Δ=+0.18)

  decrease_noise:
    age_norm (young, Δ=+0.22), activity (low, Δ=+0.14)
ATE ✓: causally significant

Temperature reduction has a statistically significant causal effect on sleep quality (+0.18, 90% CI entirely above zero). Light and noise show no reliable causal signal despite correlation in the raw data.
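The significance reading above comes down to one rule: an effect is flagged when its confidence interval excludes zero. A short sketch using the three intervals from the sample output (the dictionary layout and helper name are illustrative, not the script's internals):

```python
# (estimate, ci_low, ci_high) copied from the sample report, 90% CI.
ate = {
    "decrease_temperature": (0.1842, 0.098, 0.270),
    "decrease_light": (0.0421, -0.018, 0.103),
    "decrease_noise": (-0.0297, -0.104, 0.045),
}


def excludes_zero(ci_low, ci_high):
    """True when the whole interval sits on one side of zero."""
    return ci_low > 0 or ci_high < 0


significant = [arm for arm, (_, low, high) in ate.items()
               if excludes_zero(low, high)]
```

Only `decrease_temperature` passes, matching the ✓ in the report; the other two intervals straddle zero and are marked `~`.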

Arm assignment vs. live bandit

61% of users are causally best served by temperature. If the bandit's live distribution (Campaigns tab) matches this ratio, it has converged to the correct causal structure. A large mismatch means the model is still exploring.
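One way to quantify "a large mismatch" is the total variation distance between the causal assignment shares and the bandit's live arm shares. This is a sketch under assumptions: the live shares below are hypothetical numbers, and what counts as a large gap is left to the reader rather than being a BanditDB setting.

```python
def total_variation(p, q):
    """Half the summed absolute difference between two arm distributions."""
    arms = set(p) | set(q)
    return 0.5 * sum(abs(p.get(arm, 0.0) - q.get(arm, 0.0)) for arm in arms)


# Shares from section 2 of the sample report.
causal_share = {
    "decrease_temperature": 0.613,
    "decrease_light": 0.247,
    "decrease_noise": 0.140,
}
# Hypothetical live distribution, as it might be read off the Campaigns tab.
live_share = {
    "decrease_temperature": 0.55,
    "decrease_light": 0.30,
    "decrease_noise": 0.15,
}

gap = total_variation(causal_share, live_share)
```

A gap of 0 means the two distributions agree exactly and 1 means they do not overlap at all; here the gap is about 0.063, meaning roughly 6% of traffic would need to move between arms to match the causal assignment.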

Winning segments → A/B tests

High-activity, older participants respond best to temperature reduction. Women with late bedtimes respond better to light reduction. Use these segments to design targeted trials or audit whether the bandit routes each profile correctly.

pip install econml scikit-learn polars

Requires LinUCB campaigns: propensity scores are logged only for LinUCB. Thompson Sampling does not log propensities.
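To see why logged propensities matter, consider inverse propensity scoring (IPS), a simpler technique than the Causal Forest used by the script. It estimates how a different policy would have performed using only logged data. The tuple layout below is an assumption for illustration; the actual Parquet column names are not given on this page.

```python
def ips_value(logs, policy):
    """Estimate a policy's mean reward from propensity-logged interactions.

    logs: iterable of (context, logged_arm, propensity, reward) tuples,
    where propensity is the probability the logging policy gave logged_arm.
    """
    total = 0.0
    n = 0
    for context, logged_arm, propensity, reward in logs:
        n += 1
        # Only interactions where the candidate policy agrees with the log
        # contribute, reweighted by how unlikely that choice was.
        if policy(context) == logged_arm:
            total += reward / propensity
    return total / n if n else 0.0


# Toy log: two arms, each chosen with probability 0.5.
logs = [
    ([1.0], "a", 0.5, 1.0),
    ([1.0], "a", 0.5, 0.0),
    ([0.0], "b", 0.5, 0.2),
    ([0.0], "b", 0.5, 0.0),
]

value_always_a = ips_value(logs, lambda context: "a")
value_always_b = ips_value(logs, lambda context: "b")
```

Without the propensity in each row, the reweighting step is impossible, which is why the analysis is limited to campaigns that log it.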

Walkthroughs

Learn by example

Six end-to-end examples, ordered from simplest to most advanced.

★ Start here
🌙

Sleep Improvement

Temperature, light, or noise: which adjustment works best for each person? A pure curl walkthrough, no SDK needed.

1
Measurable lift after ~300 rewarded outcomes, assuming ≥ 70% next-morning reporting compliance.
curl · 3 arms · 5 features
🛒

E-Commerce Upsell

Discount, free shipping, or nothing: learns which checkout offer closes each shopper without giving margin away.

2
Measurable lift after ~1,500 checkout interactions at 50% completion; binary reward is the noisiest signal among these examples.
python · 3 arms · 3 features
⚖️

Law Firm Client Intake

Consult, intake form, refer, or decline: learns which response maximises matter value for each enquiry profile, accounting for capacity and conflict risk.

3
Measurable lift after ~600 intake decisions at 50% outcome rate; reward is multi-valued, not binary.
python · 4 arms · 5 features
💰

Dynamic Pricing

Hold margin or liquidate? Learns from sell-through rate, holiday proximity, and competitor pricing. The context describes the market, not the user.

4
Measurable lift after ~500 hourly cycles on common states; rare holiday combinations require 2–3 full seasons.
python · 4 arms · 5 features
🤖

Prompt Optimisation

Learns which prompt strategy (zero-shot, chain-of-thought, few-shot, or structured) produces the best response for each task type. Your evals run in production, not in a spreadsheet.

5
Measurable lift after ~400 requests; LLM-as-judge gives near-100% reward observability.
python · 4 arms · 5 features
🏥

Adaptive Clinical Trials

Routes patients toward the most effective treatment arm in real time as evidence accumulates, with no waiting months for interim analysis.

6
Measurable lift after ~400 completed follow-ups; enroll ~500 patients to account for 80% compliance.
python · 3 arms · 4 features
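The enrolment figures in these cards follow from dividing the target number of rewarded outcomes by the expected compliance rate. A small sketch of that arithmetic (the helper name is hypothetical, and integer percentages are used to avoid floating-point rounding):

```python
def required_enrolment(target_outcomes, compliance_pct):
    """Participants needed so that target_outcomes actually report back."""
    if not 0 < compliance_pct <= 100:
        raise ValueError("compliance_pct must be in (0, 100]")
    # Integer ceiling division: round up to a whole participant.
    return (target_outcomes * 100 + compliance_pct - 1) // compliance_pct


clinical = required_enrolment(400, 80)  # 400 follow-ups at 80% compliance
sleep = required_enrolment(300, 70)     # 300 outcomes at 70% compliance
```

For the clinical trial card this reproduces the ~500 patients quoted above; applying the same rule to the sleep example at exactly 70% compliance gives 429 participants.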
~11MB
Native binary for Linux, macOS, and Windows
μs
Matrix updates via Sherman-Morrison rank-1 formula
~10K
Predictions per second on a single node
+16.7%
Lift over random on MovieLens 100K, up to +24.6% with feature engineering
4
Algorithms: LinUCB · Thompson Sampling · NeuralLinUCB · Progressive Tournament

Start in one command

No sign-up. No cloud account. No configuration required.

Binary: Linux, macOS, Windows

$ curl -fsSL https://raw.githubusercontent.com/dynamicpricing-ai/banditdb/main/scripts/install.sh | sh

Docker

$ docker run -d -p 8080:8080 simeonlukov/banditdb:latest

Join the community

Ask questions, share what you're building, get early updates. The BanditDB Discord is where the conversation happens.

Discord community

Get help, share projects, talk to the team. Free and open to everyone.

Join Discord
Talk to the founder

Questions about enterprise use or integrations, or just want to say hi? Book a 30-min call.