Multi-Armed Bandit Brain

Published September 19, 2025


This demo trains a deeper stochastic network (no backprop) on a $K$-armed bandit using the neural update rules described earlier.

The environment is a bandit with $K$ arms and Bernoulli rewards: each pull returns 1 or 0. The network outputs an action distribution via a softmax over its output nodes.
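To make the setup concrete, here is a minimal sketch of the environment and the action distribution. It does not include the network or its update rules. The names `BernoulliBandit`, `softmax`, and `sample_action`, the arm probabilities, and the zero-valued output activations are all assumptions made for illustration, not part of the demo's code.

```python
import math
import random


class BernoulliBandit:
    """K-armed bandit: pulling arm k returns 1 with probability probs[k], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1 if self.rng.random() < self.probs[arm] else 0


def softmax(activations):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(activations, rng):
    """Sample an arm index from the softmax distribution over output activations."""
    probs = softmax(activations)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical values, for illustration only.
bandit = BernoulliBandit(probs=[0.2, 0.5, 0.8], seed=1)
rng = random.Random(2)
output_activations = [0.0, 0.0, 0.0]  # stand-in for the network's output nodes
arm = sample_action(output_activations, rng)
reward = bandit.pull(arm)
```

With equal activations the softmax is uniform, so every arm is equally likely at the start. A learning rule would then have to shift the activations toward the arms that pay more often.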

[Interactive demo: K-armed bandit, with panels for stats and the output distribution]