A downloadable game

A multi-armed bandit problem is one in which an agent repeatedly chooses among several actions, each of which yields some reward, raising the question of whether to explore less-tried actions or exploit the best one found so far. We created a multi-armed bandit gym environment to help people understand how a transformer makes strategic decisions in such a scenario. Whereas previous transformer interpretability research has focused on how models understand language, our work is novel in that it examines the model’s strategic decision-making process.
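To make the explore/exploit trade-off concrete, here is a minimal sketch of a bandit with an epsilon-greedy agent. It is not the environment from our submission: the names `BernoulliBandit` and `epsilon_greedy`, the 0/1 reward scheme, and all parameter values are assumptions chosen for illustration, and the sketch uses only the Python standard library rather than a gym interface.

```python
import random


class BernoulliBandit:
    """Hypothetical k-armed bandit: arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def step(self, arm):
        # Draw a 0/1 reward for the chosen arm.
        return 1 if self.rng.random() < self.probs[arm] else 0


def epsilon_greedy(env, n_steps=2000, epsilon=0.1, seed=1):
    """Explore a random arm with probability epsilon, otherwise exploit."""
    rng = random.Random(seed)
    k = len(env.probs)
    counts = [0] * k    # number of pulls per arm
    values = [0.0] * k  # running mean reward per arm
    total = 0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore
        else:
            # Exploit: highest estimate so far (ties go to the lowest index).
            arm = max(range(k), key=lambda a: values[a])
        reward = env.step(arm)
        counts[arm] += 1
        # Incremental update of the mean reward for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return counts, values, total


if __name__ == "__main__":
    env = BernoulliBandit([0.2, 0.5, 0.8])
    counts, values, total = epsilon_greedy(env)
    print("pulls per arm:", counts)
    print("estimated values:", [round(v, 2) for v in values])
    print("total reward:", total)
```

Setting `epsilon=0` in this sketch shows why exploration matters: the agent keeps pulling whichever arm it tried first and never discovers a better one.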

Download
Interpretability Hackathon Submission.pdf (3 MB)