Publication Date
2025
Document Type
Book
Description
Large Language Models (LLMs) exhibit many emergent capabilities, including in-context learning, which can be leveraged for sequential decision-making tasks. This study examined how Llama 3 (8B parameters) performs in multi-armed bandit tasks with Bernoulli reward distributions. The LLM's performance was compared to traditional reinforcement learning (RL) algorithms, including epsilon-greedy and Upper Confidence Bound (UCB), to evaluate the model's grasp of the exploration-exploitation trade-off. A logistic regression classifier trained on PCA-reduced activation vectors extracted from the LLM's decoder layers achieved over 90% accuracy in distinguishing prompts reflecting greedy versus anti-greedy decisions, suggesting the LLM forms internally consistent representations of the task. However, attempts to steer the LLM's behavior using steering vectors were unsuccessful, underscoring the difficulty of manipulating LLM behavior in complex decision-making tasks. These findings raise important questions about interpretability, control, and the emergent nature of in-context learning.
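The abstract names three standard techniques; the sketches below illustrate each under stated assumptions, since the poster's exact prompts, hyperparameters, and layer choices are not given here.

First, the epsilon-greedy and UCB1 baselines on a Bernoulli bandit. The arm probabilities, horizon, and epsilon below are illustrative placeholders, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(policy, p_true=(0.3, 0.5, 0.7), n_trials=200, eps=0.1):
    """Simulate one Bernoulli bandit episode under the given policy."""
    k = len(p_true)
    counts = np.zeros(k)   # pulls per arm
    values = np.zeros(k)   # running mean reward per arm
    total = 0.0
    for t in range(1, n_trials + 1):
        if policy == "epsilon_greedy":
            # Explore with probability eps, otherwise exploit the best estimate.
            arm = rng.integers(k) if rng.random() < eps else int(np.argmax(values))
        else:  # UCB1: pull each arm once, then add an exploration bonus.
            if t <= k:
                arm = t - 1
            else:
                arm = int(np.argmax(values + np.sqrt(2 * np.log(t) / counts)))
        reward = float(rng.random() < p_true[arm])           # Bernoulli draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total / n_trials

print("epsilon-greedy mean reward:", run_bandit("epsilon_greedy"))
print("UCB1 mean reward:", run_bandit("ucb"))
```

Second, the probing step: logistic regression on PCA-reduced activations. Real decoder-layer activations would replace the random stand-ins below; the 4096-dimensional width matches Llama 3 8B's hidden size, and the 50-component PCA is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4096))   # stand-in activation vectors, one per prompt
y = rng.integers(0, 2, size=400)   # 1 = greedy prompt, 0 = anti-greedy prompt

X_reduced = PCA(n_components=50).fit_transform(X)  # dimensionality reduction
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X_reduced, y, cv=5)  # held-out classification accuracy
print("mean probe accuracy:", scores.mean())       # near chance on random data
```

Third, a common activation-steering recipe (not necessarily the poster's exact method): take the difference of mean activations between the two prompt classes and add a scaled copy to a decoder layer's hidden states at inference via a forward hook. The layer index and scale below are hypothetical.

```python
import torch

hidden = 4096
greedy_acts = torch.randn(100, hidden)   # stand-in activations per prompt class
anti_acts = torch.randn(100, hidden)
steer = greedy_acts.mean(0) - anti_acts.mean(0)
alpha = 4.0                              # hypothetical steering strength

def steering_hook(module, inputs, output):
    # For a Llama decoder layer, output is a tuple whose first element
    # holds the hidden states of shape (batch, seq, hidden).
    hidden_states = output[0] + alpha * steer
    return (hidden_states,) + output[1:]

# Illustrative attachment point on a Hugging Face Llama model:
# handle = model.model.layers[20].register_forward_hook(steering_hook)
```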
Recommended Citation
Cohen, Isaac; Malhotra, Hiten; and Hayes, William M., "Can LLMs Understand Multi-Armed Bandit Tasks?" (2025). Research Days Posters 2025. 28.
https://orb.binghamton.edu/research_days_posters_2025/28
