Publication Date
2025
Document Type
Book
Description
Large Language Models (LLMs) exhibit many emergent capabilities, including in-context learning, which can be leveraged for sequential decision-making tasks. This study examined how Llama 3 (8B parameters) performs in multi-armed bandit tasks with Bernoulli reward distributions. The LLM's performance was compared to traditional reinforcement learning (RL) algorithms, including epsilon-greedy and Upper Confidence Bound (UCB), to evaluate the model's grasp of the exploration-exploitation trade-off. A logistic regression classifier trained on PCA-reduced activation vectors extracted from the LLM's decoder layers achieved over 90% accuracy in distinguishing prompts reflecting greedy versus anti-greedy decisions, suggesting the LLM forms internally consistent representations of the task. However, attempts to steer the LLM's behavior using steering vectors were unsuccessful, underscoring the difficulty of manipulating LLM behavior in complex decision-making tasks. These findings raise important questions about interpretability, control, and the emergent nature of in-context learning.
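The abstract names three standard techniques; the sketches below illustrate each under stated assumptions, since the poster's exact prompts, hyperparameters, and layer choices are not given here.

First, the epsilon-greedy and UCB1 baselines on a Bernoulli bandit. The arm probabilities, horizon, and epsilon below are illustrative placeholders, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(policy, p_true=(0.3, 0.5, 0.7), n_trials=200, eps=0.1):
    """Simulate one Bernoulli bandit episode under the given policy."""
    k = len(p_true)
    counts = np.zeros(k)   # pulls per arm
    values = np.zeros(k)   # running mean reward per arm
    total = 0.0
    for t in range(1, n_trials + 1):
        if policy == "epsilon_greedy":
            # Explore with probability eps, otherwise exploit the best estimate.
            arm = rng.integers(k) if rng.random() < eps else int(np.argmax(values))
        else:  # UCB1: pull each arm once, then add an exploration bonus.
            if t <= k:
                arm = t - 1
            else:
                arm = int(np.argmax(values + np.sqrt(2 * np.log(t) / counts)))
        reward = float(rng.random() < p_true[arm])           # Bernoulli draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total / n_trials

print("epsilon-greedy mean reward:", run_bandit("epsilon_greedy"))
print("UCB1 mean reward:", run_bandit("ucb"))
```

Second, the probing step: logistic regression on PCA-reduced activations. Real decoder-layer activations would replace the random stand-ins below; the 4096-dimensional width matches Llama 3 8B's hidden size, and the 50-component PCA is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4096))   # stand-in activation vectors, one per prompt
y = rng.integers(0, 2, size=400)   # 1 = greedy prompt, 0 = anti-greedy prompt

X_reduced = PCA(n_components=50).fit_transform(X)  # dimensionality reduction
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X_reduced, y, cv=5)  # held-out classification accuracy
print("mean probe accuracy:", scores.mean())       # near chance on random data
```

Third, a common activation-steering recipe (not necessarily the poster's exact method): take the difference of mean activations between the two prompt classes and add a scaled copy to a decoder layer's hidden states at inference via a forward hook. The layer index and scale below are hypothetical.

```python
import torch

hidden = 4096
greedy_acts = torch.randn(100, hidden)   # stand-in activations per prompt class
anti_acts = torch.randn(100, hidden)
steer = greedy_acts.mean(0) - anti_acts.mean(0)
alpha = 4.0                              # hypothetical steering strength

def steering_hook(module, inputs, output):
    # For a Llama decoder layer, output is a tuple whose first element
    # holds the hidden states of shape (batch, seq, hidden).
    hidden_states = output[0] + alpha * steer
    return (hidden_states,) + output[1:]

# Illustrative attachment point on a Hugging Face Llama model:
# handle = model.model.layers[20].register_forward_hook(steering_hook)
```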
Recommended Citation
Cohen, Isaac; Malhotra, Hiten; and Hayes, William M., "Can LLMs Understand Multi-Armed Bandit Tasks?" (2025). Research Days Posters 2025. 28.
https://orb.binghamton.edu/research_days_posters_2025/28
