Obsidian Source: LMGameArena - Can we use LLMs to play games
Summary
Pending synthesis from local Obsidian source.
Original source title: Lmgamearena Can We Use Llms To Play Games
Extracted Preview
Claude/Gemini are doing Pokemon. Some points I came across:
- Something called a "refined agent harness" is mentioned; I'm not sure what it does.
!Pasted image 20250503142705.png
- The model is given both screenshots and textual information, among other inputs.
- To cope with the limited context window, the accumulated description is cleaned up and summarized from time to time.
- LLM performance on the simple task of playing and winning a game is highly dependent on the scaffold and tooling provided.
- Might need a secondary model call, run at each context summary, whose job is to provide hints if the model seems stuck, and which is given a system prompt specifically for this purpose. It can be told to look for telltale signs of common fail-states and attempt to address them, and can even be given "meta" prompting about how to direct the other model.
- Some direction: pretty much no model seems capable of ingesting ASCII maps + screenshots and making a credible navigation map of an area, even when I substitute a collision map from the game's RAM for the screenshot.
- Models _need_ an overlay to function vaguely competently at navigating the overworld. Without it, considerable time is spent trying to walk through walls and invisible paths, etc.
- LLMs have no built-in memory, and essentially have to rely on text in their context window. This presents obvious problems for playing a stateful game. The standard trick of keeping a conversation history, summarizing it occasionally when it gets too full, and perhaps keeping some auxiliary memory on the side mitigates most of the issues I see in this regard.
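The history-plus-summary trick and the "is the model stuck?" check from the points above can be sketched roughly as follows. This is a minimal illustration, not how any of the existing harnesses actually do it: `RollingMemory`, `looks_stuck`, and `fake_summarize` are hypothetical names, and the summarizer is a stand-in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RollingMemory:
    """Conversation history that is folded into a summary when it grows too long."""
    summarize: Callable[[str, List[str]], str]  # hypothetical hook wrapping a model call
    max_turns: int = 6    # compact once more than this many raw turns are held
    keep_recent: int = 2  # raw turns kept verbatim after compaction
    summary: str = ""
    turns: List[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) > self.max_turns:
            split = len(self.turns) - self.keep_recent
            old, self.turns = self.turns[:split], self.turns[split:]
            self.summary = self.summarize(self.summary, old)

    def context(self) -> str:
        """Text to place in the prompt: summary first, then recent raw turns."""
        parts = ["Summary so far: " + self.summary] if self.summary else []
        return "\n".join(parts + self.turns)


def looks_stuck(actions: List[str], window: int = 4) -> bool:
    """Crude fail-state check: the last `window` actions are all identical."""
    recent = actions[-window:]
    return len(recent) == window and len(set(recent)) == 1


def fake_summarize(previous: str, old_turns: List[str]) -> str:
    # Stand-in for a real model call; only records how many turns were folded in.
    note = f"[{len(old_turns)} turns compacted]"
    return f"{previous} {note}" if previous else note
```

In a real loop, `looks_stuck` returning true at summary time would be the trigger for the secondary hint-giving model call; a repeated-action check is only one possible telltale sign.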
Progress so far:
I one-shotted Claude to make a Streamlit application with two panels: one for the prefilled pattern, and one for the pattern the LLM is going to predict. The basic outline of the project is in place; what remains is the actual experimentation and a little code cleanup. The input/output between the grid and the LLM also still needs to be done, and I have some idea of how to do it.
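One possible shape for the grid-to-LLM input/output is a plain-text round trip. The sketch below assumes the grid is a list of rows of non-negative integers, which the note does not specify; `grid_to_text` and `text_to_grid` are hypothetical helper names.

```python
from typing import List, Optional

Grid = List[List[int]]


def grid_to_text(grid: Grid) -> str:
    """Serialize a grid as one row per line, with cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def text_to_grid(text: str, rows: int, cols: int) -> Optional[Grid]:
    """Parse a model reply back into a grid; return None on a bad shape or token."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != rows:
        return None
    grid: Grid = []
    for line in lines:
        tokens = line.split()
        if len(tokens) != cols or not all(t.isdecimal() for t in tokens):
            return None
        grid.append([int(t) for t in tokens])
    return grid
```

Returning `None` for malformed replies keeps "the model produced an unparseable grid" separate from "the model produced a wrong grid", which may be worth tracking as its own outcome.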
The actual question is how exactly to design the experiments, and what metrics to consider.
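As a starting point only, two candidate metrics for a predicted grid are per-cell accuracy and exact match. Neither is settled on in these notes; the function names and the choice to count missing cells as wrong are assumptions.

```python
from typing import List

Grid = List[List[int]]


def cell_accuracy(predicted: Grid, target: Grid) -> float:
    """Fraction of target cells reproduced correctly; missing cells count as wrong."""
    total = sum(len(row) for row in target)
    correct = sum(
        p == t
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    )
    return correct / total if total else 0.0


def exact_match(predicted: Grid, target: Grid) -> bool:
    """True only when the whole predicted grid equals the target."""
    return predicted == target
```

Cell accuracy gives partial credit and so a smoother signal across models, while exact match is the stricter "did it actually solve the pattern" measure.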
Integration Notes
- Source folder: /home/yashs/Documents/Docs/Obsidian/Research-Notes
- Local source: /home/yashs/Documents/Docs/Obsidian/Research-Notes/LMGameArena - Can we use LLMs to play games.md
- Raw copy: raw/obsidian/research-notes/LMGameArena - Can we use LLMs to play games.md