Article

Obsidian Source: LMGameArena - Can we use LLMs to play games

Summary

Pending synthesis from local Obsidian source.

Original source title: Lmgamearena Can We Use Llms To Play Games

Extracted Preview

Claude and Gemini are both being used to play Pokemon. Some points I came across:

  • Something called a "refined agent harness" is used; I'm not sure what it does.

!Pasted image 20250503142705.png

  • Both screenshots and textual information (among other inputs) are provided to the model.
  • To solve the context-length problem, the accumulated description is cleaned up and summarized from time to time.
  • LLM performance on the simple task of playing and winning a game is highly dependent on the scaffold and tooling provided.
  • Might need a secondary model call that runs at context-summary time, whose job is to provide hints if the model seems stuck, and which is given a system prompt specifically for this purpose. It can be told to look for telltale signs of common fail-states and attempt to address them, and can even be given "meta" prompting about how to direct the other model.
  • Some direction: pretty much no model seems capable of ingesting ASCII maps + screenshots and making a credible navigation map of an area, even when I substitute a collision map from the game's RAM for the screenshot.
  • Models _need_ an overlay to function vaguely competently at navigating the overworld. Without it, considerable time is spent trying to walk through walls and invisible paths, etc.
  • LLMs have no built-in memory, and essentially have to rely on text in their context window. This presents obvious problems for playing a stateful game. The standard trick of keeping a conversation history, summarizing it occasionally when it gets too full, and perhaps keeping some auxiliary memory on the side mitigates most of the issues I see in this regard.
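
The history-plus-summary trick from the last point can be sketched roughly as follows. This is a minimal illustration, not the harness any of these projects actually use: the `RollingContext` class, the `summarize` callable (which would wrap a model call), and the character-based budget are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RollingContext:
    """Conversation history that is compacted once it grows too large."""

    summarize: Callable[[str], str]  # hypothetical: would wrap a model call
    max_chars: int = 2000            # crude stand-in for a token budget
    keep_recent: int = 4             # most recent turns kept verbatim
    summary: str = ""
    turns: List[str] = field(default_factory=list)
    memory: Dict[str, str] = field(default_factory=dict)  # auxiliary notes

    def add(self, turn: str) -> None:
        """Append a turn and compact the history if it is over budget."""
        self.turns.append(turn)
        if sum(len(t) for t in self.turns) > self.max_chars:
            self._compact()

    def _compact(self) -> None:
        """Fold everything except the most recent turns into the summary."""
        split = max(len(self.turns) - self.keep_recent, 0)
        old, recent = self.turns[:split], self.turns[split:]
        if not old:
            return
        self.summary = self.summarize(self.summary + "\n" + "\n".join(old))
        self.turns = recent

    def prompt(self) -> str:
        """Build the text that would be placed in the context window."""
        notes = "\n".join(f"{k}: {v}" for k, v in self.memory.items())
        return "\n\n".join([
            "Summary so far:\n" + self.summary,
            "Notes:\n" + notes,
            "Recent turns:\n" + "\n".join(self.turns),
        ])
```

The `memory` dictionary plays the role of the auxiliary memory kept on the side: facts stored there survive every compaction, while ordinary turns are eventually folded into the summary.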

Progress so far:

I one-shotted Claude to make a Streamlit application with two panels: one for the prefilled pattern, and one for the pattern the LLM is going to predict. The basic outline of the project is in place; what is left is the actual experimentation and a little code cleanup. The input/output path from the grid to the LLM and back still needs to be done as well, and I have some idea of how to do it.
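
One possible shape for the grid-to-LLM input/output step is a plain-text round trip: serialize the grid into rows of space-separated integers for the prompt, then parse the reply back and reject anything malformed. The note does not specify a format, so the integer cell encoding and the function names `grid_to_text` and `text_to_grid` are assumptions for this sketch.

```python
from typing import List, Optional

Grid = List[List[int]]


def grid_to_text(grid: Grid) -> str:
    """Serialize a grid as one line per row, cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def text_to_grid(text: str, rows: int, cols: int) -> Optional[Grid]:
    """Parse a model reply back into a grid; return None if it is malformed."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != rows:
        return None
    grid: Grid = []
    for ln in lines:
        cells = ln.split()
        if len(cells) != cols:
            return None
        try:
            grid.append([int(c) for c in cells])
        except ValueError:
            # A non-integer token means the reply did not follow the format.
            return None
    return grid
```

Returning `None` for a malformed reply, rather than raising, makes it easy to count format failures separately from wrong predictions when running experiments.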

The actual question is how exactly to design the experiments, and what metrics to consider.
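
As one possible starting point for metrics (the note leaves this open, so these are only candidates, not a settled choice), a predicted grid can be scored against the target by exact match and by per-cell accuracy. The function names and the decision to score a shape mismatch as 0.0 are assumptions for this sketch.

```python
from typing import List

Grid = List[List[int]]


def exact_match(pred: Grid, target: Grid) -> bool:
    """True only if every cell and the overall shape agree."""
    return pred == target


def cell_accuracy(pred: Grid, target: Grid) -> float:
    """Fraction of target cells predicted correctly; 0.0 on a shape mismatch."""
    same_shape = len(pred) == len(target) and all(
        len(p_row) == len(t_row) for p_row, t_row in zip(pred, target)
    )
    total = sum(len(row) for row in target)
    if not same_shape or total == 0:
        return 0.0
    correct = sum(
        p == t
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    )
    return correct / total
```

Exact match is strict but easy to interpret, while cell accuracy gives partial credit and may be more informative when the model gets most of a pattern right.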

Integration Notes

  • Source folder: /home/yashs/Documents/Docs/Obsidian/Research-Notes
  • Local source: /home/yashs/Documents/Docs/Obsidian/Research-Notes/LMGameArena - Can we use LLMs to play games.md
  • Raw copy: raw/obsidian/research-notes/LMGameArena - Can we use LLMs to play games.md

Links Created Or Updated

Open Questions