Obsidian Source: LMGameArena - Can we use LLMs to play games

Summary

Pending synthesis from local Obsidian source.

Original source title: Lmgamearena Can We Use Llms To Play Games

Extracted Preview

Claude/Gemini are doing Pokemon. Some points I came across:

Something called a "refined agent harness" is used; I'm not sure what it does.

!Pasted image 20250503142705.png

Both screenshots and textual information (among other inputs) are provided to the model.
To cope with the limited context window, the accumulated description is cleaned up and summarized from time to time.
LLM performance on the simple task of playing and winning a game is highly dependent on the scaffold and tooling provided.
Might need a secondary model call that runs at each context summary, whose job is to provide hints if the model seems stuck, and which is given a system prompt specifically for this purpose. It can be told to look for telltale signs of common fail-states and attempt to address them, and can even be given "meta" prompting about how to direct the other model.
Some direction: pretty much no model seems capable of ingesting ASCII maps + screenshots and making a credible navigation map of an area, even when I substitute a collision map from the game's RAM for the screenshot.
Models _need_ an overlay to function vaguely competently at navigating the overworld. Without it, considerable time is spent trying to walk through walls and invisible paths, etc.
LLMs have no built-in memory, and essentially have to rely on text in their context window. This presents obvious problems for playing a stateful game. The standard trick of keeping a conversation history, summarizing it occasionally when it gets too full, and perhaps keeping some auxiliary memory on the side mitigates most of the issues I see in this regard.
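
The history-plus-summary trick, together with the hint call that runs at summary time, can be sketched roughly as below. This is a minimal sketch under assumptions, not how any of the Pokemon harnesses actually do it: RollingContext, summarize, hint, max_chars and keep_recent are all hypothetical names, the two callables stand in for real model calls, and a character count stands in for a token budget.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RollingContext:
    # Hypothetical callables: each would wrap a real model call in practice.
    summarize: Callable[[str], str]
    hint: Optional[Callable[[str], str]] = None
    max_chars: int = 2000      # crude stand-in for a token budget (assumption)
    keep_recent: int = 4       # newest turns that are kept verbatim
    summary: str = ""
    latest_hint: str = ""
    turns: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)  # auxiliary memory, never summarized

    def add(self, turn: str) -> None:
        """Append a turn and compact the history once it exceeds the budget."""
        self.turns.append(turn)
        if sum(len(t) for t in self.turns) > self.max_chars:
            self._compact()

    def _compact(self) -> None:
        # Split into old turns (to be folded into the summary) and recent ones.
        cut = max(len(self.turns) - self.keep_recent, 0)
        old, self.turns = self.turns[:cut], self.turns[cut:]
        if not old:
            return
        self.summary = self.summarize("\n".join([self.summary, *old]).strip())
        # Secondary call at summary time: may return a hint if play looks stuck.
        if self.hint is not None:
            self.latest_hint = self.hint(self.summary)

    def prompt(self) -> str:
        """Assemble the text that would be sent to the playing model."""
        parts = []
        if self.notes:
            parts.append("Notes:\n" + "\n".join(self.notes))
        if self.summary:
            parts.append("Summary so far:\n" + self.summary)
        if self.latest_hint:
            parts.append("Hint:\n" + self.latest_hint)
        parts.append("Recent turns:\n" + "\n".join(self.turns))
        return "\n\n".join(parts)
```

In use, each game step would call add() with the latest observation and action, then send prompt() to the model. Facts that must survive summarization (badges earned, current goal) would go into notes.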

Progress so far:

I one-shot prompted Claude to build a Streamlit application with two panels: one for the prefilled pattern, and one for the pattern the LLM is going to predict. The basic outline of the project is in place; what remains is the actual experimentation and a little code cleanup. The input/output path between the grid and the LLM also still needs to be built, and I have some ideas on how to do it.
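
One possible shape for the grid-to-LLM input/output step is sketched below. It is only a starting point under assumptions: the notes do not say how the grid is represented, so this assumes a rectangular grid of small integers, and the names Grid, grid_to_text and text_to_grid are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # assumption: cells are small integers, e.g. 0 = empty, 1 = filled


def grid_to_text(grid: Grid) -> str:
    """Serialize a grid as one line per row, cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def text_to_grid(text: str, rows: int, cols: int) -> Grid:
    """Parse a model reply back into a grid.

    Raises ValueError if the reply does not have the expected shape or
    contains non-integer tokens, so a malformed reply can be counted as
    a failure instead of being silently accepted.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != rows or any(len(tokens) != cols for tokens in lines):
        raise ValueError(f"expected a {rows}x{cols} grid")
    return [[int(token) for token in tokens] for tokens in lines]
```

The prefilled panel would be passed through grid_to_text before being placed in the prompt, and the model's reply through text_to_grid before being drawn in the prediction panel. Strict shape checking is a design choice here: it keeps formatting failures separate from wrong predictions when the experiments are scored.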

The actual question is how exactly to design the experiments, and what metrics to consider.

Integration Notes

Source folder: /home/yashs/Documents/Docs/Obsidian/Research-Notes
Local source: /home/yashs/Documents/Docs/Obsidian/Research-Notes/LMGameArena - Can we use LLMs to play games.md
Raw copy: raw/obsidian/research-notes/LMGameArena - Can we use LLMs to play games.md

Links Created Or Updated

Open Questions