💭 Todo for memory system: Break down reporting of token costs so I can see where the budget is going. Allow the LLM to send multiple text adventure commands at once, useful when it wants to brute-force or try several versions of a command. Write success metrics to run on the logs. Limit context size and implement ranking.
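
A minimal sketch of the token cost breakdown, assuming each prompt is assembled from named parts whose token counts are known at send time. The `TokenLedger` class, the category names, and the numbers are all hypothetical placeholders, not measurements from the real system.

```python
from collections import defaultdict


class TokenLedger:
    """Accumulates token counts per prompt component (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def record(self, category: str, tokens: int) -> None:
        # Add this call's tokens to the running total for the category.
        self.totals[category] += tokens

    def report(self) -> list[tuple[str, int, float]]:
        # Return (category, tokens, share of total), largest spender first.
        grand = sum(self.totals.values())
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(cat, n, n / grand if grand else 0.0) for cat, n in rows]


# Made-up counts, purely to show the shape of the report.
ledger = TokenLedger()
ledger.record("system_prompt", 400)
ledger.record("memory_recall", 1200)
ledger.record("game_transcript", 300)
ledger.record("memory_recall", 100)

for cat, n, share in ledger.report():
    print(f"{cat:16s} {n:6d} {share:6.1%}")
```

Sorting by total puts the biggest spender on top, which is the first thing to look at when deciding what to trim.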

Still concerned that how easy it is to get stuck in text adventures will bury any improvement under high random noise, making it too expensive to collect sufficient data, but will see whether the efficiency changes plus multiple command inputs are enough.
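
To put a rough number on "sufficient data", here is a sketch that estimates how many runs per variant it would take to tell two success rates apart. It assumes each run is scored as a simple pass/fail and uses the standard normal-approximation sample size formula for comparing two proportions. The function name `runs_needed` and the example rates are hypothetical, not taken from real logs.

```python
import math
from statistics import NormalDist


def runs_needed(p_base: float, p_new: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Runs per variant to detect p_base vs p_new (two-sided test, normal approx.)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    # Sum of the Bernoulli variances of the two variants.
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_new - p_base) ** 2)


# Hypothetical rates: baseline finishes 30% of games, new memory system 40%.
print(runs_needed(0.30, 0.40))
# A smaller improvement needs far more runs.
print(runs_needed(0.30, 0.35))
```

The required count grows with the inverse square of the gap, so halving the improvement roughly quadruples the number of runs. That is the cost pressure behind the concern above.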