💠Need to figure out good metrics for evaluating intelligent progress through text adventures. Ideally they stay useful even when the LLM is stuck on a puzzle.