💭 As for the larger problem of making LLMs actually play well, that's quite fun. I don't have a cheap solution, but I think listing and maintaining counterfactual states, and having each state separately explored and evaluated, might be effective. You'd need to be able to create meaningful counterfactuals, combine them, rule them out, and get a global view of how to act given the whole information state. Unfortunately, while I think this may be affordable for one player, I can't imagine it being cost-effective to have 10 such agents play a long game together.
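To make the idea a bit more concrete, here's a rough sketch of what I mean by maintaining counterfactuals, ruling them out, combining them, and picking an action from the global view. Everything in it is made up for illustration: `BeliefState`, `combine`, the toy "who is the traitor" worlds, and especially `evaluate`, which in a real setup would presumably be an LLM call exploring that one counterfactual (and is where the cost would pile up).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Sequence

World = Hashable  # one counterfactual: a full guess at the hidden state


@dataclass
class BeliefState:
    """Hypothetical container for the counterfactuals one player tracks."""
    weights: Dict[World, float]  # unnormalised plausibility per world

    def rule_out(self, consistent: Callable[[World], bool]) -> None:
        """Drop every world that contradicts a new observation."""
        self.weights = {w: p for w, p in self.weights.items() if consistent(w)}

    def normalised(self) -> Dict[World, float]:
        """Rescale the surviving weights so they sum to 1."""
        total = sum(self.weights.values())
        if total == 0:
            raise ValueError("every counterfactual was ruled out")
        return {w: p / total for w, p in self.weights.items()}

    def best_action(
        self,
        actions: Sequence[str],
        evaluate: Callable[[World, str], float],
    ) -> str:
        """Global view: score each action in every world, weight by belief."""
        beliefs = self.normalised()

        def expected(action: str) -> float:
            return sum(p * evaluate(w, action) for w, p in beliefs.items())

        return max(actions, key=expected)


def combine(a: BeliefState, b: BeliefState) -> BeliefState:
    """Joint worlds from two sets of counterfactuals, assumed independent."""
    return BeliefState({
        (wa, wb): pa * pb
        for wa, pa in a.weights.items()
        for wb, pb in b.weights.items()
    })


# Toy example: each world names the player we suspect is the traitor.
belief = BeliefState({"alice": 2.0, "bob": 1.0, "carol": 1.0})
belief.rule_out(lambda world: world != "carol")  # say carol was cleared


def evaluate(world: World, action: str) -> float:
    # Stand-in for the expensive per-world exploration (assumption).
    if action == "pass":
        return 0.4
    return 1.0 if action == f"accuse {world}" else 0.0


choice = belief.best_action(["accuse alice", "accuse bob", "pass"], evaluate)
print(choice)  # accuse alice
```

The cost problem shows up directly in `best_action`: it calls `evaluate` once per surviving world per candidate action, and `combine` multiplies the number of worlds, so ten agents each doing this every turn adds up fast.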