Even in reasonably sized action spaces, it still might be useful to choose actions to simulate

💭 Even in reasonably sized action spaces, it still might be useful to choose which actions to simulate based on how different each action's expected average outcome would be from that of the other actions. That way the simulations explore the broadest range of possibilities rather than showing mostly the same result each time. I don't know how to formalize that yet.
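One possible way to make this concrete, offered only as a rough sketch: assume each action already has a cheap estimate of its average outcome, represented as a numeric vector, and greedily pick the action whose estimate is farthest from everything picked so far (farthest-point selection). The function name `pick_diverse_actions`, the `predicted_outcomes` mapping, and the use of Euclidean distance are all assumptions made for illustration, not something the idea above commits to.

```python
import math


def pick_diverse_actions(predicted_outcomes, k):
    """Greedily pick up to k actions whose predicted outcomes are far apart.

    predicted_outcomes: dict mapping action -> tuple of floats, a
    hypothetical cheap estimate of the action's average result.
    """
    actions = list(predicted_outcomes)
    if k <= 0 or not actions:
        return []

    # Seed with the first action; any deterministic seed would work.
    chosen = [actions[0]]

    while len(chosen) < min(k, len(actions)):
        remaining = [a for a in actions if a not in chosen]

        def novelty(action):
            # Distance to the nearest already-chosen outcome: a large value
            # means this action leads somewhere not yet covered.
            return min(
                math.dist(predicted_outcomes[action], predicted_outcomes[c])
                for c in chosen
            )

        chosen.append(max(remaining, key=novelty))

    return chosen


# Toy example: "left" and "left_slow" lead to almost the same outcome,
# so only one of them is worth simulating when the budget is 3.
estimates = {
    "left": (0.0, 0.0),
    "left_slow": (0.1, 0.0),
    "right": (5.0, 0.0),
    "jump": (0.0, 4.0),
}
selected = pick_diverse_actions(estimates, 3)
print(selected)
```

This leaves the hard part open: where the outcome estimates come from and what distance is meaningful between outcomes. It also ignores how good each action is, so in practice it would probably need to be mixed with a value-based criterion.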