💠thinking things like "what if legal actions were encoded as tokens so that their outputs could be used as a dynamic policy? That'd skip needing the simulator to handle policy in a separate network step which could make the system compatible with muzero for better performance when sims are slow" makes me feel smart