💭 Putting together a reward-free MuZero with VAE states and progressive-widening sampling to cover hidden information and stochasticity, action-token prediction to handle arbitrary action spaces, and a heterogeneous transformer state.
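For anyone unfamiliar with the progressive-widening part, here is a rough sketch of the general idea, not the actual implementation: a chance node only draws a fresh outcome while its child count is below `c * visits ** alpha`, and otherwise revisits an existing child. The names `ChanceNode`, `visit`, `sample_outcome`, and the defaults `c=1.0`, `alpha=0.5` are all made up for illustration; in the setup above, `sample_outcome` would presumably be a draw from the VAE latent.

```python
import random


class ChanceNode:
    """Search-tree node whose children are sampled stochastic outcomes."""

    def __init__(self):
        self.visits = 0
        self.children = {}  # outcome -> number of times it was followed


def visit(node, sample_outcome, rng, c=1.0, alpha=0.5):
    """Visit `node` once and return the outcome to descend into.

    Hypothetical helper: widens (samples a new outcome) only while
    len(children) < c * visits ** alpha, else reuses an existing child.
    """
    node.visits += 1
    if len(node.children) < c * node.visits ** alpha:
        # Allowed to widen: draw an outcome (it may repeat an old one).
        outcome = sample_outcome(rng)
        node.children.setdefault(outcome, 0)
    else:
        # Not allowed to widen: follow the least-visited existing child.
        outcome = min(node.children, key=node.children.get)
    node.children[outcome] += 1
    return outcome


if __name__ == "__main__":
    rng = random.Random(0)
    root = ChanceNode()
    # Stand-in for a latent draw: a rounded Gaussian sample (an assumption).
    for _ in range(200):
        visit(root, lambda r: round(r.gauss(0.0, 1.0), 3), rng)
    print(root.visits, len(root.children))
```

The point of the cap is that the branching factor grows sublinearly with visits, so the search keeps deepening along already-sampled outcomes instead of spreading over an unbounded outcome space.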