💠Should implement the following separately, maybe even in different branches:
- advanced, efficient handling of symmetries and orderless structure
- stochasticity
- later, stochasticity plus hidden information
- highly dynamic policy space (my main development)
- add sparse graph info to tokens efficiently (a separate development)
Feel like I'm missing one. For each, I think I need to get it working for AlphaZero, MuZero, or both. I'm not convinced it's always worthwhile to implement everything in AlphaZero, since doing so adds expensive overhead, whereas MuZero magics away a lot of these problems with its learned internal representation.