💭 https://github.com/opendilab/LightZero/tree/main LightZero has implementations of a number of "Zero" algorithms and a number of envs, but a chart says that algorithm-to-env compatibility is inconsistent. I don't see how that could be if the envs have a consistent structure and the implementations rely only on observing a state and reward and passing in actions. I must be missing some nuance.
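
To make that assumption concrete, this is roughly the interface I am picturing. It is only a minimal sketch of my mental model: `Env`, `CountdownEnv`, and `run_episode` are hypothetical names made up for illustration, not LightZero's actual API, and the integer state and action types are my own simplification.

```python
from typing import Callable, Protocol, Tuple


class Env(Protocol):
    """Hypothetical 'consistent structure' I assume every env shares."""

    def reset(self) -> int: ...

    def step(self, action: int) -> Tuple[int, float, bool]: ...


class CountdownEnv:
    """Toy env (illustration only): the state counts down to 0, then the episode ends."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.state = start

    def reset(self) -> int:
        self.state = self.start
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.state -= 1
        reward = 1.0 if action == 1 else 0.0  # action 1 is rewarded, anything else is not
        done = self.state == 0
        return self.state, reward, done


def run_episode(env: Env, policy: Callable[[int], int]) -> float:
    """The only interaction I assume an algorithm needs: observe state and reward, send an action."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total
```

If every algorithm only needed what `run_episode` uses, any algorithm should pair with any env, which is why the compatibility chart surprises me.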