💭 I realize a drawback to using self-play data from asymmetric models (a stronger model vs. a weaker one). In addition to dilution and so on, a model trained on that data may learn to infer "if we made it to this state, we must be stupid, and therefore stupid actions are more likely", and learn to recommend them. It'd also learn that stupid positions are more likely to lead to losing states (more than is actually true), but that's not a big deal.
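To make the "we must be stupid" inference concrete, here is a rough sketch with made-up numbers. It assumes a plain imitation learner that fits P(action | state) on pooled data and cannot see which model made the move. The function names and every probability below are hypothetical, chosen only for illustration.

```python
def posterior_weak(prior_weak, reach_weak, reach_strong):
    """P(mover is the weak model | this state was reached), via Bayes' rule."""
    joint_weak = prior_weak * reach_weak
    joint_strong = (1.0 - prior_weak) * reach_strong
    return joint_weak / (joint_weak + joint_strong)


def cloned_blunder_rate(p_weak, blunder_weak, blunder_strong):
    """Blunder probability that pooled imitation learning would fit at that state."""
    return p_weak * blunder_weak + (1.0 - p_weak) * blunder_strong


# Hypothetical numbers, not measurements.
PRIOR_WEAK = 0.5       # each model makes half of the moves in the dataset
REACH_WEAK = 0.5       # chance the weak model ends up in the bad state
REACH_STRONG = 0.05    # chance the strong model ends up in the bad state
BLUNDER_WEAK = 0.6     # weak model's blunder rate once in the bad state
BLUNDER_STRONG = 0.1   # strong model's blunder rate once in the bad state

p_weak = posterior_weak(PRIOR_WEAK, REACH_WEAK, REACH_STRONG)
cloned = cloned_blunder_rate(p_weak, BLUNDER_WEAK, BLUNDER_STRONG)
pooled = PRIOR_WEAK * BLUNDER_WEAK + (1.0 - PRIOR_WEAK) * BLUNDER_STRONG

print(f"P(weak mover | bad state)         = {p_weak:.3f}")
print(f"cloned P(blunder | bad state)     = {cloned:.3f}")
print(f"pooled blunder rate (no state)    = {pooled:.3f}")
print(f"strong model's own blunder rate   = {BLUNDER_STRONG:.3f}")
```

With these numbers, about 91% of the moves recorded in the bad state come from the weak model, so the learner puts roughly 55% on the blunder there, versus 10% for the strong model alone. The state itself acts as evidence about who is playing, which is the effect described above.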