normally you train a game ai to try to win quickly and lose slowly in

💭 normally you train a game ai to try to win quickly and lose slowly, in other words a loss now is worse than a loss later. Unfortunately that means if the ai finds an action which has no effect on the game except to make the action history longer, it'll learn to do that when losing