💠For ML I wonder if there's a practice of tracking how far the weights have drifted from their initialization, in addition to tracking loss. I think it's different to see ongoing exploration + slow convergence vs. virtually no weight movement + slow convergence? It doesn't have to be, since the weight updates could be undoing each other, but the second case sounds less likely to lead to a future loss improvement.
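A minimal sketch of the idea, on a toy linear-regression problem with plain NumPy (the snapshot-and-compare pattern is the point; the model, learning rate, and logging cadence are all placeholder choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = X @ true_w + noise.
X = rng.normal(size=(100, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=100)

w = rng.normal(size=5)   # initial weights
w0 = w.copy()            # snapshot of where training started
lr = 0.05

for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
    w -= lr * grad
    if step % 50 == 0:
        loss = float(np.mean((X @ w - y) ** 2))
        # L2 distance from initialization: the "how much have we
        # actually moved" signal, tracked alongside the loss curve.
        drift = float(np.linalg.norm(w - w0))
        print(f"step {step}: loss={loss:.4f} drift={drift:.4f}")
```

Logging both curves lets you tell the two cases apart: loss flat while drift keeps growing suggests the optimizer is still exploring; loss flat and drift flat suggests it has genuinely stalled. In a real framework you'd take the snapshot over all parameter tensors (e.g. a concatenated flat vector) rather than a single array.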