💭 pre-thinking models seem more prone to justifying wrong answers than step-by-step models