💠gradually losing my mind trying to figure out why the ai performs well in one eval but badly in another despite not seeing any differences to cause it