💠I forget the technique which finds an input which maximizes a certain classification in a model. Could the same thing be used to find a system prompt which maximizes the rlhf scores of data, thereby avoiding finetune for rlhf?
💠I forget the technique which finds an input which maximizes a certain classification in a model. Could the same thing be used to find a system prompt which maximizes the rlhf scores of data, thereby avoiding finetune for rlhf?