💠Hmm, what if the LLM output a predicted RLHF score for its response, to indicate 'confidence this would get upvoted'? I expect it wouldn't be super useful given how coarse human thumbs-up/down feedback is, but it might be better than nothing, and if you somehow had an objectively perfect RLHF signal it'd be a solution?
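
The idea above could be sketched roughly like this: score the model's own candidate responses with a reward model and surface that score as a confidence signal. This is a toy illustration, not a real system; `toy_reward_model` is a hypothetical stand-in (a crude heuristic) for an actual learned RLHF reward model, and `respond_with_confidence` is an invented helper name.

```python
def toy_reward_model(response: str) -> float:
    # Hypothetical stand-in for a learned RLHF reward model.
    # A real one would be a trained network; this heuristic just
    # rewards longer responses, purely for illustration.
    return min(len(response) / 100.0, 1.0)

def respond_with_confidence(candidates: list[str]) -> tuple[str, float]:
    # Pick the candidate the reward model prefers and return its score
    # alongside it as a "confidence this would get upvoted" signal.
    score, best = max((toy_reward_model(c), c) for c in candidates)
    return best, score
```

Even in this toy form it shows the caveat from the post: the confidence is only as good as the reward model, so thumbs-derived rewards would make it a coarse signal, while a hypothetically perfect reward model would make it exact.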