Anthropic RLHF Dataset: Human Preference Data (+ Errors I Found)

Hello friends!

I recently found this RLHF-style dataset while browsing Hugging Face Datasets. With Reinforcement Learning from Human Feedback (RLHF) becoming the primary way to train AI assistants, it’s great to see organizations like Anthropic making their RLHF dataset publicly available (released as hh-rlhf).

Like other RLHF datasets, every example in this one pairs an input prompt with two LLM-generated outputs: a chosen response and a rejected response, where a human rater preferred the former over the latter.
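
If you want to poke at the data yourself, here's a minimal sketch using the Hugging Face `datasets` library. The field names ("chosen"/"rejected") come from the dataset card; adjust the split and index as needed:

```python
from datasets import load_dataset

# Load the train split of Anthropic's hh-rlhf preference dataset.
# Each example holds two full dialogues that share the same prompt
# but end with different assistant responses.
ds = load_dataset("Anthropic/hh-rlhf", split="train")

example = ds[0]
print(example["chosen"])    # dialogue ending in the human-preferred response
print(example["rejected"])  # dialogue ending in the less-preferred response
```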
