Anthropic RLHF Dataset: Human Preference Data (+ Errors I Found)

Hello friends!

I recently found this RLHF-style dataset while browsing Hugging Face Datasets. With Reinforcement Learning from Human Feedback (RLHF) becoming the primary way to train AI assistants, it’s great to see organizations like Anthropic making their RLHF dataset publicly available (released as hh-rlhf).

Like other RLHF datasets, every example in this one pairs an input prompt with two LLM-generated outputs: a chosen response and a rejected response, where a human rater preferred the former over the latter.
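
If you want to poke at the data yourself, here's a minimal sketch using the Hugging Face `datasets` library. The field names ("chosen"/"rejected") come from the dataset card; adjust the split and index as needed:

```python
from datasets import load_dataset

# Load the train split of Anthropic's hh-rlhf preference dataset.
# Each example holds two full dialogues that share the same prompt
# but end with different assistant responses.
ds = load_dataset("Anthropic/hh-rlhf", split="train")

example = ds[0]
print(example["chosen"])    # dialogue ending in the human-preferred response
print(example["rejected"])  # dialogue ending in the less-preferred response
```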
