Open-Source Instruction–Response Code Dataset (22k+ Samples)

Hi everyone 👋

I’m sharing an open-source dataset focused on code-related tasks, built by merging and standardizing multiple public datasets into a unified instruction–response format.

Current details:

– 22k+ samples

– JSONL format

– instruction / response schema

– Suitable for instruction tuning, SFT, and research
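As a rough illustration of how a JSONL file with this schema might be consumed, here is a minimal sketch. It assumes the keys are literally named `instruction` and `response` (check the dataset card for the exact names), and the prompt template in `to_sft_text` is a hypothetical one chosen for the example, not something the dataset prescribes.

```python
import json

# Assumed field names, based on the "instruction / response schema" described above.
REQUIRED_KEYS = ("instruction", "response")


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing keys {missing}")
        records.append(record)
    return records


def to_sft_text(record):
    """Join one record into a single training string (hypothetical template)."""
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Response:\n{record['response']}"
    )


# A made-up sample line, only to show the expected shape of one record.
sample = '{"instruction": "Reverse a string in Python.", "response": "s[::-1]"}\n'
records = parse_jsonl(sample)
print(to_sft_text(records[0]))
```

For the real file, the same `parse_jsonl` function can be applied to the contents of the downloaded `.jsonl` file read with `open(path, encoding="utf-8").read()`.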

Dataset link:

https://huggingface.co/datasets/pedrodev2026/pedro-open-dataset

The curation and formatting work is released under the BSD-3 license; the source datasets keep their original licenses and are credited.

Feedback, suggestions, and contributions are welcome 🙂

submitted by /u/pedrodev2026