Open-source instruction–response code dataset (22k+ samples)

Hi everyone 👋

I’m sharing an open-source dataset focused on code-related tasks, built by merging and standardizing multiple public datasets into a unified instruction–response format.

Current details:

– 22k+ samples

– JSONL format

– instruction / response schema

– Suitable for instruction tuning, SFT, and research
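
For anyone who wants to consume the JSONL directly, here is a minimal sketch of reading instruction/response pairs. The field names `instruction` and `response` are assumed from the schema listed above and should be checked against the actual files. The sample rows are made up for illustration and are not taken from the dataset.

```python
import json
from typing import Iterable, Iterator


def iter_pairs(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (instruction, response) pairs from JSONL lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed keys; adjust if the dataset uses different field names.
        yield record["instruction"], record["response"]


# Hypothetical sample rows, not real dataset content.
sample = [
    '{"instruction": "Reverse a string in Python.", "response": "s[::-1]"}',
    "",
    '{"instruction": "Add two numbers.", "response": "a + b"}',
]
pairs = list(iter_pairs(sample))
```

Because an open file handle is also an iterable of lines, the same function should work on a downloaded file, e.g. `with open("data.jsonl", encoding="utf-8") as f: pairs = list(iter_pairs(f))`, where `data.jsonl` is a placeholder file name.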

Dataset link:

https://huggingface.co/datasets/pedrodev2026/pedro-open-dataset

The curation and formatting work is released under the BSD-3 license. The source datasets keep their original licenses and are credited.

Feedback, suggestions, and contributions are welcome 🙂

submitted by /u/pedrodev2026