Hi everyone 👋
I’m sharing an open-source dataset focused on code-related tasks, built by merging and standardizing multiple public datasets into a unified instruction–response format.
Current details:
– 22k+ samples
– JSONL format
– instruction / response schema
– Suitable for instruction tuning, SFT, and research
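For anyone who wants to poke at the data, here is a minimal sketch of reading JSONL records with the instruction / response schema described above. It assumes each line is a standalone JSON object with exactly those two field names; the helper name `iter_samples` and the sample records are made up for illustration and are not taken from the dataset.

```python
import io
import json
from typing import Iterator, TextIO

# Field names as listed in the post; assumed to be the exact JSON keys.
REQUIRED_FIELDS = ("instruction", "response")


def iter_samples(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty JSONL line, checking the expected fields."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        yield record


# Made-up sample data standing in for a real file.
sample = io.StringIO(
    '{"instruction": "Reverse a string in Python.", "response": "s[::-1]"}\n'
    "\n"
    '{"instruction": "Add two numbers.", "response": "a + b"}\n'
)
records = list(iter_samples(sample))
print(len(records))
```

To use it on a downloaded file, pass an open file handle instead of the `StringIO` object, for example `open(path, encoding="utf-8")`, where `path` is wherever you saved the JSONL file.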
Dataset link:
https://huggingface.co/datasets/pedrodev2026/pedro-open-dataset
The curation and formatting work is released under the BSD-3 license; the original licenses of the source datasets are preserved and credited.
Feedback, suggestions, and contributions are welcome 🙂
submitted by /u/pedrodev2026