Code Review Dataset: 200k+ Cases Of Human-Written Code Reviews From Top OSS Projects

I compiled 200k+ human-written code reviews from top OSS projects, including React, TensorFlow, VS Code, and more.

I used this dataset to fine-tune a version of Qwen2.5-Coder-32B-Instruct specialized in code review.

The fine-tuned model generated noticeably better code fixes and review comments: it achieved a 4x improvement in BLEU-4, ROUGE-L, and SBERT scores compared to the base model.
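For anyone unfamiliar with the metrics, here is a minimal sketch of how ROUGE-L can be computed between a human-written review comment and a model-generated one. It is not the evaluation script used for the numbers above. It assumes simple lowercase whitespace tokenization and the plain F1 form of the score, and the function names `lcs_length` and `rouge_l_f1` are made up for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the subsequence found so far.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the best length seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 score (assumption: lowercase, whitespace tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    human = "Please add a null check before dereferencing the pointer"
    model = "Add a null check before you dereference the pointer"
    print(round(rouge_l_f1(human, model), 3))
```

Published implementations differ in tokenization, stemming, and how precision and recall are weighted, so scores from this sketch will not match a library's output exactly.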

Feel free to integrate this dataset into your own LLM training and see whether it improves your model's coding skills!

submitted by /u/Ok_Employee_6418