Dataset For Programming Mistakes From All Experience Levels

I am building a project and I want to fine-tune an LLM to incorporate it as a ChatBot.

The ChatBot will deliver feedback to students who submit programming solutions for exercises they are solving. I want to train the ChatBot on a specific way to give feedback like not giving the correct answer explicitly and not answering questions unrelated to the domain, and also being able to give hints when a student asks for it.

I couldn’t find a dataset close to what I need. Obviously I will need to clean any dataset that I find to match my needs perfectly.

If you know of any dataset that might help me with this, or any way that I can automate the generation of a mock dataset, because ChatGPT has limitions and I wasn’t able to make it generate the number of examples I need.

submitted by /u/iTsObserv
[link] [comments]

Leave a Reply

Your email address will not be published. Required fields are marked *