Better way to prepare datasets?
I have my dataset in this format (rough sketch below):
text: length 19k
extracted entity 1: list of extracted entity-1 mentions
extracted entity 2: list of extracted entity-2 mentions
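For concreteness, here is roughly what one record looks like (the key names are just placeholders for my actual fields, not a fixed schema):

    # Rough shape of one record in my dataset (key names are placeholders):
    record = {
        "text": "...full book text, length ~19k...",
        "extracted_entity_1": ["entity A", "entity B"],  # e.g. author names
        "extracted_entity_2": ["entity C"],              # e.g. place names
    }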
Does anyone have an idea how to fine-tune an open-source model with this kind of data?
Is fine-tuning the better option, given that the model (an LLM) has to learn to extract items from the text and the text is so long?
Example: I have to train an LLM to look at the whole text of a book and extract the author name, place names, and people names. Now I have data for 100 books. How can I prepare datasets to fine-tune an LLM to be very good at extraction? Also consider that I have supervised data: book text paired with the author, people, and place names extracted from the whole text.
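One approach I'm considering (not sure if it's the right one): split each book into overlapping chunks, pair every chunk with only the labeled entities that actually appear in it, and write chat-style JSONL that a standard SFT trainer can consume. A minimal sketch, assuming the labels are exact string matches in the text:

    import json

    def chunk_text(text, chunk_size=4000, overlap=200):
        """Split a long book into overlapping character windows."""
        chunks, start = [], 0
        while start < len(text):
            chunks.append(text[start:start + chunk_size])
            start += chunk_size - overlap
        return chunks

    def build_examples(book):
        """Turn one labeled book into chat-style SFT examples, one per chunk."""
        examples = []
        for chunk in chunk_text(book["text"]):
            # Keep only the labels that literally occur in this chunk,
            # so the model is never asked to produce entities it can't see.
            target = {
                "authors": [a for a in book["authors"] if a in chunk],
                "people":  [p for p in book["people"]  if p in chunk],
                "places":  [p for p in book["places"]  if p in chunk],
            }
            examples.append({"messages": [
                {"role": "user",
                 "content": "Extract the author, people and place names "
                            "from the following text as JSON.\n\n" + chunk},
                {"role": "assistant", "content": json.dumps(target)},
            ]})
        return examples

    # Tiny stand-in for the real corpus of 100 labeled books.
    books = [{
        "text": "Pride and Prejudice by Jane Austen ... Elizabeth travels to London ...",
        "authors": ["Jane Austen"],
        "people": ["Elizabeth"],
        "places": ["London"],
    }]

    with open("train.jsonl", "w") as f:
        for book in books:
            for ex in build_examples(book):
                f.write(json.dumps(ex) + "\n")

Filtering labels per chunk is meant to avoid teaching the model to hallucinate entities outside its context window; the overlap is there so entities near a chunk boundary still show up intact in at least one chunk. No idea if this is the standard way to do it, which is why I'm asking.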
How can I fine-tune a good model? Let me know.