I want to investigate parameter-efficient fine-tuning (PEFT) methods (LoRA, bottleneck adapters, etc.) for generative LLMs across different domains. I started reading the PEFT literature to find established benchmarks for my project and saw people using datasets like SQuAD, the E2E dataset, and XSum. Although these span multiple domains, individual samples are not tagged with a domain, and I need that information for my project. I could treat each dataset as one domain, but the datasets I found are usually not domain-specific; they mix samples from several domains. To summarize, I'm looking for datasets that
require a generative model (e.g. question answering with open-ended answers, not multiple-choice)
cover a specific domain (sports, medicine, science, law, etc.) or include the domain as a feature on every sample
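To make the second requirement concrete, here is a minimal sketch of what I mean by a per-sample domain feature: each generative-QA sample carries a `domain` tag, so the dataset can be split into per-domain subsets for separate PEFT runs. The field names and samples are hypothetical, just to illustrate the format I'm after.

```python
from collections import defaultdict

# Hypothetical generative-QA samples, each tagged with a domain feature.
samples = [
    {"question": "What enzyme breaks down lactose?",
     "answer": "Lactase.", "domain": "medicine"},
    {"question": "Who won the 2014 World Cup?",
     "answer": "Germany.", "domain": "sports"},
    {"question": "At what temperature does water boil at sea level?",
     "answer": "100 degrees Celsius.", "domain": "science"},
]

def split_by_domain(records):
    """Group samples by their 'domain' tag, one bucket per domain."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["domain"]].append(rec)
    return dict(buckets)

subsets = split_by_domain(samples)
print(sorted(subsets))  # → ['medicine', 'science', 'sports']
```

With a dataset in this shape, each bucket could be used to fine-tune a separate adapter and compare PEFT methods domain by domain.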
submitted by /u/beanswithoutjeans