UK GDPR Small Business Q&A — 5,000 Synthetic Pairs With Article-level Citations [Synthetic]

Dataset for fine-tuning compliance assistants. Each pair includes:
– A practical SME-facing question (“Can I use pre-ticked consent boxes?”)
– An answer with specific UK GDPR article references, ICO guidance by name, and actionable steps
– Source metadata: which GDPR concepts were used, which generation strategy, timestamp
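
For anyone wondering what working with a pair could look like, here is a minimal sketch. The field names (`question`, `answer`, `concepts`, `strategy`, `timestamp`) are my guesses based on the description above, not the dataset's confirmed schema, and the sample answer is written for illustration rather than copied from the data. The `cited_articles` helper is hypothetical too: it just pulls "Article N" style references out of an answer so you can check or index the citations.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class QAPair:
    # Hypothetical layout: the real column names may differ.
    question: str
    answer: str
    concepts: List[str] = field(default_factory=list)  # GDPR concepts used
    strategy: str = ""                                  # generation strategy
    timestamp: str = ""                                 # generation time


# Matches "Article 7", "Article 4(11)" and "Article 6(1)(a)".
ARTICLE_RE = re.compile(r"\bArticle\s+(\d+)(?:\((\d+)\))?(?:\(([a-z])\))?")


def cited_articles(answer: str) -> List[str]:
    """Return article references in order of first appearance, deduplicated."""
    seen: List[str] = []
    for match in ARTICLE_RE.finditer(answer):
        ref = "Article " + match.group(1)
        if match.group(2):
            ref += "(" + match.group(2) + ")"
        if match.group(3):
            ref += "(" + match.group(3) + ")"
        if ref not in seen:
            seen.append(ref)
    return seen


# Illustrative pair, not taken from the dataset.
pair = QAPair(
    question="Can I use pre-ticked consent boxes?",
    answer=(
        "No. Consent needs a clear affirmative action under Article 4(11) "
        "and Article 7 of the UK GDPR. Article 7 also lets people withdraw "
        "consent at any time."
    ),
    concepts=["consent"],
)

print(cited_articles(pair.answer))
```

A regex like this only catches the "Article N(n)(x)" pattern, so references written as "Art. 7" or spelled out in words would need extra handling.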

Generation method: questions are generated by a local Qwen 14B model from a curated term bank; answers come from the DeepSeek API, which I chose for factual reliability. The data ships as JSON and Parquet, and the 1K sample is MIT-licensed.

This is a niche dataset: it’s not a benchmark contender, it’s for people building privacy tools for UK businesses. If you’re doing legal NLP or compliance RAG, it might be useful.

Free sample: https://huggingface.co/datasets/Draeg82/uk-gdpr-small-business-qa

submitted by /u/a_serial_hobbyist_