I Couldn’t Find Structured Data On UK Planning Refusals, So I Extracted It From PDFs Myself. Here Is The Schema Sample.

Most UK planning data is trapped in local council PDFs… so if you’re trying to build AI or risk models for property, it’s a nightmare to parse why applications actually get rejected.

I spent the last few weeks building an extraction pipeline that pulls the exact policy breaches, original context & officer notes into a CSV. I also wrote a script that reduces all the PII to just postcodes for GDPR compliance.
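For anyone wondering what the "PII down to postcodes" step might look like, here is a minimal sketch. It assumes each extracted record is a dict holding a free-text site address plus the extracted fields. The field names (`site_address`, `policy_breaches`, `context`, `officer_notes`), the helper functions and the simplified postcode regex are all hypothetical illustrations, not the actual pipeline behind the sample.

```python
import csv
import io
import re
from typing import Optional

# Simplified UK postcode pattern: outward code (e.g. "SW1A") + inward code (e.g. "2AA").
# Hypothetical and deliberately loose; it ignores special cases such as "GIR 0AA".
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)

# Hypothetical output columns for the redacted CSV.
FIELDNAMES = ["postcode", "policy_breaches", "context", "officer_notes"]


def extract_postcode(address: str) -> Optional[str]:
    """Return the first postcode found in an address, normalised to 'OUTWARD INWARD'."""
    match = POSTCODE_RE.search(address)
    if match is None:
        return None
    outward, inward = match.groups()
    return f"{outward.upper()} {inward.upper()}"


def redact_record(record: dict) -> dict:
    """Keep the postcode and the planning fields; drop names and full addresses."""
    return {
        "postcode": extract_postcode(record.get("site_address", "")),
        "policy_breaches": "; ".join(record.get("policy_breaches", [])),
        "context": record.get("context", ""),
        "officer_notes": record.get("officer_notes", ""),
    }


def to_csv(records: list) -> str:
    """Write redacted records to a CSV string (any field not in FIELDNAMES is dropped)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        writer.writerow(redact_record(record))
    return buffer.getvalue()
```

Whitelisting the output columns, rather than trying to blacklist every personal field, means anything the extractor picks up by accident (applicant names, agent emails) never reaches the CSV.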

I put a 50-row sample of the schema up on Kaggle here: SAMPLE

If anyone here is working in proptech, data engineering or spatial modelling, I’d love your feedback on the schema before I pay for the compute to scale this to 10,000+ rows… what columns am I missing?

submitted by /u/a_cold_floor