Most UK planning data is trapped in local council PDFs… so if you’re trying to build AI or risk models for property, it’s a nightmare to parse why applications actually get rejected.
I spent the last few weeks building an extraction pipeline that pulls the exact policy breaches, original context and officer notes out into a CSV. I also wrote a script to reduce all the PII to just postcodes for GDPR compliance.
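
For anyone wondering what the PII step could look like, here is a minimal sketch in Python, not the actual script. It assumes each record has a free-text address and keeps only the postcode. The function name `extract_postcode` and the simplified regex are illustrative assumptions; the pattern covers the common postcode shapes but does not validate against the full Royal Mail rules.

```python
import re
from typing import Optional

# Simplified UK postcode pattern (assumption): outward code such as "SW1A" or "M1",
# optional whitespace, then inward code such as "2AA".
_POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b")


def extract_postcode(address: str) -> Optional[str]:
    """Return the first postcode found in a free-text address, or None.

    The result is normalised to upper case with a single space between
    the outward and inward parts; everything else in the address is dropped.
    """
    match = _POSTCODE_RE.search(address.upper())
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


if __name__ == "__main__":
    print(extract_postcode("10 Downing Street, London SW1A 2AA"))
    print(extract_postcode("Flat 3, 12 Example Road, Manchester m11ae"))
    print(extract_postcode("Land to the rear of the old mill"))
```

Rows where no postcode is found come back as `None`, so they can be flagged for manual review rather than silently keeping the full address.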
I put a 50-row sample of the schema up on Kaggle here: SAMPLE
If anyone here is working in proptech, data engineering or spatial modeling, I’d love your feedback on the schema before I pay to run the compute to scale this to 10,000+ rows… what columns am I missing?
submitted by /u/a_cold_floor