Cleaned JSON Version Of The USDA Phytochemical / Ethnobotanical Database

Hey everyone.
I recently needed to use Dr. Duke’s Phytochemical and Ethnobotanical Databases for a project, but the raw CSV dumps from the USDA are an absolute nightmare to parse (missing fields, inconsistent naming, random all-caps values everywhere).

I spent the last couple of days completely cleaning, normalizing, and mapping the dataset into a relational JSON structure so it’s actually usable for data science pipelines.
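
For anyone curious what that kind of cleanup can look like, here is a minimal sketch. It is not the actual script behind the repo: the column names (`Plant Name`, `CHEMICAL`, `part`), the sample rows, and the output keys (`plants`, `chemicals`, `plant_chemicals`) are all made up for illustration, and the real USDA dumps may be laid out differently.

```python
import csv
import io
import json

# Hypothetical raw rows; the real USDA column names and values may differ.
RAW_CSV = """Plant Name,CHEMICAL,part
CURCUMA LONGA,  curcumin ,Rhizome
curcuma longa,TURMERONE,
Camellia sinensis,Caffeine,LEAF
"""


def clean(value):
    """Collapse runs of whitespace and turn empty fields into None."""
    text = " ".join((value or "").split())
    return text or None


def to_relational(raw_csv):
    """Split flat CSV rows into plants, chemicals, and a link table."""
    plants, chemicals, links = {}, {}, []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Normalize header names: lowercase with underscores.
        row = {k.strip().lower().replace(" ", "_"): clean(v) for k, v in row.items()}
        if not row.get("plant_name") or not row.get("chemical"):
            continue  # skip rows missing either side of the relation
        plant = row["plant_name"].capitalize()  # e.g. "Curcuma longa"
        chem = row["chemical"].lower()
        # Assign a stable integer id the first time each name is seen.
        plant_id = plants.setdefault(plant, len(plants) + 1)
        chem_id = chemicals.setdefault(chem, len(chemicals) + 1)
        part = row["part"].lower() if row.get("part") else None
        links.append({"plant_id": plant_id, "chemical_id": chem_id, "part": part})
    return {
        "plants": [{"id": i, "name": n} for n, i in plants.items()],
        "chemicals": [{"id": i, "name": n} for n, i in chemicals.items()],
        "plant_chemicals": links,
    }


data = to_relational(RAW_CSV)
print(json.dumps(data, indent=2))
```

The idea is that each plant and chemical gets stored once with an id, and the link table carries the relationship, so differently cased duplicates of the same plant collapse into one record.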

I put a sample of 400 fully mapped chemical/plant entities on GitHub in case anyone else needs this for their research. It saved me a ton of headaches.
[https://github.com/wirthal1990-tech/USDA-Phytochemical-Database-JSON]

submitted by /u/DoubleReception2962