{"id":40858,"date":"2026-05-07T17:27:08","date_gmt":"2026-05-07T15:27:08","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/usda-phytochemical-database-enriched-structurally-validated-json-parquet\/"},"modified":"2026-05-07T17:27:08","modified_gmt":"2026-05-07T15:27:08","slug":"usda-phytochemical-database-enriched-structurally-validated-json-parquet","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/usda-phytochemical-database-enriched-structurally-validated-json-parquet\/","title":{"rendered":"USDA Phytochemical Database &#8211; Enriched &amp; Structurally Validated (JSON\/Parquet)"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>The original Dr. Duke database is a veritable treasure trove of plant compounds, but it remains completely untapped because it cannot be easily integrated into modern machine learning pipelines.<\/p>\n<p>My partner and I have spent the last few weeks manually cleaning and structurally validating 76,907 records from it. We assigned them PubChem CIDs, verified the SMILES strings, and added bioactivity values from ChEMBL v35. We also built a query bridge to 1.55 million PubMed abstracts. 
The core dataset itself is now a strictly typed flat file.<\/p>\n<p>I have uploaded a public 400-row sample with all 16 columns to GitHub and Zenodo so you can test the schema in Pandas or DuckDB.<\/p>\n<p>GitHub: <a href=\"http:\/\/github.com\/wirthal1990-tech\/USDA-Phytochemical-Database-JSON\">github.com\/wirthal1990-tech\/USDA-Phytochemical-Database-JSON<\/a><\/p>\n<p>Zenodo DOI: 10.5281\/zenodo.19660107<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/DoubleReception2962\"> \/u\/DoubleReception2962 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1t6c65o\/usda_phytochemical_database_enriched_structurally\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1t6c65o\/usda_phytochemical_database_enriched_structurally\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>The original Dr. 
Duke database is a veritable treasure trove of plant compounds, but it remains completely&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-40858","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/40858","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=40858"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/40858\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=40858"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=40858"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=40858"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}