QUEENS: Python ETL + API For Making Energy Datasets Machine Readable

Hi all.

I’ve open-sourced QUEENS (QUEryable ENergy National Statistics), a Python toolchain for converting official statistics released as multi-sheet Excel files into a tidy, queryable dataset with a small REST API.

  • What it is: an ETL + API in one package. It ingests spreadsheets, normalizes headers/notes, reshapes to long format, writes to SQLite (RAW → PROD with versioning), and exposes a FastAPI for filtered queries. Exports to CSV/Parquet/XLSX are included.
  • Who it’s for: anyone who works with national/sectoral statistics that come as “human-first” Excel (multiple sheets, awkward headers, footnotes, year-on-columns, etc.).
  • Batteries included: it ships with an adapter for the UK’s DUKES (the official annual energy statistics compendium), but the design is collection-agnostic. You can point it at other national statistics by editing a few JSON configs and simple Excel “mapping templates” (no code changes required for many cases).

Key features

  • Robust Excel parsing (multi-sheet, inferred headers, optional transpose, note-tag removal).
  • Schema validation & type coercion; duplicate checks.
  • SQLite with versioning (RAW → staged PROD).
  • API: /data/{collection} and /metadata/{collection} with typed filters (eq, neq, lt, lte, gt, gte, like) and cursor pagination.
  • CLI & library: queens ingest, queens stage, queens export, or use import queens as q.

Install and CLI usage

pip install queens # ingest selected tables queens ingest dukes --table 1.1 --table 6.1 # ingest all tables in dukes queens ingest dukes # stage a snapshot of the data queens stage dukes --as-of-date 2025-08-24 # launch the API service on localhost queens serve 

Why this might help r/datasets

  • Many official stats are published as Excel meant for people, not machines. QUEENS gives you a repeatable path to clean, typed, long-format data and a tiny API you can point tools at.
  • The approach generalizes beyond UK energy: the parsing/mapping layer is configurable, so you can adapt it to other national statistics that share the “Excel + multi-sheet + odd headers” pattern.

Links

License: MIT
Happy to answer questions or help sketch an adapter for another dataset/collection.

submitted by /u/KaleidoscopeNo6551
[link] [comments]

Leave a Reply

Your email address will not be published. Required fields are marked *