Publish Data Snapshots As Versioned Datasets On The Hugging Face Hub

We just added a Hugging Face Datasets integration to fenic.

You can now publish any fenic snapshot as a versioned, shareable dataset on the Hub and read it directly using hf:// URLs.

Example

```python
# `session` is assumed to be an existing fenic session.

# Read a CSV file from a public dataset
df = session.read.csv("hf://datasets/datasets-examples/doc-formats-csv-1/data.csv")

# Read Parquet files using glob patterns
df = session.read.parquet("hf://datasets/cais/mmlu/astronomy/*.parquet")

# Read from a specific dataset revision
df = session.read.parquet("hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet")
```

This makes it easy to version and share agent contexts, evaluation data, or any reproducible dataset across environments.
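The URLs in the example all follow the same shape: `hf://datasets/<owner>/<name>`, an optional `@<revision>`, then a file path or glob inside the repo. As a rough illustration of that layout only, here is a minimal sketch that splits such a URL into its parts. `parse_hf_dataset_url` and `HfDatasetPath` are hypothetical names made up for this example and are not part of fenic or any Hugging Face library. The sketch assumes a two-part `owner/name` repo id and a revision that contains no unencoded slash.

```python
from typing import NamedTuple, Optional


class HfDatasetPath(NamedTuple):
    """Hypothetical container for the parts of an hf://datasets/ URL."""
    repo_id: str             # "owner/name"
    revision: Optional[str]  # text after "@", or None when no revision is given
    path: str                # file path or glob inside the repo ("" if absent)


def parse_hf_dataset_url(url: str) -> HfDatasetPath:
    """Split hf://datasets/<owner>/<name>[@<revision>]/<path> into its parts.

    Illustrative only: assumes a two-segment repo id and a revision
    without unencoded slashes.
    """
    prefix = "hf://datasets/"
    if not url.startswith(prefix):
        raise ValueError(f"not an hf://datasets/ URL: {url!r}")

    # At most three pieces: owner, name[@revision], and the remaining path.
    parts = url[len(prefix):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"missing owner or dataset name: {url!r}")

    owner, name_and_revision = parts[0], parts[1]
    path = parts[2] if len(parts) == 3 else ""

    # partition() returns an empty separator when "@" is not present.
    name, sep, revision = name_and_revision.partition("@")
    return HfDatasetPath(f"{owner}/{name}", revision if sep else None, path)


if __name__ == "__main__":
    print(parse_hf_dataset_url("hf://datasets/cais/mmlu/astronomy/*.parquet"))
    print(parse_hf_dataset_url(
        "hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet"
    ))
```

Pinning the `@<revision>` part is what makes a read reproducible: the same URL keeps pointing at the same version of the data even after the dataset's default branch changes.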

Docs: https://huggingface.co/docs/hub/datasets-fenic
Repo: https://github.com/typedef-ai/fenic

submitted by /u/cpardl