I Put Together A Dataset That Might Be Useful For Researchers

I’ve been working on a side project and ended up compiling a dataset that may be useful beyond what I originally needed it for, so I’m considering releasing it publicly.

At a high level, the dataset contains:

  • structured records collected over a multi-year period
  • consistent timestamps and identifiers
  • minimal preprocessing (basic cleaning + deduplication only)

It’s not tied to a specific paper or product; it’s more something that could support exploratory analysis, modeling, or benchmarking, depending on the use case.

Before publishing, I wanted to sanity-check with this community:

  • what details do you usually look for to judge dataset quality?
  • is light preprocessing preferred, or should I release both raw and processed versions?
  • anything that would immediately make this more usable for research?

Happy to share more specifics if there’s interest, and open to feedback before release.

submitted by /u/crowpng