Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it all for? World domination?

Looking For Help: Does Anyone Happen To Have Access To The Recipe1M+ Dataset?

Greetings everyone,

I’ve been attempting to download the Recipe1M dataset using the following URL (http://im2recipe.csail.mit.edu/dataset/download). Unfortunately, I keep encountering an “Internal Server Error” message.

I’ve reached out to the authors to address this issue, but I haven’t received any response so far.

I’m wondering if anyone here knows why the Recipe1M dataset is no longer available, and if there are any alternate methods to obtain it. Your insights would be greatly appreciated.

Thank you in advance!

submitted by /u/momo_2411

More Public SQL-queryable Databases?

Recently I discovered BigQuery public datasets – just over 200 datasets available for querying directly via SQL. I think this is a great thing! For example, I can connect them directly to an analytics platform (we use Apache Superset, which uses Python’s SQLAlchemy under the hood) and just start dashboarding.

There are plenty of public datasets around, but I would love to find more that can be queried directly like this. I have done some searching; the closest I have found is DoltHub, but it’s not so easy to connect to directly with SQLAlchemy. Are there others around?

submitted by /u/8sleef

Providers For Lead Data Enrichment Through API?

Hi there, I’m looking for ways to enrich data for our SaaS users, who register with only an email address.
We’ve been checking various providers but don’t feel that we have found the right one yet.
Based on an email address we would like to get:

Their name, location, title, and company

We would like to get this through an API call.

The provider should have global data (with a focus on the EU and North America) that is regularly refreshed.
Any suggestions?
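Whichever provider is chosen, many enrichment lookups work best when keyed on a company domain rather than a personal inbox, so a common pre-processing step is to split off the domain and set aside free-mail addresses. A minimal sketch, assuming a small hypothetical list of free-mail domains (a real list would be much longer); it does not call any provider API.

```python
# Hypothetical, deliberately incomplete sample of free-mail domains.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "gmx.de"}


def company_domain(email: str):
    """Return the lowercased domain of an email, or None for free-mail or malformed addresses."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return None  # no "@" or an empty side: not a usable address
    domain = domain.lower()
    return None if domain in FREE_MAIL_DOMAINS else domain


print(company_domain("Jane@Acme.io"))
```

Addresses that return None would need a person-level lookup rather than a company-level one.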

submitted by /u/kalabunga_1

Historical Daily Weather Forecasts For US

Hi Everyone,

I am looking for historical meteorological forecasts (temperature, rainfall, etc.), preferably at daily frequency. For example, I would like something like this (European date format):

1/1/1990 – Temperature forecasts for, e.g., the next two weeks (until 15.1.1990)

2/1/1990 – Temperature forecasts for, e.g., the next two weeks (until 16.1.1990)

etc.
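The layout requested above amounts to a long-format archive with one row per (issue date, valid date) pair. A minimal sketch of that date skeleton, assuming a fixed 14-day horizon that starts the day after the issue date; the actual forecast values would have to be joined in from whatever archive turns up.

```python
from datetime import date, timedelta


def forecast_rows(first_issue: date, n_issues: int, horizon_days: int = 14):
    """Yield (issue_date, valid_date, lead_days) rows for a rolling forecast archive.

    Assumption: each issue covers lead days 1..horizon_days, so a forecast
    issued on 1 Jan 1990 runs through 15 Jan 1990.
    """
    for i in range(n_issues):
        issued = first_issue + timedelta(days=i)
        for lead in range(1, horizon_days + 1):
            yield issued, issued + timedelta(days=lead), lead


rows = list(forecast_rows(date(1990, 1, 1), n_issues=2))
print(rows[13])  # last row of the 1 Jan issue
print(rows[-1])  # last row of the 2 Jan issue
```

Keeping lead time as its own column makes it easy to compare, say, all 7-day-ahead forecasts against observations later.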

It would be great if someone could point me in the right direction. I tried NOAA, but I couldn’t find historical forecasts.

Thank you in advance!

submitted by /u/Elric4

Looking For The Data Of “Testing (quizzing) Boosts Classroom Learning: A Systematic And Meta-analytic Review.”

I know the authors have uploaded the data somewhere, but I can’t find it. Does anyone know where it is? Citation: Yang C, Luo L, Vadillo MA, Yu R, Shanks DR. Testing (quizzing) boosts classroom learning: A systematic and meta-analytic review. Psychol Bull. 2021 Apr;147(4):399-435. doi: 10.1037/bul0000309. Epub 2021 Mar 8. PMID: 33683913.

submitted by /u/bergmolchlover

App That Links Heart Rate And/or ECG To Calendar Events/other Apps? ⌚️🫀

Hi all.
Perhaps a silly question, but I’ve always wondered whether there is an app that links heart rate and/or ECG to calendar events/other apps. I haven’t been able to find one.
That way you could, e.g., see what was going on in your life at different points and how it affected your heart rate and/or ECG, and learn things about yourself over long periods that you might not otherwise notice.
Thank you in advance.
Tags: Apple Watch, watchOS, iOS, macOS, health data, quantified self, tracking, wearables, self-tracking, lifelogging
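If no app does this, the core of it is an interval join: for each calendar event, take the heart-rate samples whose timestamps fall inside the event window. A minimal sketch, assuming heart-rate samples and calendar events have already been exported into plain Python structures (the `Event` class and sample values below are hypothetical, not any app's export format).

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Event:
    title: str
    start: datetime
    end: datetime


def heart_rate_by_event(samples, events):
    """Return (title, mean bpm) for each event, using samples inside [start, end].

    `samples` must be a list of (timestamp, bpm) tuples sorted by timestamp.
    Events with no samples in their window get None.
    """
    times = [t for t, _ in samples]
    summary = []
    for ev in events:
        lo = bisect_left(times, ev.start)  # first sample at or after start
        hi = bisect_right(times, ev.end)   # one past the last sample at or before end
        window = [bpm for _, bpm in samples[lo:hi]]
        summary.append((ev.title, mean(window) if window else None))
    return summary


samples = [
    (datetime(2023, 5, 1, 9, 0), 70),
    (datetime(2023, 5, 1, 9, 30), 95),
    (datetime(2023, 5, 1, 11, 0), 110),
]
events = [
    Event("Standup", datetime(2023, 5, 1, 9, 0), datetime(2023, 5, 1, 9, 45)),
    Event("Presentation", datetime(2023, 5, 1, 10, 50), datetime(2023, 5, 1, 11, 30)),
    Event("Lunch", datetime(2023, 5, 1, 12, 0), datetime(2023, 5, 1, 13, 0)),
]
summary = heart_rate_by_event(samples, events)
print(summary)
```

Sorting once and using binary search keeps each event lookup logarithmic, which matters when years of samples are involved.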

submitted by /u/VictoriaSobocki

Migrating Data From ClickHouse To StarRocks

Has anyone found themselves in a similar situation?

I have a ClickHouse database with 300 million records (500 GB), and my task is to migrate the data to a StarRocks database; both are accessed through a MySQL client.

The problem is that the ClickHouse schema is just a string representation of JSON, while the StarRocks database has 10 tables, so I have to parse the JSON and map its properties to the appropriate tables.
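Since the real 10-table schema isn't given, the following only sketches the parse-and-route step, assuming a hypothetical mapping in which each target table lists its columns and a record belongs to a table when it contains that table's first (key) column. It builds one in-memory CSV buffer per table, which could then go to whatever load step is already in place.

```python
import csv
import io
import json

# Hypothetical mapping (the real schema has 10 tables): table -> columns.
# A record is routed to a table when it contains the table's first column.
TABLE_COLUMNS = {
    "page_views": ["view_id", "user_id", "url"],
    "purchases": ["purchase_id", "user_id", "amount"],
}


def route_records(json_lines, table_columns=TABLE_COLUMNS):
    """Parse JSON strings and return one CSV text blob per target table."""
    buffers = {table: io.StringIO() for table in table_columns}
    writers = {table: csv.writer(buf) for table, buf in buffers.items()}
    for line in json_lines:
        record = json.loads(line)
        for table, columns in table_columns.items():
            if columns[0] in record:
                # Missing optional properties become empty CSV fields.
                writers[table].writerow([record.get(col, "") for col in columns])
    return {table: buf.getvalue() for table, buf in buffers.items()}


out = route_records([
    '{"view_id": 1, "user_id": 42, "url": "/home"}',
    '{"purchase_id": 9, "user_id": 42, "amount": 19.99}',
])
print(out)
```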

My method is to export 1 million records at a time as a CSV file (because it’s faster than using a SELECT statement), keeping a cursor so that the next run pulls the next 1 million. I process the data in Python and send it to StarRocks as a PUT request, because StarRocks exposes an endpoint for loading files (this is the fastest way).

The problem is that once I pass 30 million records, pulling the next 1 million goes from 1 second to 20 minutes, and past 50 million it takes about 40 minutes. Any solutions?
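A slowdown that grows with the position of the batch is the typical symptom of offset-style paging (`LIMIT n OFFSET k`), where the database reads and discards the first k rows on every request. The post doesn't say how the cursor is implemented, so this is only one likely cause. A common alternative is keyset pagination: remember the last key seen and ask only for rows after it. A minimal sketch, using Python's built-in `sqlite3` purely as a stand-in database and assuming a hypothetical table `raw_events` with a monotonically increasing `id` column; in ClickHouse the equivalent query is only likely to be fast if `id` leads the table's sorting key.

```python
import sqlite3


def iter_batches(conn, batch_size):
    """Yield batches of (id, payload) rows using keyset pagination.

    Each query resumes after the last id seen instead of using OFFSET, so the
    database can seek straight to the next batch via the index on `id`.
    """
    last_id = None
    while True:
        if last_id is None:
            rows = conn.execute(
                "SELECT id, payload FROM raw_events ORDER BY id LIMIT ?",
                (batch_size,),
            ).fetchall()
        else:
            rows = conn.execute(
                "SELECT id, payload FROM raw_events WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # the "cursor" is just the last key processed


# Demo on an in-memory table with hypothetical JSON payloads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO raw_events (id, payload) VALUES (?, ?)",
    [(i, f'{{"n": {i}}}') for i in range(1, 11)],
)
batches = list(iter_batches(conn, 4))
print([len(b) for b in batches])
```

Persisting `last_id` between runs also makes the job restartable after a failure.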

submitted by /u/Youth-Character

Anonymized Therapeutic Conversation Datasets For Language Model Development

I am currently building a language model centered around therapeutic conversations, with the aim of advancing mental health technology.
My project requires access to comprehensive, anonymized datasets of therapeutic dialogue.
I’d appreciate any pointers to potential resources, including but not limited to open-source repositories, academic literature, or online forums. Any guidance on best practices for handling this type of sensitive data would also be greatly appreciated.
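On the best-practices question, a first pass often applied before any deeper de-identification is pattern-based masking of direct identifiers. A minimal sketch, assuming plain-text transcripts; the two patterns are illustrative only and will both miss identifiers (names, addresses, dates) and produce false positives, so they are no substitute for a proper de-identification process and human review.

```python
import re

# Illustrative patterns only. EMAIL runs first so digits inside an address
# are not picked up by the PHONE pattern.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    """Replace matches of each pattern with a bracketed placeholder tag."""
    for tag, pattern in PATTERNS:
        text = pattern.sub(f"[{tag}]", text)
    return text


print(redact("Email me at jane.doe@example.com or call +1 (555) 123-4567."))
```

Typed placeholders like `[EMAIL]` keep the conversational structure intact for training while removing the value itself.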

submitted by /u/ZealousidealBlock330

Google Blocked Me For Using Pytrends

I run a script that uses pytrends to extract data from Google Trends, and my server was recently blocked for sending too many requests. What are the alternatives? I know there are paid APIs out there, but I’m looking for a free solution or something I can build myself. Any help is appreciated.
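If building a collector yourself, a common mitigation for rate limits is to pace requests and, after a failure, back off exponentially with random jitter. This lowers the chance of being blocked but does not guarantee it. A minimal sketch, assuming `fetch` is a hypothetical zero-argument callable wrapping whatever request you make (for example, your existing pytrends call); real code should catch only the specific rate-limit error rather than every exception.

```python
import random
import time


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call `fetch()` and retry on exception with exponential backoff plus full jitter.

    The wait before retry k (0-based) is a random value in
    [0, min(max_delay, base_delay * 2**k)]. The last exception is re-raised
    once `max_attempts` is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:  # narrow this to the actual rate-limit error in practice
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)


# Demo with a fake request that fails twice; sleep and rng are injected
# so the waits can be inspected instead of actually sleeping.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"


waits = []
result = call_with_backoff(flaky, sleep=waits.append, rng=lambda: 1.0)
print(result, waits)
```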

submitted by /u/yevo_

Seeking Annotated Video Dataset Divided Into Chapters/Topics – Preferably Long Lectures Or TED Talks

Hello everyone,

I am currently in search of a dataset that consists of annotated videos, ideally divided into chapters or topics. My primary interest lies in long-form content, such as university lectures or educational talks, where each segment of the video is annotated and corresponds to a specific topic or chapter.

I’ve considered using TED Talks, but I’m unsure whether I can use them in a published paper, and I’m not sure they are annotated. If anyone has experience with TED Talks, I would appreciate your insight.

Furthermore, if anyone knows of any other resources, datasets, or platforms where I can find this type of annotated video content, please share. My goal is to use these annotations to evaluate a method I am working on and then publish the paper.
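When chapter annotations serve as ground truth for a segmentation method, one common way to score predictions is to match predicted chapter boundaries to annotated ones within a time tolerance and report precision, recall, and F1. A minimal sketch, assuming boundaries are given in seconds and using greedy one-to-one matching; the 5-second tolerance is an arbitrary, hypothetical choice.

```python
def boundary_prf(predicted, reference, tolerance=5.0):
    """Precision, recall and F1 for chapter boundaries (in seconds).

    A predicted boundary is a hit if it lies within `tolerance` seconds of a
    reference boundary not already matched; each predicted boundary takes the
    closest such reference (greedy, in order of predicted time).
    """
    unmatched = sorted(reference)
    hits = 0
    for p in sorted(predicted):
        best = None
        for r in unmatched:
            if abs(p - r) <= tolerance and (best is None or abs(p - r) < abs(p - best)):
                best = r
        if best is not None:
            unmatched.remove(best)  # each reference boundary can match only once
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


p, r, f1 = boundary_prf([62.0, 300.0, 610.0], [60.0, 305.0, 900.0, 1200.0])
print(p, r, f1)
```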

Thank you in advance for your assistance!

submitted by /u/OhHiMarkos

Datasets For Recommending Music To People And How To Use Them

Hello guys, I’m looking to build a recommender system using a music dataset, but I can’t find many suitable ones on the web. Do you have any suggestions, or tips on how to use them?

I want a dataset that supports collaborative filtering, but I don’t understand how to assemble one from the Million Song Dataset. If anyone would like to help, I’d greatly appreciate it!
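The Taste Profile subset that accompanies the Million Song Dataset is distributed as tab-separated (user, song, play count) triplets, which is the shape collaborative filtering needs. A minimal item-based sketch, assuming lines in that triplet format (the user and song IDs below are placeholders); it computes cosine similarity between songs from play counts and scores songs a user hasn't played.

```python
import math
from collections import defaultdict


def load_triplets(lines):
    """Parse 'user<TAB>song<TAB>count' lines into {user: {song: count}}."""
    plays = defaultdict(dict)
    for line in lines:
        user, song, count = line.rstrip("\n").split("\t")
        plays[user][song] = int(count)
    return plays


def item_similarity(plays):
    """Cosine similarity between songs, using each song's vector of per-user play counts."""
    by_song = defaultdict(dict)
    for user, songs in plays.items():
        for song, count in songs.items():
            by_song[song][user] = count
    norms = {s: math.sqrt(sum(c * c for c in users.values())) for s, users in by_song.items()}
    sims = defaultdict(dict)
    songs = list(by_song)
    for i, a in enumerate(songs):
        for b in songs[i + 1:]:
            common = by_song[a].keys() & by_song[b].keys()
            if common:
                dot = sum(by_song[a][u] * by_song[b][u] for u in common)
                sims[a][b] = sims[b][a] = dot / (norms[a] * norms[b])
    return sims


def recommend(user, plays, sims, k=3):
    """Rank unplayed songs by similarity-weighted play counts of the user's songs."""
    heard = plays[user]
    scores = defaultdict(float)
    for song, count in heard.items():
        for other, s in sims.get(song, {}).items():
            if other not in heard:
                scores[other] += s * count
    return sorted(scores, key=scores.get, reverse=True)[:k]


lines = ["u1\tA\t5", "u1\tB\t3", "u2\tA\t4", "u2\tC\t2", "u3\tB\t1", "u3\tC\t6"]
plays = load_triplets(lines)
sims = item_similarity(plays)
print(recommend("u1", plays, sims))  # ['C']
```

The pairwise loop is only practical for toy data; at the full dataset's scale, a sparse-matrix approach or a matrix-factorization library would be the usual route.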

submitted by /u/CheapJaguar458

Dataset Of MMLU Results Broken Down By Task

I am primarily looking for results of running the MMLU evaluation on modern large language models. I have found some data at https://github.com/EleutherAI/lm-evaluation-harness/tree/master/results and will ask the maintainers whether, and when, they can provide additional data.

MMLU may be the most common evaluation run on LLMs recently, but papers rarely report more than a single final number, and I have not been able to find the underlying per-task results for the evaluations in any major recent LLM paper.
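Once per-task numbers are collected, keeping them as a flat table of (model, task, accuracy) rows makes comparison easy, and it shows that a single headline number can be computed more than one way. A minimal sketch, assuming results have already been gathered into a hypothetical nested dict of (correct, total) counts per task; the model name, subject names, and counts are placeholders, not real results, and parsing the harness's own result files is left out because their layout may differ between versions.

```python
def to_rows(results):
    """Flatten {model: {task: (correct, total)}} into sorted (model, task, accuracy) rows."""
    rows = [
        (model, task, correct / total)
        for model, tasks in results.items()
        for task, (correct, total) in tasks.items()
    ]
    return sorted(rows)


def averages(tasks):
    """Return (macro, micro) accuracy for one model's {task: (correct, total)} dict.

    Macro weights every task equally; micro weights every question equally.
    """
    macro = sum(c / t for c, t in tasks.values()) / len(tasks)
    micro = sum(c for c, _ in tasks.values()) / sum(t for _, t in tasks.values())
    return macro, micro


# Placeholder numbers for illustration only, not real evaluation results.
results = {"model-x": {"subject_a": (30, 100), "subject_b": (45, 50)}}
rows = to_rows(results)
macro, micro = averages(results["model-x"])
print(rows, macro, micro)
```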

submitted by /u/corey1505

Game Analytics Datasets For Gamer Modeling

Does anyone have a game analytics dataset? I’m going to do gamer modeling, so I want datasets that capture player behavior, such as how many coins a player collected in a level, the tools used, online session duration, and more. Any type of mobile game dataset would do, but one from a hypercasual game would be ideal. I have searched Kaggle and Google for relevant datasets but haven’t found anything suitable.
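Whatever dataset turns up, the behaviors listed above are usually derived from a raw event log by aggregating per session. A minimal sketch, assuming a hypothetical event schema (dicts with `session`, `t` in seconds, `type`, `level`, and `tool` for tool events); real telemetry will name and structure these fields differently.

```python
from collections import defaultdict


def session_features(events):
    """Aggregate raw gameplay events into per-session feature dicts.

    Hypothetical schema: each event has 'session', 't' (seconds), 'type'
    ('coin', 'tool', ...), 'level', and 'tool' when type == 'tool'.
    """
    sessions = defaultdict(list)
    for e in events:
        sessions[e["session"]].append(e)
    features = {}
    for sid, evs in sessions.items():
        times = [e["t"] for e in evs]
        coins = defaultdict(int)
        tools = set()
        for e in evs:
            if e["type"] == "coin":
                coins[e["level"]] += 1
            elif e["type"] == "tool":
                tools.add(e["tool"])
        features[sid] = {
            "duration_s": max(times) - min(times),  # first to last event
            "coins_per_level": dict(coins),
            "tools_used": sorted(tools),
        }
    return features


events = [
    {"session": "s1", "t": 0, "type": "start", "level": 1},
    {"session": "s1", "t": 12, "type": "coin", "level": 1},
    {"session": "s1", "t": 15, "type": "coin", "level": 1},
    {"session": "s1", "t": 40, "type": "tool", "level": 2, "tool": "hammer"},
    {"session": "s1", "t": 55, "type": "coin", "level": 2},
]
features = session_features(events)
print(features)
```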

submitted by /u/rai_shi

Prices For Used Medium And Heavy Duty Vehicles Data Set?

I’m looking for a data set that aggregates sale prices for used medium- and heavy-duty vehicles in the U.S.
For example, I’d like a data set of used box trucks with attributes (selling price, year, mileage).
Sites like Commercial Truck Trader display trucks that are actively being sold, but there is no seamless way of aggregating this information into a data set.
I’ve been unable to find a data set that matches these preferences. What are my options?
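If building the data set yourself, most of the work after collecting listings is normalizing free-text fields like "$45,900" or "123,456 mi" into numbers. A minimal sketch, assuming listings have already been collected into hypothetical dicts with raw `price`, `year`, and `mileage` strings; it skips listings without a numeric price (e.g., "Call for price").

```python
import csv
import io
import re

# First number in a string, allowing thousands separators and decimals.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_number(text):
    """Return the first number in strings like '$45,900' or '123,456 mi', or None."""
    match = NUMBER.search(text or "")
    return float(match.group().replace(",", "")) if match else None


def listings_to_csv(listings):
    """Normalize raw listing dicts into CSV text; listings without a price are skipped."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["price_usd", "year", "mileage_mi"])
    for item in listings:
        price = parse_number(item.get("price"))
        if price is None:
            continue
        year = parse_number(item.get("year"))
        miles = parse_number(item.get("mileage"))
        writer.writerow([
            round(price),
            "" if year is None else int(year),
            "" if miles is None else round(miles),
        ])
    return buf.getvalue()


listings = [
    {"price": "$45,900", "year": "2016", "mileage": "123,456 mi"},
    {"price": "Call for price", "year": "2018", "mileage": "80,000 mi"},
    {"price": "$38,500.00", "year": "2014", "mileage": ""},
]
csv_text = listings_to_csv(listings)
print(csv_text)
```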

submitted by /u/freshcarrot7