New Destinations For Mockingbird – FOSS Mock Data Stream Generator

When we launched Mockingbird a few weeks ago, the idea was to make it super simple to generate mock data from a schema and stream it to any destination. At launch, you could send mock data streams to Tinybird and Upstash Kafka.

Now, we’ve added support for Ably, AWS SNS, and Confluent.

You can check out the UI here: https://tbrd.co/mock-rd. It’s also available as a CLI: npm install @tinybirdco/mockingbird-cli

Hope this helps when you can’t find the dataset you need!
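
For anyone new to the idea, generating mock events from a schema can be sketched in a few lines of Python. This is only an illustration of the concept, not Mockingbird's code or schema format: the SCHEMA layout, its field names, and the generate_event and stream helpers are all made up here.

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical schema: each field maps to a zero-argument generator.
# This layout is an assumption for illustration, not Mockingbird's schema format.
SCHEMA = {
    "event_id": lambda: str(uuid.uuid4()),
    "timestamp": lambda: datetime.now(timezone.utc).isoformat(),
    "user_id": lambda: random.randint(1, 1000),
    "action": lambda: random.choice(["view", "click", "purchase"]),
}


def generate_event(schema):
    """Build one mock event by calling every field's generator."""
    return {field: make() for field, make in schema.items()}


def stream(schema, count):
    """Yield `count` mock events as JSON strings, one per event."""
    for _ in range(count):
        yield json.dumps(generate_event(schema))


if __name__ == "__main__":
    for line in stream(SCHEMA, 3):
        print(line)
```

Each yielded string is one JSON event, so swapping print for a call to a message queue client or an HTTP endpoint would turn this into a simple stream producer.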

submitted by /u/tinybirdco

All Men Under 55 Who Died On June 19, 2022 In VA

Hello, as the title states, I am looking for all men under 55 in the state of VA, USA, who died on that date.

I’ve tried VA newspapers but most are not online.

Any help would be appreciated.

submitted by /u/DerpSherpa

School Closures Caused By The COVID-19 Pandemic

submitted by /u/cavedave

[self-promotion] Hosted Embedding Marketplace – Stop Scraping Every New Data Source, Load It As Embeddings On The Fly For Your Large Language Models

We are building a hosted embedding marketplace for builders to augment their leaner open-source LLMs with relevant context. This lets you avoid all the infra for finding, cleaning, and indexing public and third-party datasets, while maintaining the accuracy that comes with larger LLMs.

We will be opening up early access soon. If you have any questions, be sure to reach out and ask!

Learn more here

submitted by /u/achyutjoshi

Anyone Have Access To Statista Who Can Help Out A Poor College Student? Their Yearly Rate Is Egregious And I Need Data For A Research Thesis I’m Working On

Need help

submitted by /u/Minute_Marionberry98

[D] Seeking Guidance On Accessing fMRI Datasets Related To Schizophrenia For AI Development

submitted by /u/cavedave

Data On Humanitarian Aid In Form Of Aid-Data And ODA-Data

Hey guys, does anyone know how I can get a global ODA or Aid-Data dataset? I am currently working on my bachelor’s thesis, so I need these datasets in a form that I can use with Stata.

I tried downloading the ODA data and importing it into Stata, but for some reason it contained no observations. I need data on the amount of aid delivered to each country in a given year, for an analysis of the impact of humanitarian aid on conflict dynamics.

If someone has a tip or could help me, that would be really nice.

submitted by /u/HighVoltageplay

Any Carbon Offset APIs/Sources Out There? Feel Like Building An Index/Tool To Make That Murky World More Transparent

I would very much like to build some tools to index the viability and value of carbon offsets (a field I am well aware is extremely murky in parts). Are there any good APIs or data sources out there? Needless to say, anything I build will be shared, free, and accessible :). Thanks in advance!

submitted by /u/wittykitty

Pass The Pigs Dice Game Data 6000 Observations

This video describes the optimal strategy for this math dice game:

https://youtu.be/ULhRLGzoXQ0

Here is a dataset of 6000 observations on it, which might make a useful simulation project:

https://rpruim.github.io/fastR2/reference/Pigs.html

submitted by /u/cavedave

Language Models Can Explain Neurons In Language Models (including Dataset)

Includes a dataset of explanations for GPT-2’s neurons

submitted by /u/cavedave

Roast My Time-series Dataset Site. This Should Be Fun

Hey, so I’m building a real-time data platform (here), and I’ve put a decent amount of time-series data on it, but it still needs A LOT of work. The goal is to get as much good real-time data on there as possible.

Feel free to play around with it and download some of the data. Appreciate any feedback – the more critical the better.

submitted by /u/judahcooper

How To Store 175 Million Rows And Query Them

Hey! I have many JSON files that add up to 175 million records, and I’m unsure how to store them in a database. I’ll need to either create one big JSON file first or loop through each of the files and move the data from them into a database.

I’ll need to run queries against the whole dataset multiple times a day, so the quicker the better.

I’ve already experimented with MongoDB, but I just can’t see past the way queries are written.

Any ideas?
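
One possible starting point, if SQL syntax is more comfortable than MongoDB's query style, is to loop through the files and load them into SQLite with Python's standard library. This is a minimal sketch under stated assumptions: each file is taken to hold a JSON array of objects, and the "id", "category" and "value" keys, the records table, and the load_json_files helper are hypothetical stand-ins for the real data. Whether SQLite is fast enough at 175 million rows depends on the queries, so treat it as a cheap first experiment rather than a recommendation.

```python
import json
import sqlite3
from pathlib import Path


def load_json_files(db_path, json_dir, batch_size=10_000):
    """Insert every record from every *.json file under json_dir into SQLite.

    Assumes each file holds a JSON array of objects with hypothetical
    "id", "category" and "value" keys; adjust the columns to the real data.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id INTEGER PRIMARY KEY, category TEXT, value REAL)"
    )
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path, encoding="utf-8") as handle:
            rows = json.load(handle)
        # Insert in fixed-size batches, one file at a time.
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            # A row whose id already exists overwrites the earlier row.
            conn.executemany(
                "INSERT OR REPLACE INTO records (id, category, value) "
                "VALUES (:id, :category, :value)",
                batch,
            )
        conn.commit()
    # Index the column used for filtering once loading is done.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_category ON records (category)")
    conn.commit()
    return conn
```

Queries are then plain SQL, for example conn.execute("SELECT COUNT(*), AVG(value) FROM records WHERE category = ?", ("some_category",)).fetchone(). Loading file by file avoids having to build one giant JSON file first.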

submitted by /u/ScottishVigilante

Looking For Dataset For LLM Tokenization: Need Around 1GB Multi-lingual + Code

I’ve been working on a tokenizer that determines the best possible tokens to represent the test dataset in the fewest tokens, for various vocabulary sizes.

It works well, but I’ve been testing with The Pile test data, which is mostly English, so it’s not a good representation of multilingual text. It also lacks a fair amount of code and tags.

I need around 1-2GB of raw text, uncleaned and uncensored, that represents a few different languages and a fair amount of code from different programming languages. Better to be raw, and to include data both with HTML tags, as it would be when scraped, and without HTML tags (as it would prioritize the HTML tags too heavily if they were always present).

So just a good representation of general text.

I know I could build my own dataset from various different ones, but it seems to me that a dataset like this should already exist. Any leads would be helpful. Thank you.

submitted by /u/Pan000

Looking For US Data Set Of Corporate Employee Titles And Job Level

Hi. I’m looking for a data set of employee titles at US companies: not specific companies or specific people, just generalities. I currently have a record set of a million-plus job titles of all varieties, from cashier to CEO. However, I’m looking to pair these up with job level, and I’m wondering if anything like this exists anywhere. I’ve tried the SAMs database, but that’s even messier than the records that I have.

Thanks

submitted by /u/cdtoad

A Detailed Shaded Relief Map Of London Rendered From Lidar Data [OC]

submitted by /u/cavedave

In Search Of E-Mail Datasets Like ENRON

Hi everyone, I am a PhD student currently working on an NLP and network analysis project. I am in search of an email dataset with sender, receiver, and message information, preferably from companies, with other metrics such as performance included (not an absolute necessity). If anyone knows of such a dataset, like ENRON or SpamAssassin, and can direct me to it, I would be most thankful.

submitted by /u/Saklehir

Are There Any Job Datasets For Recent Graduates?

I was wondering if there were any job datasets with statistics about employment rates and types of jobs recent graduates get. The more variables for a data point, the better.

submitted by /u/DetachedOptimist

Java Has Endured Radical Transformations In The Technology Landscape And Many Threats To Its Prominence.

There are legitimate gripes about Java syntax, to be sure—the same is true of JavaScript and every other language. As Bjarne Stroustrup once said, “There are only two kinds of languages: the ones people complain about and the ones nobody uses.”

The JVM

submitted by /u/Upbeat-Ad-2183

Need An Easy First Dataset Regarding Financial Data.

I’m taking a programming module in my Economics course and I need an easy csv dataset to analyse as my first one for some coursework.

I have absolutely zero experience with this, so a relatively simple one would be very nice. Thanks!

submitted by /u/As14nn

Need S&P 500 Market Cap For All Years Since 1980

Working on a side project, but I can’t seem to find this data. It’s weird that something that should obviously be out there is behind paywalls. Is there a free source I can get this data from?

submitted by /u/Ill_Fisherman8352

Looking For Dollar Store Data With Opening Year

On the hunt for dollar store data from the past 5 years, including opening year. Preferably including Dollar General and Dollar Tree, but either/or is fine. I’m able to find some data through SNAP and BluePages, but they don’t have any information on when each store was opened. Any ideas?

submitted by /u/Rough-Fail-3211

Dataset For Airline Passengers 2019-2022

Doing a project where we are finding data about airlines. I need a dataset with detailed passenger demographics for the years 2019-2022, primarily age, gender, and possibly nationality. It has been a pain in the ass to find anything that specific, and I’m guessing it is hard to find because most datasets have limited information, and others may have restrictions on how the data can be used. If you do find anything, please comment.

submitted by /u/ShrimpChipCEO

I Need A Dataset Containing Images Of Ears And The Ages Of The People In Them

Preferably the West Pomeranian University one, but it could be anything. I need a way to download and access it.

submitted by /u/Prudent_Country4074

Looking For Suitable Dataset To Predict Forest Fires For My Project

The subject of my project is predicting forest fires, and I am looking for a dataset similar to the one shared on Kaggle, but I can’t find one. I looked on Earth Engine and found some datasets, but they don’t provide dates and they are ImageCollections, not CSV. I am familiar with machine learning and with cleaning datasets in CSV format after turning them into dataframes, but not at all familiar with ImageCollections. So basically my question comes down to two paths:

1. I use the datasets from Earth Engine, but I don’t know how to work with them. So perhaps someone could give me some tips on how to predict with them?
2. Can someone guide me towards a suitable dataset to predict forest fires?

I appreciate all input!

submitted by /u/Ripplekipple

Best Tools/techniques For Capturing Workflow Data?

Are there any good tools/techniques for capturing workflow data, specifically to help train an LLM? Use case is accurate question answering around processes/best practices inside an organization.

Is this where something like a UiPath would be necessary?

submitted by /u/Constant-Potato-4712

Best Books (10k) Multi-Genre Data [self-promotion]

I started on this idea of finding a comprehensive book dataset that reliably has a description and more than one genre per book (makes things more realistic), since I wanted to cluster books based on similarity to find some good ones to read for myself 😉 The only ones I could find on Kaggle had a single genre label, so I collected the data on my own.

So sharing it here in case it helps someone else too:

Dataset: https://www.kaggle.com/datasets/ishikajohari/best-books-10k-multi-genre-data

The data was collected from Goodreads from their list – Books That Everyone Should Read At Least Once and contains Description, Ratings and Multiple Genre classifiers.
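
Multi-genre labels lend themselves to set-based similarity measures. As one illustration of comparing books by genre overlap, here is a small Jaccard similarity sketch. It is not the method used to build or analyse this dataset, and the book titles, genre labels, and helper names below are placeholders rather than rows or columns from the actual file.

```python
def jaccard(a, b):
    """Jaccard similarity of two genre collections: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(title, books, top_n=3):
    """Rank the other books by genre overlap with `title`, best match first."""
    target = books[title]
    scored = [
        (jaccard(target, genres), other)
        for other, genres in books.items()
        if other != title
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Placeholder data for illustration only, not taken from the dataset.
BOOKS = {
    "Book A": {"fantasy", "adventure", "classics"},
    "Book B": {"fantasy", "adventure"},
    "Book C": {"romance", "classics"},
    "Book D": {"science fiction"},
}

if __name__ == "__main__":
    for score, other in most_similar("Book A", BOOKS):
        print(f"{other}: {score:.2f}")
```

A pairwise similarity like this can feed a clustering step, or simply be used on its own as a "books like this one" lookup.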

submitted by /u/ishika_jo

I Have Been Given A Dataset From Which I Have To Extract Tables And Create Charts.

Can anyone assist me?

submitted by /u/Reasonable_Drawer_57

Free Arrival/departure Aircraft API?

I’m wondering if there is a free aviation API to track arrivals and departures to a set airport. It would collect: Callsign, Aircraft Type, Gate, and Arrival/Departure airport, then plug that into a Google Sheet.
Currently I run this process manually by looking at FlightAware data, but if I can automate this for free that would be great!

submitted by /u/ModeratorOfNothing
[link] [comments]

0

Looking For A Dataset On Electrical Equipment Failure

From the assignment: “Source a data set with regard on equipment failure on the internet that can help you to illustrate the difference between causation and correlation.”

submitted by /u/Strong_Papaya94

I Need A Database For All The Countries On Earth

I need each country’s population, area (preferably in square miles), GDP, and year of founding. Just raw data.

submitted by /u/PotatoSacGamingYT

Category: Datatards
