Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Is There Really No Dataset Of All Historical Events?

I tried Wikidata, but most historical events aren’t listed as such. Filtering by the ‘point in time’ property does show historical events, but also every concert, soccer game, and wife-carrying contest.

Wikipedia has the data; however, getting ChatGPT to convert the web-scraped data into a neat event,date,description format is a lot of tedious work.

I also want to open source my code along with the datasets, so a permissive license is needed. Is there any dataset for my use case?

submitted by /u/auronic_mortist

Quora Question Answer Pairs Dataset – 56,400 Records

Recently I scraped 56,400 question/answer pairs off Quora and put the dataset on the HuggingFace hub. I plan to keep adding to the dataset, but proxy costs are pretty high since Quora is hella bloated.

If anyone is interested, the dataset can be accessed through the HuggingFace profile linked in my article: https://www.toughdata.net/blog/post/finetune-flan-t5-question-answer-quora-dataset

submitted by /u/jankybiz

[Request] I Am Looking For A Sample Or Actual Dataset For A Budgeting Application About User’s Financial Transactions And Spending

We are developing a budgeting application and are looking for a sample dataset to run tests and find KPIs. The dataset needs to contain the kind of information a user would enter: their salary or allowance, how often they receive it, their mandatory expenses (rent, bills, EMIs), and miscellaneous expenses. It can be in any format, and there are no constraints on the type or contents of the dataset as long as it is at least slightly relevant to the above. Any sort of help is greatly appreciated!

submitted by /u/lonelypotato42069

Looking For Annual County-Level Demographic Data

I am looking for annual data on some basic demographics, mainly median age and education along with racial makeup, at the U.S. county level. I tried the individual data from CPS using IPUMS, but it doesn’t have coverage of every county. Anybody know where this exists? I feel like it has to be out there and I’m missing something obvious.

submitted by /u/mgwil24

[self-promotion] Subset Quick Calcs Make Analyzing Data 10x Faster!

Hi everyone! I’ve been working on a data tool that makes it faster to do common analysis off of CSVs. The app is called Subset and it looks like a spreadsheet on a whiteboard.

We just launched a feature called Quick Calcs with the goal of making data analysis on existing datasets way faster. For example, remove duplicates from a column, sum up everything in that column, and put it in a new grid linked to the original one in under 10 clicks.

Here’s an example of me taking a CSV I got from a credit card statement and summarizing my spend by category in a few clicks. My favorite part about the way we’ve built the app is that the results still use formulas and you can trace back to the original input! Here’s a link to a file with some example data if you want to play around with it.

Another thing is that because it’s on a whiteboard, you can make a piece of analysis, move it out of the way and do another. You can even compare the results next to one another without switching between tabs.

Would love to have this community try it out and provide any feedback 🙂
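For comparison, the quick calculation described above (collapse a column to its unique values, then total the spend for each one) can be sketched in plain Python. This is only an illustration of the calculation, not how Subset works internally; the column names `category` and `amount` and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Hypothetical credit card export; the column names are assumptions.
SAMPLE_CSV = """date,category,amount
2023-08-01,Groceries,42.50
2023-08-02,Dining,18.00
2023-08-03,Groceries,12.25
2023-08-04,Travel,100.00
"""


def spend_by_category(csv_text):
    """Return {category: total amount}, with one entry per unique category."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"]] += float(row["amount"])
    return dict(totals)


totals = spend_by_category(SAMPLE_CSV)
for category, total in sorted(totals.items()):
    print(f"{category}: {total:.2f}")
```

Keying the totals by category does the deduplication and the summing in a single pass over the file.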

submitted by /u/Mexpotato

[request] Where Can I Find Temperature And Weather Data For Particular Regions In A Specific Format?

I am a student doing a project about simulating conditions in different climates.

Does anyone know where I can find temperature data for a given year for a few regions around the world, ideally in a CSV where each column is an hour and each row is a day, if that makes sense? If I could also have data about humidity and light intensity, that would be ideal. It doesn’t really matter which regions, so long as they are all geographically far apart, ideally at least one on each continent.
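Most weather sources publish one row per timestamp rather than the day-by-hour grid described above, so a reshaping step is usually needed. A minimal sketch, assuming the readings arrive as (ISO timestamp, temperature) pairs; the function names and the sample values are hypothetical:

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def pivot_hourly(readings):
    """Turn (ISO timestamp, temperature) pairs into {date: [24 hourly values]}.

    Hours with no reading are left as None.
    """
    grid = defaultdict(lambda: [None] * 24)
    for timestamp, temperature in readings:
        moment = datetime.fromisoformat(timestamp)
        grid[moment.date().isoformat()][moment.hour] = temperature
    return dict(grid)


def to_csv(grid):
    """Write one row per day: the date followed by columns h00..h23."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date"] + [f"h{hour:02d}" for hour in range(24)])
    for day in sorted(grid):
        writer.writerow([day] + ["" if v is None else v for v in grid[day]])
    return out.getvalue()


# Made-up readings, purely for illustration.
readings = [
    ("2022-01-01T00:00", 3.1),
    ("2022-01-01T01:00", 2.8),
    ("2022-01-02T00:00", 4.0),
]
grid = pivot_hourly(readings)
print(to_csv(grid))
```

Humidity or light intensity could be handled the same way, with one grid per variable.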

submitted by /u/W4RP3D_

Open Sourcing A Data Science Analytics Platform To Analyze Any Dataset

Question to the dataset builders: Would you like to use a user-friendly data science analytics platform if we open-source it? Lyzr is to data analysts and business users what Streamlit is to data scientists and ML engineers.

We’re on the verge of launching an open-source version of our new insights platform, www.lyzr.ai, explicitly crafted with the analyst community in mind, and we’d be honored if you could test it and share your invaluable feedback. It may currently seem like a mere GPT wrapper, but trust us, countless hours and dedication have gone into making this more than just that.

Why did we create it?

There is just 1 data scientist for every 100 data analysts (as per GCP data analytics head). We envision a world where data analysts and business users have the tools to dabble more in data science. Our platform also aims to simplify the 0-75th percentile of descriptive statistics for data scientists, allowing them to concentrate on building more complicated data science models.

The cherry on top? We’re gearing towards an open-source launch. We believe in the power of collective genius and want everyone to benefit from what we’ve built and further enhance it collaboratively.

Please let me know if you are interested in giving it a spin. Will DM the link.

And let us know what you think! What features resonate with you? What’s missing? Would you use it if open-sourced?

Your feedback will not only be appreciated, but it’ll also be instrumental in shaping the future of this platform.

Thank you and looking forward to your insights!

submitted by /u/sivasurendira

[self-promotion] New Data On Snowflake Marketplace: Cybersyn Recently Expanded A Number Of Our Free Public Datasets. Access The New Data From The Below Links Directly In Your Snowflake Instance:

The full text of SEC 8-K filings and exhibits + 10-K and 10-Q exhibits added to Cybersyn SEC Filings. Example topics covered: company press and earnings releases, merger agreements, subsidiaries, and material changes in financial conditions.

100+ time series added to Cybersyn Financial & Economic Essentials. Example topics covered: labor force participation, disposable income, employee earnings by industry, and housing starts.

Text-based US government contracts data added to Cybersyn Government Essentials. Example use cases: search for high value government contracts awarded to specific businesses, identify federal agencies with the greatest contractor spend, find gov’t contracts for ESG-related opportunity, train and fine tune LLMs.

Geospatial data in GeoJSON and WKT formats added to Cybersyn Government Essentials, US Addresses & Geographic Areas, and US Housing & Real Estate Essentials.

submitted by /u/aiatco2

Helper: AWS CloudFront Edge Locations (Manually Curated)

This is useful when analyzing CloudFront logs: it gives you a way to map the `x-edge-location` code to a place in the real world for traffic analysis.

More information (not mine): https://www.feitsui.com/en/article/3

While there are some places that contain this data, they all seem to me to be missing some details or to have bad information, so I made a spreadsheet with every detail:

https://docs.google.com/spreadsheets/d/1QX_qjiieBXIyozvznKSaNPVs6nj6FXGbxA25tS90e6Q/edit#gid=0
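As a rough sketch of the mapping step, assuming an `x-edge-location` value starts with a three-letter location code (typically an IATA airport code) followed by an arbitrary suffix, as in `IAD89-C1`. The lookup table below is a hypothetical three-entry excerpt with illustrative place names; a real one would be loaded from a curated source such as the spreadsheet above.

```python
import re

# Hypothetical excerpt of a lookup table; the place names are illustrative.
EDGE_LOCATIONS = {
    "IAD": "Washington Dulles area, US",
    "LHR": "London, GB",
    "GRU": "São Paulo, BR",
}

# Assumes values look like "IAD89-C1": three letters, then an arbitrary suffix.
EDGE_CODE = re.compile(r"^([A-Z]{3})")


def edge_location_to_place(value):
    """Map an x-edge-location value to a place name, or None if unknown."""
    match = EDGE_CODE.match(value.upper())
    if match is None:
        return None
    return EDGE_LOCATIONS.get(match.group(1))


print(edge_location_to_place("IAD89-C1"))
print(edge_location_to_place("XYZ1"))
```

Returning `None` for unknown codes makes gaps in the table easy to spot when aggregating traffic by place.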

submitted by /u/Capyvara

Zimbabwe 2018 Election Results Analysis

Hello everyone,

I wanted to bring your attention to the upcoming elections in Zimbabwe scheduled for this Wednesday. The past election raised significant concerns due to allegations of unfairness, including claims of collusion between the electoral commission and the ruling party to manipulate results using Excel files, an issue that has been dubbed “Excelgate.”

Taking a closer look at the available data on the official website, I’ve stumbled upon some noteworthy findings. These findings have prompted me to write an article on LinkedIn, where I explore how they tie into the broader ‘Excelgate’ narrative. Additionally, I delve into the steps citizens have been taking to ensure the integrity of their votes during the upcoming election.

For those who are interested, you can read the article and share your perspectives. I’m always open to hearing different viewpoints and engaging in constructive discussions. Here’s the link to the article and analysis: Article | Analysis

Looking forward to your insights and feedback. Thank you!

submitted by /u/BigIntroduction4586

Data On Number Of Pages In Papers Over The Years

For a while now I’ve been trying to prove a perception of mine (and other folks’ too, I’m sure): scientific papers are getting much longer. I have the (strong) impression that papers now tend to have many more pages than years ago. Does anyone know of a dataset with, say, the titles of papers published by a journal over some years and, attached to every paper, information like the number of pages?

I’d love to find data about STEM journals, but I’ll take any data that’s available.

Thanks.
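Given such a dataset, the check itself would be short. A minimal sketch, assuming each record carries a publication year and an inclusive page range such as "101-108"; the field names and the sample records are made up for illustration:

```python
from collections import defaultdict
from statistics import mean


def page_count(page_range):
    """Return the number of pages in an inclusive range such as "123-145"."""
    first, last = (int(part) for part in page_range.split("-"))
    return last - first + 1


def mean_pages_by_year(papers):
    """Return {year: mean page count}, ordered by year."""
    counts = defaultdict(list)
    for paper in papers:
        counts[paper["year"]].append(page_count(paper["pages"]))
    return {year: mean(values) for year, values in sorted(counts.items())}


# Made-up records; not real papers or real page counts.
papers = [
    {"title": "Paper A", "year": 1995, "pages": "101-108"},
    {"title": "Paper B", "year": 1995, "pages": "201-210"},
    {"title": "Paper C", "year": 2020, "pages": "1-24"},
]
trend = mean_pages_by_year(papers)
print(trend)
```

A rising mean over the years would support the impression, although a median may be more robust if a few very long papers skew the average.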

submitted by /u/MasonBo_90

Is There A Dataset For EU/UK Flight Delay Reasons?

Under EU/UK legislation, consumers are eligible for compensation if their flights are delayed or cancelled due to reasons within a carrier’s control. This would rule out natural disasters, for example, but include reasons such as ‘an air steward was ill’.

Passengers are able to claim compensation based on the length of the delay and distance being travelled, and there’s some excellent documentation on the subject here:

https://www.citizensadvice.org.uk/consumer/holiday-cancellations-and-compensation/if-your-flights-delayed-or-cancelled/

The process for claiming compensation is convoluted and has spawned a mini industry of copycat legal firms who’ll do the heavy lifting on behalf of customers (for a fee).

Many of these firms provide free online tools (e.g. this one) for checking the validity of a claim. Whilst it’s trivial to check the status of any given flight (e.g. delayed by x minutes, distance, destinations, etc.), determining the airline’s provided reason for a delay is less obvious.

Is anyone familiar with an API or dataset that might provide this data? I’ve found a provider for US domestic flights (https://www.bts.gov/explore-topics-and-geography/topics/airline-time-performance-and-causes-flight-delays) but nothing for those operating within Europe.

Any pointers would be greatly appreciated.

submitted by /u/trilson