Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Open Sourcing A Data Science Analytics Platform To Analyze Any Dataset

Question to the dataset builders: Would you like to use a user-friendly data science analytics platform if we open-source it? Lyzr is to data analysts and business users what Streamlit is to data scientists and ML engineers.

We’re on the verge of launching an open-source version of our new insights platform, www.lyzr.ai, explicitly crafted with the analyst community in mind, and we’d be honored if you could test it and share your invaluable feedback. It may currently seem like a mere GPT wrapper, but trust us, countless hours and dedication have gone into making this more than just that.

Why did we create it?

There is just 1 data scientist for every 100 data analysts (as per GCP data analytics head). We envision a world where data analysts and business users have the tools to dabble more in data science. Our platform also aims to simplify the 0-75th percentile of descriptive statistics for data scientists, allowing them to concentrate on building more complicated data science models.

The cherry on top? We’re gearing towards an open-source launch. We believe in the power of collective genius and want everyone to benefit from what we’ve built and further enhance it collaboratively.
Please let me know if you are interested in giving it a spin. Will DM the link.

And let us know what you think! What features resonate with you? What’s missing? Would you use it if open-sourced?

Your feedback will not only be appreciated, but it’ll also be instrumental in shaping the future of this platform.

Thank you and looking forward to your insights!

submitted by /u/sivasurendira

[self-promotion] New Data On Snowflake Marketplace: Cybersyn Recently Expanded A Number Of Our Free Public Datasets. Access The New Data From The Below Links Directly In Your Snowflake Instance:

The full text of SEC 8-K filings and exhibits + 10-K and 10-Q exhibits added to Cybersyn SEC Filings. Example topics covered: company press and earnings releases, merger agreements, subsidiaries, and material changes in financial conditions.

100+ time series added to Cybersyn Financial & Economic Essentials. Example topics covered: labor force participation, disposable income, employee earnings by industry, and housing starts.

Text-based US government contracts data added to Cybersyn Government Essentials. Example use cases: search for high-value government contracts awarded to specific businesses, identify federal agencies with the greatest contractor spend, find gov’t contracts for ESG-related opportunity, train and fine-tune LLMs.

Geospatial data in GeoJSON and WKT formats added to Cybersyn Government Essentials, US Addresses & Geographic Areas, and US Housing & Real Estate Essentials.

submitted by /u/aiatco2

Helper: AWS CloudFront Edge Locations (manually Curated)

This is useful when analyzing CloudFront logs: it gives you a way to map the `x-edge-location` code to a place in the real world, for traffic analysis.

More information (not mine): https://www.feitsui.com/en/article/3

While there are some places that contain this data, they all seem to me to be missing some details or to have bad information, so I made a spreadsheet with every detail:

https://docs.google.com/spreadsheets/d/1QX_qjiieBXIyozvznKSaNPVs6nj6FXGbxA25tS90e6Q/edit#gid=0
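As a rough illustration of how such a table could be used, here is a minimal Python sketch. It assumes the `x-edge-location` value starts with a three-letter code (typically the IATA code of a nearby airport) followed by an arbitrary suffix, as in `IAD79-C1`. The `EDGE_PLACES` table and the `edge_place` function are hypothetical names, and the three entries are only illustrative; in practice the lookup would be loaded from an export of the spreadsheet.

```python
import re
from typing import Optional

# Hypothetical, hand-written excerpt of a lookup table. A real table would be
# loaded from an export of the curated spreadsheet, not hard-coded like this.
EDGE_PLACES = {
    "IAD": "Washington Dulles area, USA",
    "DFW": "Dallas/Fort Worth area, USA",
    "NRT": "Tokyo Narita area, Japan",
}

# Assumption: the location code begins with three letters, e.g. "IAD79-C1".
_PREFIX = re.compile(r"^([A-Za-z]{3})")


def edge_place(code: str) -> Optional[str]:
    """Return a place name for an x-edge-location value, or None if unknown."""
    match = _PREFIX.match(code.strip())
    if match is None:
        return None
    return EDGE_PLACES.get(match.group(1).upper())


if __name__ == "__main__":
    for value in ["IAD79-C1", "nrt57-p2", "ZZZ1", "42"]:
        print(value, "->", edge_place(value))
```

Unknown or malformed codes return `None` rather than raising, so a log-processing loop can count unmapped locations instead of stopping on them.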

submitted by /u/Capyvara

Zimbabwe 2018 Election Results Analysis

Hello everyone,

I wanted to bring your attention to the upcoming elections in Zimbabwe scheduled for this Wednesday. The past election raised significant concerns due to allegations of unfairness, including claims of collusion between the electoral commission and the ruling party to manipulate results using Excel files, an issue that has been dubbed “Excelgate.”

Taking a closer look at the available data on the official website, I’ve stumbled upon some noteworthy findings. These findings have prompted me to write an article on LinkedIn, where I explore how they tie into the broader ‘Excelgate’ narrative. Additionally, I delve into the steps citizens have been taking to ensure the integrity of their votes during the upcoming election.

For those who are interested, you can read the article and share your perspectives. I’m always open to hearing different viewpoints and engaging in constructive discussions. Here’s the link to the article and analysis: Article | Analysis

Looking forward to your insights and feedback. Thank you!

submitted by /u/BigIntroduction4586

Data On Number Of Pages In Papers Over The Years

For a while now I’ve been trying to prove a perception of mine (and other folks’ too, I’m sure): scientific papers are getting much longer. I have the (strong) impression that papers now tend to have many more pages than they did years ago. Does anyone know of a dataset with, say, the titles of papers published by a journal over a number of years and, attached to every paper, information like the number of pages?

I’d love to find data about STEM journals, but I’ll take any data that’s available.

Thanks.

submitted by /u/MasonBo_90

Is There A Dataset For EU/UK Flight Delay Reasons?

Under EU/UK legislation, consumers are eligible for compensation if their flights are delayed or cancelled due to reasons within a carrier’s control. This would rule out natural disasters, for example, but include reasons such as ‘an air steward was ill’.

Passengers are able to claim compensation based on the length of the delay and distance being travelled, and there’s some excellent documentation on the subject here:

https://www.citizensadvice.org.uk/consumer/holiday-cancellations-and-compensation/if-your-flights-delayed-or-cancelled/

The process for claiming compensation is convoluted and has spawned a mini industry of copycat legal firms who’ll do the heavy lifting on behalf of customers (for a fee).

Many of these firms provide free online tools (e.g. this one) for checking the validity of a claim. Whilst it’s trivial to check the status of any given flight (e.g. delayed by x minutes, distance, destinations, etc.), determining the airline’s provided reason for a delay is less obvious.

Is anyone familiar with an API or dataset that might provide this data? I’ve found a provider for US domestic flights (https://www.bts.gov/explore-topics-and-geography/topics/airline-time-performance-and-causes-flight-delays) but nothing for those operating within Europe.

Any pointers would be greatly appreciated.

submitted by /u/trilson

ISO Datasets About Antibiotic Resistant Bacteria In UK Waterways

Title pretty much covers it. I’m looking for datasets on antibiotic-resistant bacteria in UK waterways for a personal/portfolio project (not affiliated with any company; I am a Data Analytics student with some background in biology).

I’m especially interested in looking at the river Thames and the impact of antibiotics filtering into the environment through wastewater treatment plant “effluent”. Alternatively, hospital effluent would be really interesting to look at too!

Most of the data I’ve found has been a (thin) patchwork of time periods and areas covered and it’s been hard to find anything I can use to tell a story. Any help would be hugely appreciated. Thank you, r/datasets!

submitted by /u/Medium-Tea-

Food Recipe Dataset For My Personal Project

For context, I’m looking for a large food recipe dataset (>5,000 recipes) with nutritional information for my second personal project as a data analyst.

The goal is to identify recipes, and the list of ingredients for each, from the following input parameters:

The amount of nutrients
Dietary requirements
Type of cuisine
Etc.

In terms of the data source, any public Excel dataset is fine, as is getting the data through a POST API request.

Thanks in advance.

submitted by /u/xu3n12