Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Is There A Dataset For EU/UK Flight Delay Reasons?

Under EU/UK legislation, consumers are eligible for compensation if their flights are delayed or cancelled due to reasons within a carrier’s control. This would rule out natural disasters, for example, but include reasons such as ‘an air steward was ill’.

Passengers are able to claim compensation based on the length of the delay and distance being travelled, and there’s some excellent documentation on the subject here:

https://www.citizensadvice.org.uk/consumer/holiday-cancellations-and-compensation/if-your-flights-delayed-or-cancelled/

The process for claiming compensation is convoluted and has spawned a mini industry of copycat legal firms who’ll do the heavy lifting on behalf of customers (for a fee).

Many of these firms provide free online tools (e.g. this one) for checking the validity of a claim. Whilst it’s trivial to check the status of any given flight (e.g. delayed by x minutes, distance, destinations, etc.), determining the airline’s provided reason for a delay is less obvious.

Is anyone familiar with an API or dataset that might provide this data? I’ve found a provider for US domestic flights (https://www.bts.gov/explore-topics-and-geography/topics/airline-time-performance-and-causes-flight-delays) but nothing for those operating within Europe.

Any pointers would be greatly appreciated.

submitted by /u/trilson

ISO Datasets About Antibiotic Resistant Bacteria In UK Waterways

Title pretty much covers it. I’m looking for datasets on antibiotic resistant bacteria in UK waterways for a personal/portfolio project. I’m not affiliated with any company; I am a Data Analytics student with some background in biology.

I’m especially interested in looking at the river Thames and the impact of antibiotics filtering into the environment through wastewater treatment plant “effluent”. Alternatively, hospital effluent would be really interesting to look at too!

Most of the data I’ve found has been a (thin) patchwork of time periods and areas covered and it’s been hard to find anything I can use to tell a story. Any help would be hugely appreciated. Thank you, r/datasets!

submitted by /u/Medium-Tea-

Food Recipe Dataset For My Personal Project

For context, I’m looking for a large food recipe dataset (more than 5,000 recipes) with nutritional information for my second personal project as a data analyst.

The goal is to identify recipes, and the list of ingredients for each, from input parameters such as the amount of nutrients, dietary requirements, type of cuisine, etc.
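A minimal sketch of that kind of lookup, assuming the dataset has already been loaded into records. The `Recipe` fields, the `find_recipes` helper and the sample rows are all hypothetical and made up for illustration; a real dataset will use different column names and units.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Recipe:
    # Hypothetical schema; real datasets will name these fields differently.
    name: str
    cuisine: str
    ingredients: List[str]
    calories: float
    protein_g: float
    tags: Set[str] = field(default_factory=set)  # dietary labels, e.g. {"vegetarian"}


def find_recipes(recipes, max_calories=None, min_protein_g=None,
                 required_tags=(), cuisine=None):
    """Return the recipes that satisfy every filter that was supplied."""
    matches = []
    for r in recipes:
        if max_calories is not None and r.calories > max_calories:
            continue
        if min_protein_g is not None and r.protein_g < min_protein_g:
            continue
        if not set(required_tags) <= r.tags:  # every required tag must be present
            continue
        if cuisine is not None and r.cuisine.lower() != cuisine.lower():
            continue
        matches.append(r)
    return matches


# Made-up sample rows, for illustration only.
SAMPLE = [
    Recipe("Chana masala", "Indian", ["chickpeas", "tomato", "onion"],
           420, 18, {"vegan", "vegetarian"}),
    Recipe("Butter chicken", "Indian", ["chicken", "butter", "cream"],
           650, 35, set()),
    Recipe("Caprese salad", "Italian", ["tomato", "mozzarella", "basil"],
           300, 14, {"vegetarian"}),
]

for r in find_recipes(SAMPLE, max_calories=500,
                      required_tags=["vegetarian"], cuisine="indian"):
    print(r.name, r.ingredients)
```

Filters left as `None` are simply skipped, so the same function covers "any cuisine under 500 kcal" and "vegetarian Indian only" without separate code paths.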

As for the data source, any public dataset in Excel format is fine, as is fetching the data with a POST API request.

Thanks in advance.

submitted by /u/xu3n12

What Advantages Or Disadvantages Does Synthetic Data Have Over Real-world Data?

I need to understand the perks in order to decide whether to pivot to a synthetic data generator, and whether it has a market. I work at a data annotation company called Acme AI, and a key bottleneck for our clients is, in many cases, a scarcity of data for training ML models. Naturally, this led me to question whether such a novel ML solution can exist at all if data is scarce in the first place (i.e. no market value). I’m seeking responses with practical examples or experiences.

submitted by /u/SithisR

Who Knows How This Dataset Was Labeled?

I’m trying to find an efficient way to reproduce a CSV labeling like this Shakespeare one:

"Dataline","Play","PlayerLinenumber","ActSceneLine","Player","PlayerLine"

"1","Henry IV",,,,"ACT I"

"2","Henry IV",,,,"SCENE I. London. The palace."

"3","Henry IV",,,,"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"

"4","Henry IV","1","1.1.1","KING HENRY IV","So shaken as we are, so wan with care,"

"5","Henry IV","1","1.1.2","KING HENRY IV","Find we a time for frighted peace to pant,"

"6","Henry IV","1","1.1.3","KING HENRY IV","And breathe short-winded accents of new broils"

"7","Henry IV","1","1.1.4","KING HENRY IV","To be commenced in strands afar remote."

"8","Henry IV","1","1.1.5","KING HENRY IV","No more the thirsty entrance of this soil"

"9","Henry IV","1","1.1.6","KING HENRY IV","Shall daub her lips with her own children’s blood,"

"10","Henry IV","1","1.1.7","KING HENRY IV","Nor more shall trenching war channel her fields,"

"11","Henry IV","1","1.1.8","KING HENRY IV","Nor bruise her flowerets with the armed hoofs"

"12","Henry IV","1","1.1.9","KING HENRY IV","Of hostile paces: those opposed eyes,"

"13","Henry IV","1","1.1.10","KING HENRY IV","Which, like the meteors of a troubled heaven,"

Here’s the complete dataset: https://paste.c-net.org/WidelyTibetan

The Play, PlayerLinenumber, ActSceneLine and Player columns are what I’d need to know how to reproduce, ideally fully automated or semi-automated.

Anyone know how this was done and the way to reproduce it efficiently?
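How the original file was produced isn’t stated, but a labeling of this shape could plausibly be generated with a small state machine over the plain play text. The sketch below assumes a hypothetical input layout: `ACT` and `SCENE` headers on their own lines, speaker names alone on a line in capitals, and stage directions starting with Enter/Exit/Exeunt/Re-enter. The `label_play` and `roman_to_int` names, the regular expressions, and the choice not to reset the speech counter between scenes are all assumptions, not the dataset author’s method.

```python
import csv
import re
import sys

ACT_RE = re.compile(r"^ACT ([IVX]+)$")
SCENE_RE = re.compile(r"^SCENE ([IVX]+)\.")
STAGE_RE = re.compile(r"^(Enter|Exit|Exeunt|Re-enter)\b")
SPEAKER_RE = re.compile(r"^[A-Z][A-Z '\-]+$")  # e.g. "KING HENRY IV"


def roman_to_int(numeral):
    """Convert a Roman numeral built from I, V and X to an integer."""
    values = {"I": 1, "V": 5, "X": 10}
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        v = values[ch]
        # A smaller value before a larger one is subtracted (IV = 4).
        total += -v if values.get(nxt, 0) > v else v
    return total


def label_play(play, text):
    """Turn raw play text into rows shaped like the sample CSV."""
    rows = []
    act = scene = line_no = speech_no = 0
    speaker = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        dataline = len(rows) + 1
        m_act = ACT_RE.match(line)
        m_scene = SCENE_RE.match(line)
        if m_act:
            act = roman_to_int(m_act.group(1))
            rows.append([dataline, play, "", "", "", line])
        elif m_scene:
            scene, line_no = roman_to_int(m_scene.group(1)), 0  # line count restarts per scene
            rows.append([dataline, play, "", "", "", line])
        elif STAGE_RE.match(line):
            rows.append([dataline, play, "", "", "", line])
        elif SPEAKER_RE.match(line):
            speaker = line      # speaker cue starts a new speech; no row is emitted for it
            speech_no += 1
        else:
            line_no += 1
            rows.append([dataline, play, speech_no,
                         f"{act}.{scene}.{line_no}", speaker, line])
    return rows


SAMPLE_TEXT = """\
ACT I
SCENE I. London. The palace.
Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others
KING HENRY IV
So shaken as we are, so wan with care,
Find we a time for frighted peace to pant,
"""

writer = csv.writer(sys.stdout, quoting=csv.QUOTE_ALL)
writer.writerow(["Dataline", "Play", "PlayerLinenumber",
                 "ActSceneLine", "Player", "PlayerLine"])
writer.writerows(label_play("Henry IV", SAMPLE_TEXT))
```

This is only semi-automated in practice: an all-caps spoken line would be mistaken for a speaker cue, and mid-speech stage directions would need their own rule, so the output is worth spot-checking against the source text.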

Thanks for your suggestions!

submitted by /u/Efficient_Fix1026

US County Level Housing Unit Estimates Csv 1960-present?

Does anyone know where I can find CSVs of US county-level housing unit totals from each decennial census, 1960 to present? The 1990-2020 ones are available from the Census website, but the 1960-1980 ones are not. 1960 and 1970 are on the Census website, but in non-machine-readable PDF form, so they would be a lot of work to digitize. For 1980, I just straight up cannot find the right link. I found some data on ICPSR, but it is at the census tract level, and I am not confident that aggregating it would give the true county totals from the census, since before 1990 census tracts were only defined for limited areas. If anyone knows of a place where this has already been digitized, I would really appreciate it. Sorry if this isn’t the best place to post; I’ll probably cross-post.
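One way to gauge the aggregation concern is to sum the tract-level counts per county and compare them with a handful of county totals keyed in by hand from the PDFs. The sketch below assumes tract identifiers whose first five characters are the state and county FIPS codes; older files may use a different ID layout, so that slicing is an assumption. The function names and all figures are made up for illustration.

```python
from collections import defaultdict


def county_totals_from_tracts(tract_units):
    """Sum housing units per county, keyed by the first 5 characters of the tract ID."""
    totals = defaultdict(int)
    for tract_id, units in tract_units.items():
        totals[tract_id[:5]] += units
    return dict(totals)


def coverage_report(tract_units, published_county_units):
    """For each county with a published total, return (tract sum, published, ratio)."""
    summed = county_totals_from_tracts(tract_units)
    report = {}
    for county, published in published_county_units.items():
        tract_sum = summed.get(county, 0)
        ratio = tract_sum / published if published else None
        report[county] = (tract_sum, published, ratio)
    return report


# Made-up numbers, for illustration only.
tract_units = {"06037000100": 1200, "06037000200": 800, "06075000100": 500}
published = {"06037": 2000, "06075": 1000}

for county, (tract_sum, total, ratio) in coverage_report(tract_units, published).items():
    print(county, tract_sum, total, ratio)
```

A ratio well below 1 for a county would suggest it was only partly tracted in that census year, so the tract sum should not stand in for the county total there.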

submitted by /u/estheticpotato

Need Alternative To Parsehub. I Need Only To Run One Project

Hi,

I have built a project in ParseHub to scrape some data, and when I test run it, it works great. But it’s impossible to extract all the data through a test run, as my PC freezes (it consumes too much RAM).

I have spoken to ParseHub. They tried IP rotation on their end, but the website is blocking it, so they suggest using a proxy instead.

However, the cheapest ParseHub subscription plus the cost of a proxy is more than I can afford.

Is there any other free tool that could work in this case?

Sorry if this is the wrong place to ask.

Thanks.

submitted by /u/ConsistentPromise156

Is There A Way To Seek Comprehensive And Crucial Data For A City Chitradurga Analysis

I’m in the process of conducting an extensive analysis focused on Chitradurga, and I’m on a quest for crucial and all-encompassing data. I’m interested in gathering information that covers a wide spectrum of topics, ranging from village land prices, land utilization trends, and the transition from agriculture to industry, to intricate details like population trends, literacy rates, employment statistics (covering both the employed and the unemployed), and any noteworthy initiatives related to sustainable growth and renewable energy. My specific interest lies in comprehending the status and developments in wind power generation.

The purpose behind this endeavor is to construct a meticulous and complete overview of Chitradurga’s growth trajectory. Your participation could be instrumental in this undertaking. If you’re privy to essential data sources, databases, or possess first-hand insights on village land prices or any other pertinent facets, I would be extremely appreciative if you could share your knowledge.

By collaborating on this effort, we can contribute to a more profound understanding of Chitradurga’s journey and the multitude of factors that shape its progression. Thank you for considering participation in this endeavor; your contributions will undoubtedly elevate the quality and depth of this research.

Best regards.

submitted by /u/Sure_Ad8210