
Looking For Data On Chinese Solar PV Subsidies

Hi all,

I’m a college student working on an econometric research project trying to determine the effect of Chinese government subsidies on solar PV manufacturing share. I’m having trouble finding data on:

– the $ or yuan amount of subsidy available for Chinese solar PV manufacturing each year
– Chinese solar PV manufacturing revenue each year

If anyone can recommend how I can go about finding this data, I would really appreciate the help. I do have access to several paid/subscription data sources through my university. Thank you!

submitted by /u/evacuatethepremises

How To Find A Great Data Set? How To Nail A Data Project?

So my Stats class requires a data project as the final project (it’s worth about 40% of the grade, so I’ll have to nail it to get an A in the class). I’ve been looking for data sets, but I can’t find much, and nothing that really sparks my interest. I’m wondering if anyone has suggestions on where I could find data sets and what type of data would be cool to analyze. Also, I’d highly appreciate any advice on how to do an exceptional data project :)

submitted by /u/Ancient_Ad_5430

Find All Utility And Public Works Buildings For Three States?

Finding all utility and public works addresses in three states?

How might I go about finding the locations above? Is there a big data set out there? I attempted using OpenStreetMap with BigQuery, but I can’t say if I wrote the query correctly. I also tried running a place query with the ESRI geocoder city by city for each of the states, but that was a disaster. I have 6 years of GIS experience and am semi-proficient in Python and other coding languages.

submitted by /u/Different_Camp4002

WebScraping Specific Zip Code Data From Zillow

Hello, I have a data science project I’m interested in doing. I want to web-scrape housing data from the Zillow website within a 15-mile radius of a potential career location. I don’t have much experience in web scraping, but I know I need to use Selenium (an automated browser) and Python’s Beautiful Soup library to execute this part of my project. Does anyone have experience in web scraping Zillow’s website specifically? Any advice or YouTube videos to help me get started?

P.S. I was advised to check whether Zillow has an API. I checked, and it looks like the best I’ll be able to get is through RapidAPI: 40 records per GET request with a monthly limit of 20 GET requests (800 records).

submitted by /u/juangui37
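
Whichever way the listings are collected, the 15-mile radius mentioned above can be applied afterwards as a plain distance filter. The following is a minimal sketch, assuming each listing is a dict with hypothetical `lat` and `lon` keys (Zillow's actual field names are not known here); it uses the haversine great-circle formula and does not scrape anything itself.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(listings, center, radius_miles=15.0):
    """Keep listings whose hypothetical 'lat'/'lon' fields fall within the radius of center."""
    lat0, lon0 = center
    return [
        item for item in listings
        if haversine_miles(lat0, lon0, item["lat"], item["lon"]) <= radius_miles
    ]
```

Calling `within_radius(listings, (lat, lon))` with the coordinates of the target location would return only the nearby records, so the limited number of API records is not spent on analysis of far-away homes.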

CleanVision: Audit Your Image Datasets For Better Computer Vision

To all my computer vision friends working on real-world applications with messy image data, I just open-sourced a Python library you may find useful!

CleanVision audits any image dataset to automatically detect common issues such as images that are blurry, under/over-exposed, oddly sized, or near duplicates of others. It’s just 3 lines of code to discover what issues lurk in your data before you dive into modeling, and CleanVision can be used for any image dataset — regardless of whether your task is image generation, classification, segmentation, object detection, etc.

    from cleanvision.imagelab import Imagelab

    imagelab = Imagelab(data_path="path_to_dataset")
    imagelab.find_issues()
    imagelab.report()

As leaders like Andrew Ng and OpenAI have lately repeated: models can only be as good as the data they are trained on. Before diving into modeling, quickly run your images through CleanVision to make sure they are ok — it’s super easy!

Github: https://github.com/cleanlab/cleanvision

Disclaimer: I am affiliated with Cleanlab.

submitted by /u/jonas__m

Scrape Thousands Of Records Of Housing Data Using Python [Self-Promotion]

Hey r/datasets,

I originally posted this library earlier this week, but it got downvoted once within 10 minutes and was never heard from again. And I get it, this is a place for posting/requesting datasets.

So, here’s an actual dataset of CA housing data I generated using the RedfinScraper library. Scraping these 47,000 records took just over 3 minutes.

While this data may be useful today, the fact is, it will only be useful for about a week longer. The high-velocity nature of housing data means that datasets need to be updated frequently.

This issue was the driving force for sharing this library publicly: to allow users to quickly scrape the latest housing data at their leisure.

I hope you find this library useful, and I am excited to see what you create with it.

submitted by /u/ryan_s007

How Features With A Strong Correlation Should Be Treated?

Hey guys,

I am having some difficulty cleaning my dataset. Originally I had 80 features; after applying some basic ML rules (for example, removing features that contain only null values), I now have 67 features.

I decided to compute correlations, and I have many pairs of features with a strong correlation (above +0.9 or below -0.9). I read that I can remove features with a strong correlation.

But I could not find whether there is a rule for which of these features I should remove.

For example, if I have features A, B, C, D, and both A x B and A x C have a strong correlation, should I remove A? Or B and C?

If someone could kindly give some help or point me to some documents about it, I would be more than glad.

Thank you.

submitted by /u/No_Bee_9081
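
There is no single rule for the A/B/C/D question above; one common heuristic is a greedy pass that keeps a feature only if it is not strongly correlated with any feature already kept. The sketch below illustrates that heuristic in plain Python, assuming the data is a dict mapping feature names to equal-length lists of numbers; the function names and the 0.9 threshold are illustrative choices, not an established standard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def select_features(columns, threshold=0.9):
    """Greedy filter: keep a feature only if |r| < threshold against every kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


data = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 11],       # strongly positively correlated with A
    "C": [-1, -2, -3, -4, -6],   # strongly negatively correlated with A
    "D": [3, 1, 4, 1, 5],        # only weakly correlated with A
}
print(select_features(data))  # ['A', 'D']
```

The result depends on column order: here A is seen first, so B and C are dropped. Dropping A instead would keep more columns only if B and C are not strongly correlated with each other, so it can be worth ordering the columns so that the features judged most useful (for example, most related to the target or with the fewest missing values) come first.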

Nuclear Fusion Dataset, Is It Too Experimental/Secret?

so, i’m almost done with my data science course. the final project, which is supposed to take a month, will be due may 4th.

i was wondering, where does an average joe like me get his hands on some nuclear fusion datasets? i have no clue what i’d be doing with them.. but i think nuclear fusion is fascinating, and if i can do something with it, why not.

i’ve tried google, kaggle and huggingface, couldn’t find much.

i know everything is in development right now. it’s cutting edge technology. pushing the boundary of our knowledge.. and now i’m wondering, would those datasets be considered top secret?

well, anyway. thanks for reading and for any help you could provide

submitted by /u/RngdZ

Magic: The Gathering Deck Lists Scraped From MtgTop8

Magic: the Gathering deck dataset

I scraped deck lists from a competitive deck sharing platform called MtgTop8 for a project I’m working on.

Decks are separated by format in the following:

– standard
– modern
– pioneer
– historic
– explorer
– pauper
– legacy
– vintage

They’re stored as Apache Arrow Feather files, which can be easily converted to either pickle or CSV files.

Feel free to use them for whatever purpose.

Here’s the link

submitted by /u/ArmyOfCorgis
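
For anyone unsure how the conversion mentioned above works, here is a minimal sketch using pandas (which needs pyarrow installed to read Feather files). The file name `standard.feather` in the usage note is an assumption, since the actual file names in the dataset are not given here, and the helper names are made up for illustration.

```python
from pathlib import Path


def output_paths(feather_path):
    """Return the (csv, pickle) paths that sit next to the given Feather file."""
    src = Path(feather_path)
    return src.with_suffix(".csv"), src.with_suffix(".pkl")


def convert_feather(feather_path):
    """Read a Feather file and write CSV and pickle copies beside it."""
    # Imported here so the path helper above works even without pandas installed.
    import pandas as pd

    csv_path, pkl_path = output_paths(feather_path)
    df = pd.read_feather(feather_path)
    df.to_csv(csv_path, index=False)  # index=False avoids writing the row index as a column
    df.to_pickle(pkl_path)
    return csv_path, pkl_path
```

Calling `convert_feather("standard.feather")` would then write `standard.csv` and `standard.pkl` next to the original file.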

GIS Data For A Project. I Apologize For The Banality Of My Request And For My English.

Hi all, I’m new to the community and also new to the world of data.

In a postgraduate course, I was assigned an exercise in QGIS: representing a specific data model on a map. The goal is to give us practice, and the topic is free.

Where can I get open data suitable for QGIS? I apologize for the banality of my request and for my English.

Thank you all 🥲

submitted by /u/Scarraf1