I am looking for company data including category, URL and postal address. All I found was data including contact persons for lead generation (which is quite expensive).
submitted by /u/fromtherivertothesee
[link] [comments]
Hey all,
For a market research project, I’m looking for the historical monthly market caps of S&P 500 companies since 2000. Is there a tool or a dataset available with this data?
SP500
Market caps of SP500 companies
Frequency: Monthly
Period: 2000-2022
submitted by /u/gijsm
[link] [comments]
Hello everyone! I recently developed a service that retrieves data from the iOS App Store. Do check it out: https://rapidapi.com/shake-chillies-shake-chillies-default/api/ios-store
I am looking for feedback on which data points I should add and on how useful this is. Thanks!
submitted by /u/Capable_Atmosphere_7
[link] [comments]
Where can I find data on downloads (historical trends) and updates for certain apps? I’m open to scraping, but the app stores only list approximate download numbers, so using the Internet Archive to view old app store pages won’t work.
submitted by /u/mithi26
[link] [comments]
Looking for a dataset from any country or globally of how products were HS Code classified for customs.
submitted by /u/mrmanicou
[link] [comments]
Hi all,
I’m a college student working on an econometric research project trying to determine the effect of Chinese government subsidies on solar PV manufacturing share. I’m having trouble finding data on:
– the $ or yuan amount of subsidy available for Chinese solar PV manufacturing each year
– Chinese solar PV manufacturing revenue each year
If anyone can recommend how I can go about finding this data, I would really appreciate the help. I do have access to several paid/subscription data sources through my university. Thank you!
submitted by /u/evacuatethepremises
[link] [comments]
I’ve nothing to do with this. I just thought it looked cool
submitted by /u/cavedave
[link] [comments]
So my Stats class requires a data project as a final project (which is worth about 40% of the grade, so I’ll have to nail it to get an A in the class). I’ve been looking for datasets, but I can’t find much, and nothing that really sparks my interest. I’m wondering if anyone has suggestions for where I could find datasets and what type of data would be cool to analyze. Also, I’d highly appreciate any advice on how to do an exceptional data project :)
submitted by /u/Ancient_Ad_5430
[link] [comments]
I want to try a project that classifies sudoku puzzles as either purely machine-generated or created (or at least vetted) by humans.
All the sudoku sets I’ve found thus far seem to be only machine-generated. Solutions are not needed, and any format would be okay.
submitted by /u/onzie9
[link] [comments]
Hi!
I’ve been having a lot of trouble finding a research paper that used an ANOVA test in its study. I would really prefer one that provides its dataset in a form that can be used in SPSS. Any help? Thank you so much!
submitted by /u/Back-Opposite
[link] [comments]
Finding all utility and public works addresses in three states?
How might I go about finding the locations above? Is there a big dataset out there? I tried using OpenStreetMap data with BigQuery, but I can’t say whether I wrote the query correctly. I also tried running a place query with the Esri geocoder city by city for each of the states, but that was a disaster. I have 6 years of GIS experience and am semi-proficient in Python and other coding languages.
submitted by /u/Different_Camp4002
[link] [comments]
Hello, I have a data science project I’m interested in doing. I want to web-scrape housing data from the Zillow website within a 15-mile radius of a potential career location. I don’t have much experience in web scraping, but I know I need to use Selenium (an automated browser) and Python’s Beautiful Soup library to execute this part of my project. Does anyone have experience in web scraping Zillow’s website specifically? Any advice or YouTube videos to help me get started?
P.S. I was advised to check whether Zillow has an API. I checked, and it looks like the best I’ll be able to get from an API is through RapidAPI: 40 records of data per GET request, with a one-month limit of 20 GET requests (800 records).
submitted by /u/juangui37
[link] [comments]
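For the 15-mile radius part of the question above, one possible approach is to filter records by great-circle distance once each record has coordinates. This is only a rough sketch: the `lat`/`lon` keys, the sample listings, and the center point are made-up placeholders, not anything Zillow or RapidAPI is known to return.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def within_radius(listings, center, radius_miles=15.0):
    """Keep only the listings whose coordinates lie within radius_miles of center."""
    lat0, lon0 = center
    return [
        item for item in listings
        if haversine_miles(lat0, lon0, item["lat"], item["lon"]) <= radius_miles
    ]


# Hypothetical records; the field names are assumptions for illustration only.
listings = [
    {"id": "near-north", "lat": 40.1, "lon": -75.0},
    {"id": "far-north", "lat": 40.5, "lon": -75.0},
    {"id": "near-west", "lat": 40.0, "lon": -75.2},
]
center = (40.0, -75.0)  # placeholder for the potential career location
nearby = within_radius(listings, center)
print([item["id"] for item in nearby])
```

The same filter works regardless of how the records were collected, so it can be written and tested before the scraping part is sorted out.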
To all my computer vision friends working on real-world applications with messy image data, I just open-sourced a Python library you may find useful!
CleanVision audits any image dataset to automatically detect common issues such as images that are blurry, under/over-exposed, oddly sized, or near duplicates of others. It’s just 3 lines of code to discover what issues lurk in your data before you dive into modeling, and CleanVision can be used for any image dataset — regardless of whether your task is image generation, classification, segmentation, object detection, etc.
from cleanvision.imagelab import Imagelab
imagelab = Imagelab(data_path="path_to_dataset")
imagelab.find_issues()
imagelab.report()
As leaders like Andrew Ng and OpenAI have lately repeated: models can only be as good as the data they are trained on. Before diving into modeling, quickly run your images through CleanVision to make sure they are ok — it’s super easy!
Github: https://github.com/cleanlab/cleanvision
Disclaimer: I am affiliated with Cleanlab.
submitted by /u/jonas__m
[link] [comments]
Is there a way to identify the earliest timestamp of news? For example, when Silicon Valley Bank got into trouble, many news websites reported it. I need to find the earliest report of it.
submitted by /u/BOBOLIU
[link] [comments]
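For the earliest-report question above: if publication timestamps have already been collected from several sites, the comparison itself is simple as long as everything is normalized to UTC first. A minimal sketch follows, assuming each article is a dict with an ISO 8601 `published` string; the site names and times are invented for illustration, and treating offset-less timestamps as UTC is an assumption that may not hold for a given site.

```python
from datetime import datetime, timezone


def published_utc(article):
    """Parse the article's ISO 8601 timestamp and convert it to UTC."""
    ts = datetime.fromisoformat(article["published"])
    if ts.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def earliest_report(articles):
    """Return the article with the earliest publication time."""
    return min(articles, key=published_utc)


# Invented sample data; "source" and "published" are hypothetical field names.
articles = [
    {"source": "site-a", "published": "2023-03-09T08:30:00-05:00"},
    {"source": "site-b", "published": "2023-03-09T14:05:00+01:00"},
    {"source": "site-c", "published": "2023-03-09T13:20:00+00:00"},
]
first = earliest_report(articles)
print(first["source"], published_utc(first).isoformat())
```

Comparing the raw strings would pick the wrong article here, because the local clock times and the UTC times sort differently. Reported timestamps can also reflect later edits rather than first publication, which this sketch cannot detect.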
Hey r/datasets,
I originally posted this library earlier this week, but it got downvoted once within 10 minutes and was never heard from again. And I get it, this is a place for posting/requesting datasets.
So, here’s an actual dataset of CA housing data I generated using the RedfinScraper library. Scraping these 47,000 records took just over 3 minutes.
While this data may be useful today, the fact is, it will only be useful for about a week longer. The high-velocity nature of housing data means that datasets need to be updated frequently.
This issue was the driving force for sharing this library publicly: to allow users to quickly scrape the latest housing data at their leisure.
I hope you find this library useful, and I am excited to see what you create with it.
submitted by /u/ryan_s007
[link] [comments]
I tried using Common Crawl, but it seems to contain mostly HTML, which makes it hard to extract the CSS.
submitted by /u/mindgitrwx
[link] [comments]
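For the CSS question above, one way to get at least some CSS out of HTML-only records is to collect the contents of inline `<style>` blocks and the `href` values of stylesheet `<link>` tags. The sketch below uses only Python's standard `html.parser`; the sample page is made up, and reading the HTML out of Common Crawl records is not shown.

```python
from html.parser import HTMLParser


class CSSExtractor(HTMLParser):
    """Collect inline <style> contents and linked stylesheet URLs from HTML."""

    def __init__(self):
        super().__init__()
        self.inline_css = []       # text of each <style> block
        self.stylesheet_urls = []  # href of each <link rel="stylesheet">
        self._buffer = None        # list of text chunks while inside <style>

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "style":
            self._buffer = []
        elif tag == "link":
            rel = (attr_map.get("rel") or "").lower().split()
            href = attr_map.get("href")
            if "stylesheet" in rel and href:
                self.stylesheet_urls.append(href)

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "style" and self._buffer is not None:
            css = "".join(self._buffer).strip()
            if css:
                self.inline_css.append(css)
            self._buffer = None


# Made-up sample page for illustration.
page = (
    '<html><head><link rel="stylesheet" href="/static/main.css">'
    "<style>body { margin: 0; }</style></head>"
    "<body><p>Hi</p></body></html>"
)
extractor = CSSExtractor()
extractor.feed(page)
print(extractor.inline_css)
print(extractor.stylesheet_urls)
```

The linked stylesheet URLs would still need to be resolved against the page URL and fetched separately; this sketch does not check whether those files exist in the crawl.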
Hi,
I’m looking for a GeoJSON file that has the polygons for all UK cities (or just the major ones) for use in creating a choropleth/heat map. Any help would be gratefully appreciated!
Thanks in advance!
submitted by /u/shef_japes
[link] [comments]
Hey guys,
I’m having some difficulty cleaning my dataset. Originally I had 80 features; after applying some basic ML rules (for example, removing features that contain only null values), I am down to 67 features.
I decided to look at correlations, and many features are strongly correlated (above +0.9 or below -0.9). I read that I can remove features with a strong correlation.
But I could not find a rule for which of these features I should remove.
For example, if I have features A, B, C, and D, and the pairs A x B and A x C are strongly correlated, should I remove A? Or B and C?
If someone could kindly offer some help or point me to documentation about this, I would be more than glad.
Thank you.
submitted by /u/No_Bee_9081
[link] [comments]
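On the A/B/C/D question above: as far as I know there is no single agreed rule, but one common heuristic is a greedy pass that keeps a feature only if it is not strongly correlated with any feature already kept. The sketch below is one such heuristic, not the definitive answer; the toy columns and the 0.9 threshold are assumptions chosen to mirror the example in the post.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a feature only if |r| <= threshold vs. every kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: B tracks A closely, C is nearly the negative of A, D is unrelated.
features = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 3.9, 6.2, 8.0, 9.9],
    "C": [5.0, 4.1, 2.9, 2.0, 1.1],
    "D": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_correlated(features)
print(kept)
```

With this column order the pass keeps A and drops B and C. Reordering the columns changes which member of a correlated group survives, so it can help to put the features you trust most (fewest missing values, easiest to interpret) first. With pandas, the same pairwise matrix is available from `DataFrame.corr().abs()`.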
Hi team,
I am trying to train an ML model to predict social media posts’ engagement metrics based on the copy. To pull it off, I am looking for a training dataset with the following structure:
[Post text] – [Number of Likes/Reactions] AND/OR [Number of Retweets] AND/OR [Number of Comments]
Any leads (paid or free) will be appreciated. Thanks!
submitted by /u/sarimhaq
[link] [comments]
RedfinScraper is a scalable Python library that leverages Redfin’s unofficial Stingray API to quickly scrape thousands of housing records.
I built this library to automate the task of collecting housing data, and to do it at breakneck speed.
Let me know what cool uses you find for the data!
submitted by /u/ryan_s007
[link] [comments]
So, I’m almost done with my data science course. The final project, which is supposed to take a month, will be due May 4th.
I was wondering, where does an average Joe like me get his hands on some nuclear fusion datasets? I have no clue what I’d be doing with them, but I think nuclear fusion is fascinating, and if I can do something with it, why not.
I’ve tried Google, Kaggle and Hugging Face, and couldn’t find much.
I know everything is in development right now. It’s cutting-edge technology, pushing the boundary of our knowledge. And now I’m wondering, would those datasets be considered top secret?
Well, anyway. Thanks for reading and for any help you could provide.
submitted by /u/RngdZ
[link] [comments]
I scraped deck lists from a competitive deck sharing platform called MtgTop8 for a project I’m working on.
Decks are separated by format in the following:
– standard
– modern
– pioneer
– historic
– explorer
– pauper
– legacy
– vintage
They’re stored as Apache Arrow Feather files, which can easily be converted to either pickle or CSV files.
Feel free to use them for whatever purpose.
submitted by /u/ArmyOfCorgis
[link] [comments]