What counts as “retirement” can be loosely interpreted, but I’m looking for a dataset that marks the year someone “retired” or started collecting retirement benefits within the US.
submitted by /u/data_questions
[link] [comments]
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
Looking for some type of data, anything really, on commercial real estate retrofitting. Ideally what types of buildings are getting retrofit, how many, what types of systems are primarily being upgraded, etc.
Thanks!
submitted by /u/Caconym32
[link] [comments]
I need a Zillow dataset of rentals, along with all their details, for a research project. I know Zillow is very possessive of its data, but the data need not be current. Is there a way to get a dataset of old rental listings from somewhere?
Alternatively, is there a different dataset I could use that would provide a similar level of detail on rentals? I know there are probably a lot of sources where I could get square footage, bedrooms/bathrooms, and a price, but Zillow provides data such as washer/dryer availability, pet policy, pet rent, etc. Are there any datasets like that available?
Thank you in advance
submitted by /u/SofisticatiousRattus
[link] [comments]
A dataset with campaign headlines, description and content of the campaign
submitted by /u/nothrishaant
[link] [comments]
The NIST Ballistics Toolmark Research Database (NBTRD) is an open-access research database of bullet and cartridge case toolmark data. The development of the database is sponsored by the U.S. Department of Justice’s National Institute of Justice. The database is being developed to:
-foster the development and validation of measurement methods, algorithms, metrics, and quantitative confidence limits for objective firearm identification,
-improve the scientific knowledge base on the similarity of marks from different firearms and the variability of marks from the same firearm, and
-ease the transition to the application of three-dimensional surface topography data in firearms identification.
The database contains traditional reflectance microscopy images and three-dimensional surface topography data acquired by NIST or submitted by database users. The goal is a collection of data sets that:
-represents the large variety of ballistic toolmarks encountered by forensic examiners, and
-represents challenging identification scenarios, such as those posed by consecutively manufactured firearm components.
submitted by /u/lurklord_
[link] [comments]
Are there any other options? I’m trying to build a portfolio to get data analyst or data science positions.
submitted by /u/Nickaroo321
[link] [comments]
I am attempting to build a rice variety classifier. For the training data, I can either photograph a cluster of rice that fills the whole image or just one rice grain per picture.
If I go with individual rice grains: what should I use for the background behind the grain? Or is data augmentation enough to make the model less reliant on a specific background?
submitted by /u/MadCrownie
[link] [comments]
submitted by /u/marr4231
[link] [comments]
I am trying to create some software to translate text to ASL and vice versa, and I cannot find a source or API for the life of me. I planned to scrape handspeak.com, but the terms of use prohibit the downloading of content. Does anyone know where I could find this data?
submitted by /u/FrosteeSwurl
[link] [comments]
Are there any datasets about unconscious biases against those with a hearing loss?
submitted by /u/WisdomMultiplier
[link] [comments]
Hey! Sorry if this is the wrong sub!
I’m doing a project for school and I just need a dataset that has individualized demographic data (as in each row refers to a different person and describes as many demographic traits as possible such as race, income, education etc). I don’t know why but it’s been impossible to find individualized data rather than aggregate data at the census tract level or something like that.
Does anyone have any recommendations on where to look or how to search for this? I don’t really care about the specifics of the data like what region it’s in or anything
submitted by /u/moose_on_a_hus
[link] [comments]
I’ve been working on this database for about a year during my sabbatical and released a preview version of it this week: https://baseball.computer/
I have two goals for the project – to facilitate reproducible baseball research and to create the most fun and interesting “toy dataset” possible for educational settings.
From a technical standpoint, the database runs entirely inside of your browser, which means that you can write SQL against event-level data and visualize the results directly on the website. The tables are all available to download as flat files, and there are instructions for connecting to the data in Python and R.
From a baseball standpoint, it contains thousands of individual columns that pre-calculate as many building blocks as possible for statistical analysis. These include:
- Repeatable construction of WAR components like linear weights, win/run expectancy, and park factors
- An example of a Keras deep-and-cross deep learning model that can train using the entire dataset on a laptop
- Tables that correctly merge event-level, box-level, game-level, and season-level raw data
- Taxonomies and additional metadata for outcome types, batted balls, and pitches
- 100+ event-level atomic “counting stats” including granular information on traditional stats, baserunning advances, pitches, and batted-ball location/trajectory
- Detailed event state tables that can be combined with the counting stats for calculating splits
- Inference/deduction for handling missing batted ball data, unknown fielders, and unusual scorekeeper tendencies
Extensive-but-spotty documentation is available for all tables on the site. This includes all of the source (SQL) code, the upstream and downstream dependencies of each table, and a link to directly download the table as a flat file (here is an example). There are also several hundred tests and data constraints. This is nowhere near enough coverage to guarantee ease of use or data integrity, but it will hopefully serve as a foundation for both as the project evolves.
A couple of requests for anyone interested in playing around with it – please send me any feedback (bugs, feature requests, use cases, etc.) and, if you find it interesting, please share with your other data communities!
submitted by /u/PaginatedSalmon
[link] [comments]
I’m looking for a dataset that has screen recording videos (either videos or video compressions) and (ideally) accompanying descriptions of the actions completed in the video (e.g. user adds a table to a Word document). The descriptions are optional, but the dataset must contain videos. This will be used to train a video-captioning model.
Does anyone know where I can download this kind of dataset?
submitted by /u/danh3
[link] [comments]
Source: https://eerscmap.usgs.gov/uspvdb/data/
The United States Large-Scale Solar Photovoltaic Database (USPVDB) provides the locations and array boundaries of U.S. ground-mounted photovoltaic (PV) facilities with capacity of 1 megawatt or more. Large-scale facility data are collected and compiled from various public and private sources, digitized and position-verified from aerial imagery, and quality checked. The USPVDB is available for download in a variety of tabular and geospatial file formats to meet a range of user/software needs. Cached and dynamic web services are available for users that wish to access the USPVDB as a Representational State Transfer Services (RESTful) web service.
submitted by /u/n1nja5h03s
[link] [comments]
I’m seeking email data from real estate agents, specifically the emails that arrive when offers come in to a realtor’s inbox.
submitted by /u/No-Exam5695
[link] [comments]
Hi all!
For the past few months, after uploading this post in r/PushShift, I have had quite a lot of discussions about it with academic researchers. I soon noticed that sharing a historical database often goes against universities’ IRB rules (and definitely against Reddit’s new T&C), so that project had to be shut down. But based on those discussions, I worked on a new tool that adheres strictly to Reddit’s terms and conditions while also staying aligned with the majority of Institutional Review Board (IRB) standards.
The tool is called RedditHarbor and it is designed specifically for researchers with limited coding backgrounds. While PRAW offers flexibility for advanced users, most researchers simply want to gather Reddit data without headaches. RedditHarbor handles all the underlying work needed to streamline this process. After the initial setup, RedditHarbor collects data through intuitive commands rather than dealing with complex clients.
Here’s what RedditHarbor does:
– Connects directly to the Reddit API and downloads submissions, comments, user profiles, etc.
– Stores everything in a Supabase database that you control
– Handles pagination for large datasets with millions of rows
– Customizable and configurable collection from subreddits
– Exports the database to CSV/JSON formats for analysis
Why I think it could be helpful to other researchers:
– No coding needed for the data collection after initial setup. (I tried to maximize simplicity for researchers without coding expertise.)
– While it does not give you access to the entire historical data (like PushShift or Academic Torrents), it complies with most IRBs. By using approved Reddit API credentials tied to a user account, the data collection meets guidelines for most institutional review boards. This ensures legitimacy and transparency.
– Fully open source Python library built using best practices
– Deduplication checks before saving data
– Custom database tables adjusted for Reddit metadata
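To give a feel for what a "deduplication check before saving" can look like, here is a minimal sketch. It is not RedditHarbor's actual code: the function name `filter_new_records`, the `"id"` key, and the in-memory set of seen IDs are all assumptions made for illustration (a real tool would more likely check against the database itself).

```python
def filter_new_records(records, seen_ids, key="id"):
    """Return only the records whose key has not been seen before.

    `seen_ids` is a set that is updated in place, so repeated calls
    (for example, one per page of API results) share the same memory.
    The "id" key is a hypothetical field name for this sketch.
    """
    fresh = []
    for record in records:
        record_id = record[key]
        if record_id in seen_ids:
            continue  # already stored, skip it
        seen_ids.add(record_id)
        fresh.append(record)
    return fresh


# Two overlapping "pages" of results, as might come back from an API.
page_one = [{"id": "a1", "title": "first"}, {"id": "b2", "title": "second"}]
page_two = [{"id": "b2", "title": "second"}, {"id": "c3", "title": "third"}]

seen = set()
to_save = filter_new_records(page_one, seen) + filter_new_records(page_two, seen)
```

With the two sample pages above, `to_save` holds three records, because the repeated `b2` entry from the second page is dropped.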
Please check it out and let me know your thoughts! I would love to hear any feedback and feature requests 🙂
It is actively maintained, and new features are being added (i.e., collecting submissions by keywords).
submitted by /u/nickshoh
[link] [comments]
I’m looking for GeoJSON of the world’s NAVAREAs and subregions but I can’t seem to find them anywhere. I can find pictures of them (like the one below) but that’s not really what I need.
I would have thought the IHO would have something like this, but it’s not on their website, they can’t be messaged on Twitter, and they seem unable to make any sort of commitment that there might even be a set of internationally recognized areas of responsibility without having their lawyers present to advise them.
NAVAREA boundaries are created only for information purpose and it does not constitute an endorsement or approval of them and the IHO does not vouch for the validity or accuracy of these boundaries.
submitted by /u/hrokrin
[link] [comments]
I’m looking for a mapping of all the series IDs available in the FRED API to their category paths. For example, CPI would look like this: Categories > Prices > Consumer Price Indexes (CPI and PCE)
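One way such a mapping could be built by hand, as a rough sketch rather than an official recipe: walk the category tree from the root (category 0) and record the path for every series found along the way. The sketch below assumes the FRED endpoints `fred/category/children` and `fred/category/series` and their JSON keys `categories` and `seriess`; the helper names (`fred_get`, `walk_categories`, `make_fetchers`) are made up for this example, and you need your own API key.

```python
import json
import urllib.parse
import urllib.request

FRED_BASE = "https://api.stlouisfed.org/fred"


def fred_get(endpoint, api_key, **params):
    """Call a FRED endpoint and return the decoded JSON (needs an API key)."""
    query = urllib.parse.urlencode(
        {**params, "api_key": api_key, "file_type": "json"}
    )
    with urllib.request.urlopen(f"{FRED_BASE}/{endpoint}?{query}") as resp:
        return json.load(resp)


def walk_categories(fetch_children, fetch_series, root_id=0,
                    root_path=("Categories",)):
    """Yield (series_id, "A > B > C") pairs by walking the category tree.

    fetch_children(category_id) and fetch_series(category_id) are
    hypothetical callables returning lists of dicts; children need
    "id" and "name", series need "id".
    """
    stack = [(root_id, tuple(root_path))]
    while stack:
        category_id, path = stack.pop()
        for series in fetch_series(category_id):
            yield series["id"], " > ".join(path)
        for child in fetch_children(category_id):
            stack.append((child["id"], path + (child["name"],)))


def make_fetchers(api_key):
    """Build the two callables on top of the live API (assumed response keys)."""
    def fetch_children(category_id):
        data = fred_get("category/children", api_key, category_id=category_id)
        return data["categories"]

    def fetch_series(category_id):
        # Only the first page is read here; big categories need paging.
        data = fred_get("category/series", api_key, category_id=category_id)
        return data["seriess"]

    return fetch_children, fetch_series
```

Usage would be something like `mapping = dict(walk_categories(*make_fetchers("YOUR_API_KEY")))`. Keep in mind that this makes two requests per category, so a full crawl is slow and may hit rate limits, and that a series listed under several categories keeps only the last path seen when collected into a dict.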
submitted by /u/aksinchupidkeshuns
[link] [comments]
Looking for graph datasets that show interesting relationships between entities; time information attached to links would also be cool. I’m trying to make some cool visualizations.
submitted by /u/epoch_ai
[link] [comments]
Hi everyone.
I am looking for the Litrec dataset, which contains data on 1927 users who rated 3710 literary works. The dataset is used in some papers related to book recommendation systems. Here are the most notable ones:
H. Alharthi et al., 2019, Study of linguistic features incorporated in a literary book recommender system
H. Alharthi et al., 2018, Authorship Identification for Literary Book Recommendations
P. C. Vaz et al., 2012, Improving a hybrid literary book recommendation system through author ranking
I have spent hours googling for the dataset, yet I have not found it. I would really appreciate it if you could help me access the dataset.
I really appreciate any help you can provide.
submitted by /u/bob_j79
[link] [comments]
I’m building a web page to provide company-specific contact information and the steps to take to close or transfer accounts after someone dies. Trying to figure out the best way to identify companies to request info from. Thanks! https://www.buriedinwork.com/company-contacts
submitted by /u/apzuckerman
[link] [comments]
Hi guys, I’m researching customer behavior in Vietnam and would like to have access to historical anonymous mobile location data to find insights into customers’ favorite locations. Is there any free dataset that I could use to achieve this? Or I can buy it if it is less than $100 (sorry, not much, because I’m still in college). Thank you.
submitted by /u/Thanh-Do
[link] [comments]
I’m currently knee-deep in my thesis research exploring the factors influencing e-commerce growth, and I’m hitting a roadblock that I hope some of you might be able to help with.
I’ve got data for my independent variables—things like mobile phone penetration rate, urbanization, and education levels across populations in China, India, the United States, and Europe. However, when it comes to the crucial dependent variable of e-commerce growth, it’s proving to be quite the challenge.
I’m specifically looking for monthly or quarterly data, and my school insists on a substantial timeframe (2010 to 2020). The trouble is, finding this kind of data for all four regions is like finding a needle in a haystack, especially when comparing provincial data for China against the other regions.
If anyone has suggestions for alternative dependent variables or knows of sources for monthly/quarterly e-commerce growth data (even if it’s just for China), I’d be eternally grateful. My thesis is almost wrapped up, focusing on why China’s e-commerce growth stands out, but this data hiccup is causing a bit of a headache.
Thanks in advance for any insights or leads you can provide!
submitted by /u/trippie30
[link] [comments]
I’ve been trying to find an API that lets me get information on upcoming flights, such as origin, destination, number of stops, and prices. But so far I’ve come across none that are usable. There were two major ones that I thought might work, Skyscanner and Google Flights, but Skyscanner only allows commercial use and a Google Flights API doesn’t exist somehow… Not sure where to go from here. I’m thinking of building my own API by scraping, but that is extremely inefficient and sounds like a dumb idea…
submitted by /u/Competitive-Adagio18
[link] [comments]
Hi! I need a dataset for the UN’s 16th Sustainable Development Goal, which is “peace, justice, and strong institutions”. I know there are open-source datasets available, but all of them have only one or two variables at most, such as the number of homicides by age/sex. I need a dataset on which I can run multiple linear regression and make a 3D scatterplot, a scatterplot matrix, and a heat map.
All of these require multiple numeric variables.
submitted by /u/rantings-of-troubled
[link] [comments]