I found databases of short jokes like dad jokes, question jokes, etc. Kaggle has them. But I can’t find jokes that are paragraphs long. Does anyone know where I can get them?
submitted by /u/abdullahmnsr2
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I’m working on a beer database project and need clean images of beer bottles. Does anyone know of any websites or places where I can find these? URLs or the actual db where all are stored.
submitted by /u/Terrible_Band6290
Hi, I am working on a project that uses representation learning for large-scale dynamic networks. My work is currently based on the Reddit dataset. I am trying to find a dataset that includes timestamps, so that I can treat each user as a node of the graph and build edges between the nodes.
I have tried some sources, but they did not meet my requirements:
https://snap.stanford.edu/graphsage/ (the nodes are not users)
https://github.com/dingidng/reddit-dataset (only very few edges can be built)
I am also confused about how to get access to the Reddit data through Pushshift, since my requests have been rejected by the bot many times, and about how to use the dataset on the Pushshift platform. If anyone can help me find a reliable and useful data source, thank you so much!
Thank you in advance for any help and suggestions.
submitted by /u/Terrible_Band6290
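For the user-as-node graph described in the post above, here is a minimal sketch of how timestamped edges could be derived once comment-level data is available. The record layout (`id`, `author`, `parent_id`, `created_utc`) and the sample rows are assumptions for illustration only; real dumps may name or encode these fields differently (for example, with prefixed parent ids).

```python
from collections import namedtuple

# Hypothetical record layout: one row per comment.
Comment = namedtuple("Comment", ["id", "author", "parent_id", "created_utc"])


def build_temporal_edges(comments):
    """Return (replier, parent_author, timestamp) edges sorted by time."""
    # Map each comment id to its author so replies can be resolved to users.
    author_of = {c.id: c.author for c in comments}
    edges = []
    for c in comments:
        parent_author = author_of.get(c.parent_id)
        # Skip top-level comments, unresolved parents, and self-replies.
        if parent_author is None or parent_author == c.author:
            continue
        edges.append((c.author, parent_author, c.created_utc))
    edges.sort(key=lambda e: e[2])
    return edges


# Made-up sample data, not taken from any real dataset.
comments = [
    Comment("c1", "alice", None, 100),
    Comment("c2", "bob", "c1", 130),
    Comment("c3", "carol", "c1", 120),
    Comment("c4", "alice", "c2", 150),
    Comment("c5", "alice", "c4", 160),  # self-reply, ignored
]
edges = build_temporal_edges(comments)
```

Each edge keeps its timestamp, so the list can be sliced into time windows to get snapshots of a dynamic graph.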
I’ve been looking for a while: where could I find a dataset of two-word combinations, e.g. water + fire -> steam, or, for a more fictional one, cauldron + rabbit’s foot -> mystical potion? I want a dataset similar to the one Infinite Craft uses.
submitted by /u/Any-Palpitation-7876
I am looking for access to the REDD dataset. The link – redd.csail.mit.edu – has been dead for a few months. How do we download it now? The archive [page](https://web.archive.org/web/20220812015008/http://redd.csail.mit.edu/) prevents me from downloading it because it requires a password. Could I pass a username/password as a query parameter or cookie and download it? If so, what are the credentials? Are there other alternatives, or ideas for how to acquire it?
submitted by /u/DuckHunterZx
The webpage for MIMIC-III shows that the full zip file for download is 6.2 GB. In particular, the chartevents.csv.gz file is listed as 4.0 GB. The download in the browser shows 6.2 GB to be downloaded, but it is very slow.
The webpage also gives a wget command for downloading from the command line, and this command reports a total data size of 4.2 GB. In its download, the chartevents.csv.gz file is 2.3 GB. BTW, this method is about 8 times faster than the browser-based download.
I would appreciate any insight into this difference. Has anyone encountered this before?
submitted by /u/Far-Cantaloupe4144
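One way to tell whether either copy is truncated or corrupted is to hash the downloaded file and compare it with a published checksum. The sketch below assumes a SHA-256 value is available for the file (an assumption; check what the dataset page actually provides). It does not explain the size difference, it only checks whether a given file is intact.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's hash with an expected hex string, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Usage would look like `matches("chartevents.csv.gz", expected)`, where `expected` is the checksum string copied from the source.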
I was thinking of creating a gig related to statistical analysis (SPSS and RStudio) on Fiverr. While doing keyword research, I noticed that the gigs ranked on the first page have either no pending orders or very few. Why is that? Is it out of trend?
submitted by /u/Sheeraz_Ahmed
Hi all,
I recently started a project where I’d need to collect the following data:
- the population of various countries across the world
- cost of electricity per use in said country
- total hotels in x country
- total grocery stores in x country
- average hotel size (sq ft)
- average grocery store size (sq ft)
As a college freshman, this is my first research project, and I would like to know which steps and sources would be most useful for collecting this data. My first instinct is to just do Google searches, but I don’t know if there is a database or a more professional method.
submitted by /u/dumbbitch44
I’m wondering if there are advanced metrics for specific golf courses available somewhere online.
For example, if I wanted to know what percentage of time PGA golfers, during tournament play, hit into the fairway bunker on the 1st hole of Augusta, where could I go to find that info? These types of stats have to exist somewhere, right?
Shotlink appears to be a relatively new technology that keeps this data, but I haven’t had any luck finding access to their database. Datagolf.com has a lot of good info, but it’s player-based, not course-based. Appreciate any help!
submitted by /u/mmckeever23
Does anyone know of a dataset that allows you to look up vehicle characteristics (most interested in MSRP) by VIN, preferably free to use? Willing to pay if it is a high quality dataset.
submitted by /u/my_profesh_acct
Hey everyone,
I’m working on a beer database project and need clean images of beer bottles. Does anyone know of any websites or places where I can find these? URLs or the actual db where all are stored.
I’ve been struggling to find good sources. Any help would be greatly appreciated. Thanks!
submitted by /u/Candid_Muscle_4654
Hello, I’m not sure if this is the right place, but I urgently need access to the MIMIC-III database for my thesis. I thought access was free, but it turns out you have to pay for a course. I’m a postgraduate student facing financial difficulties, so I wanted to ask if anyone knows how to access that database for free. Please.
submitted by /u/captain_77destroyer
Hi everyone! I am trying to look for datasets that have data regarding sunlight exposure for countries all around the world.
It would definitely be more helpful if I can find a dataset that has both the sunlight exposure and temperature.
Thanks in advance!
submitted by /u/Exotic-Comment-297
I’m trying to find a dataset of every candidate who has contested any election (federal, state, or local) in the US over the past 3-4 years, for a political video ad classification project. Does anyone have any idea if this sort of database exists or how I can compile it? I just need name, party, state, and contested office.
submitted by /u/moul1k
Heya, I need a dataset containing a bunch of different phonetic letter pronunciations for a thing I’m doing. Does anyone have anything, or a dataset I could use? I can’t really find anything. All I’m finding is sentences and words.
submitted by /u/Charming-Pop-2017
I have a college research project about “keys for the success of the restaurants”. I need any dataset about the status of the best restaurants or the hygiene status of a group of restaurants. We are going to write an article, and my part is to get data for analysis and start the introduction. I have knowledge of data science but no good set of data. Can someone help me?
submitted by /u/Its_santi_27
Looking for a data set on romance fraud and catfishing, specifically with measures of victimization outcomes (mental health, physical health, etc.).
submitted by /u/skelechel
I’m trying to build an analyzer of my spending habits and I would like to know what various categories of expenses I have.
For example, I have a CSV of all my transactions. One transaction might say “Chipotle”, and I would like that to be categorized as a restaurant. My approach is to have a dataset of these popular companies and their respective types in order to categorize them into “genres”. I’m currently using OpenStreetMap’s Overpass API because it has tags on each company or store classifying what type they are. If anyone has a dataset like this or suggestions for a different approach, please let me know.
TLDR: Looking for a dataset that has companies that people ordinarily buy from and their category “Chipotle: Restaurant” “Nike: fashion” …
submitted by /u/AimBot_4000
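The lookup approach described in the post above can be sketched as follows. This is a minimal illustration, not a finished categorizer: the `MERCHANT_CATEGORIES` table, the `description` column name, and the sample rows are all hypothetical, and a real table would come from whatever merchant dataset is eventually found.

```python
import csv
import io
import re

# Hypothetical lookup table; in practice it could be built from tagged
# merchant data such as the OpenStreetMap tags mentioned in the post.
MERCHANT_CATEGORIES = {
    "chipotle": "Restaurant",
    "nike": "Fashion",
}


def normalize(description):
    """Lowercase and replace anything that is not a letter with a single space."""
    return re.sub(r"[^a-z]+", " ", description.lower()).strip()


def categorize(description, table=MERCHANT_CATEGORIES, default="Uncategorized"):
    """Return the category of the first known merchant found in the description."""
    cleaned = normalize(description)
    for merchant, category in table.items():
        if re.search(rf"\b{re.escape(merchant)}\b", cleaned):
            return category
    return default


def categorize_csv(text, column="description"):
    """Read CSV text and pair each transaction description with a category."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row[column], categorize(row[column])) for row in reader]


# Made-up sample transactions.
sample = (
    "description,amount\n"
    "CHIPOTLE #1234 ONLINE,12.50\n"
    "Nike.com,80.00\n"
    "Unknown Shop,5.00\n"
)
rows = categorize_csv(sample)
```

Normalizing before matching matters because bank descriptions often carry store numbers or suffixes; anything that falls through to the default can be reviewed by hand and added to the table.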
I have published a FREE MySQL and JSON version of the DSM-V. I am working on developing my own AI-powered semi-private healthcare app, and I am doing it all 100% myself, so if you wish to use my dataset, please consider donating to help me with my own project if you’re willing and able! It would really help me out with the development of my app. If you are willing to donate, please see the readme in the GitHub repo. TYSM in advance.
So anyway, this dataset contains all of the DSM-V disorders, their diagnostic criteria (organized into categories and subcategories, as laid out in the DSM-V), culture and gender-related considerations for diagnosis, prevalence data, recording procedures, and any other information provided about the disorder, conveniently organized and queryable, written in MySQL with a JSON export copy included as well.
Here’s the link! https://github.com/Danm998/DSM-V
This took me a fair bit of work, so please consider donating if it helps you with a project of your own. Thanks in advance, I hope you enjoy!
submitted by /u/Danm998
I am working on an application that works with Form 1004 (Uniform Residential Appraisal Report) documents, the ones in Fannie Mae’s format. I need filled-in sample forms. Any help will be greatly appreciated.
submitted by /u/AdPatient9267
Is there a dataset of registered Telegram usernames? Or maybe a way to gather them?
submitted by /u/Namtiee
I’m looking for a dataset that includes private fund (HFs, private equity/credit funds) data drawn from publicly accessible sources (SEC, 990 etc).
Ideally this dataset would include location, ownership, principal contact, assets raised, and relationship (who funded whom) data.
Any suggestions?
submitted by /u/timearbitrage
RSpace is an all-in-one ELN, sample manager and Research Data Management (RDM) platform that integrates with many other data tools. RSpace is designed to act as a central data hub and pipeline for large academic institutes who want to support open science and FAIR data principles. RSpace already has good open APIs, but to encourage the data community to build even more integrations to allow better flow of data, RSpace is now fully open source. Learn more here: https://github.com/rspace-os
submitted by /u/invasifspecies
We need to build parent/child relationships between customers in our system.
We have about 300,000 customers—many are mid-sized companies, a few are very large, and a number are very small outfits.
About 3% of our customers have “children,” meaning they own other customers in our database. We are unsure how many customers fall into that ‘child’ category, but we estimate it may be around 10% of the total customer population.
We have an enterprise MDM system connected to our CRM, which can help us manage the data.
Our challenge is finding a reliable source for parent-child relationship data. We are a large national company (USA) and we have a reasonable budget to purchase this data, but I am unsure where to start looking. We currently buy some data from D&B, but their parent-child values are unreliable at best, making it difficult to depend on them.
If anyone has suggestions on where we can obtain accurate parent-child relationship data, please share. It would be much appreciated.
submitted by /u/HuronChief
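Whatever source ends up supplying the relationships in the post above, the parent/child links themselves can be held as a simple child-to-parent mapping. The sketch below is one possible shape, with made-up customer ids; it is not tied to any particular MDM or CRM product.

```python
def ultimate_parent(customer_id, parent_of):
    """Follow parent links upward until reaching a customer with no parent."""
    seen = set()
    current = customer_id
    while parent_of.get(current) is not None:
        # Purchased hierarchy data can contain loops; fail loudly if so.
        if current in seen:
            raise ValueError(f"cycle detected at customer {current}")
        seen.add(current)
        current = parent_of[current]
    return current


def children_by_parent(parent_of):
    """Invert the child -> parent mapping into parent -> list of children."""
    children = {}
    for child, parent in parent_of.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)
    return children


# Hypothetical customer ids: C300 is owned by C200, which is owned by C100.
parent_of = {"C100": None, "C200": "C100", "C300": "C200", "C400": None}
top = ultimate_parent("C300", parent_of)
tree = children_by_parent(parent_of)
```

Storing only the direct parent per customer keeps updates cheap, and the cycle check guards against inconsistent records from an external provider.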
Hello,
I’m doing some transformer-based analysis on Beige Book reports.
However, I can’t figure out how to access them through FRED or otherwise.
I’ve found several repos that scrape the data, which is fine if I must, but is there an API source for Beige Books?
submitted by /u/TableConnect_Market
Request:
I’m looking for a dataset that contains clinical indicators for various benign stomach tumors. The dataset should include one or more of the following tumor types: gastric polyps, gastrointestinal stromal tumors (GISTs), lipomas, leiomyomas, schwannomas, neurofibromas, pancreatic heterotopia, hemangiomas, lymphangiomas, glomus tumors, fibromas.
Metrics I’d like it to include:
age, gender, blood cell count, liver function, kidney function, blood lipids, tumor markers, and pathological results
I’m not too picky about the format as long as the diagnosis is separate from the clinical indicators and the formatting is consistent.
Artificial datasets are okay, maybe even preferred, as long as they’re accurate.
submitted by /u/TodayAshamed6095