Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, i’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Dataset For Companies And Their Respective Categories

I’m trying to build an analyzer of my spending habits and I would like to know what various categories of expenses I have.

For example, I have a csv of all my transactions One transaction might say “Chipotle” and I would like that to be categorized into a restaurants. My approach is to have a dataset of these popular companies and their respective types in order to categorize them into “genres”. I’m currently using OpenStreetMaps: overpass api because they have tags on each company or store classifying what type they are. If anyone has a dataset like this or suggestions for a different approach, please let me know.

TLDR: Looking for a dataset that has companies that people ordinarily buy from and their category “Chipotle: Restaurant” “Nike: fashion” …

submitted by /u/AimBot_4000
[link] [comments]

I Have Made A Queryable MySQL And JSON Dataset From The DSM-V

I have published a FREE MySQL and JSON version of the DSM-V. I am working on developing my own AI-powered semi-private healthcare app, and I am doing it all 100% myself, so if you wish to use my dataset, please consider donating to help me with my own project if you’re willing and able! It would really help me out with the development of my app. If you are willing to donate, please see the readme in the GitHub repo. TYSM in advance.

So anyway, this dataset contains all of the DSM-V disorders, their diagnostic criteria (organized into categories and subcategories, as laid out in the DSM-V), culture and gender-related considerations for diagnosis, prevalence data, recording procedures, and any other information provided about the disorder, conveniently organized and queryable, written in MySQL with a JSON export copy included as well.

Here’s the link! https://github.com/Danm998/DSM-V

This took me a fair bit of work, so please consider donating if it helps you with a project of your own. Thanks in advance, I hope you enjoy!

submitted by /u/Danm998
[link] [comments]

RSpace Data Management Platform Is Now Open Source

RSpace is an all-in-one ELN, sample manager and Research Data Management (RDM) platform that integrates with many other data tools. RSpace is designed to act as a central data hub and pipeline for large academic institutes who want to support open science and FAIR data principles. RSpace already has good open APIs, but to encourage the data community to build even more integrations to allow better flow of data, RSpace is now fully open source. Learn more here: https://github.com/rspace-os

submitted by /u/invasifspecies
[link] [comments]

Buying Customer Parent-Child Relationship Data

We need to build parent/child relationships between customers in our system.

We have about 300,000 customers—many are mid-sized companies, a few are very large, and a number are very small outfits.

About 3% of our customers have “children,” meaning they own other customers in our database. We are unsure how many customers fall into that ‘child’ category, but we estimate it may be around 10% of the total customer population.

We have an enterprise MDM system connected to our CRM, which can help us manage the data.

Our challenge is finding a reliable source for parent-child relationship data. We are a large national company (USA) and we have a reasonable budget to purchase this data, but I am unsure where to start looking. We currently buy some data from D&B, but their parent-child values are unreliable at best, making it difficult to depend on them.

If anyone has suggestions on where we can obtain accurate parent-child relationship data, please share. It would be much appreciated.

submitted by /u/HuronChief
[link] [comments]

Dataset Of Clinical Indicators For Benign Stomach Tumors

Request:
I’m looking for a dataset that contains clinical indicators for various benign stomach tumors. The dataset should include one or more of the following tumor types: gastric polyps, gastrointestinal stromal tumors (GISTs), lipomas, leiomyomas, schwannomas, neurofibromas, pancreatic heterotopia, hemangiomas, lymphangiomas, glomus tumors, fibromas.

metric of what I’d like about:

age, gender,blood cell count, liver function, kidney function, blood lipids, tumor markers and pathological results

I’m not too picky about the format as long as the diagnosis is separate from the clinical indicators and the formatting is consistent.

Artificial datasets are okay, maybe even preferred, as long as they’re accurate.

submitted by /u/TodayAshamed6095
[link] [comments]

Datasets Of Planetary Positions Over The Last Fifty Years.

I am working on a statistical analysis of gravitational effects on small earthly objects. I have been able to determine some correlations that appear to exist relative to the Earth’s axial tilt toward and away from the sun throughout the years in question.

This seems to be supported by tidal effects recorded across the globe. However this does not account for all the deviations I am seeing in the rest of the data, and I would like to confirm or disprove these potential correlations.

Given the number of deviations it seems evident there are other interplanetary dynamics at play. With a bit of digging, I came across John Henry Nelson’s work for RCA on Radio Wave Propagation as influenced by solar storms and coronal mass ejections.

His work found correlations between planetary alignment, solar flares, and CMEs as they relate to radio wave propagation. The academic paper was insightful but lacked the data I would need to use in my work.

I know I could reasonably approximate these details, but most definitely would prefer to simply grab some existing data and get back to number crunching.

Any help would be appreciated. Cheers!

submitted by /u/1-Awesome-Human
[link] [comments]

Looking For Dataset Of Medical Billing Company That’s Doing Covid Billing Or People With Blue Cross Blue Shield Insurance Patients!

Hey everyone, hope I will get some resources/idea from here. I was looking for the dataset of medical belling company that’s doing Covid billing / people with blue cross blue shield insurance patients. I need name, address, number, and ID that starts with XOF for people who have blue cross blue shield insurance. is it possible or you have any idea please lmk!

submitted by /u/Nandhagopalakrishnan
[link] [comments]

Data On Number Of Congregations By U.S. State

Hello! I would really appreciate some help with finding the number of congregations or churches (over all religious establishments) by state. Doing different searches reveals websites that show percentage of population that are different religions and similar info but not how many “churches” there are. I am assuming there has to be some way to find this info since they need to be registered with the state and federal government for tax purposes.

I assume I am just not using the right keywords. If someone could help me learn what the right thing to search is that would be excellent. TIA!

submitted by /u/herosandwixh
[link] [comments]

Developed A Free Platform To Quickly Create Jsonl Datasets For Gpt Finetuning And Customize Llm Call Functions

While I was working on some other projects I created for myself a platform to quickly create jsonl datasets for gpt finetuning and customize llm call functions. I realized it’s quite useful so I might as well just publish the site just in case it could be useful to any of you guys. All the functionalities are client side so you can check easily that I am not trying to steal your datasets :- )

Of course completely free!

https://finetune-gpt.vercel.app/

submitted by /u/Pleasant_Syllabub591
[link] [comments]

Where To Find Sweden Carbon Tax Rate (carbon Price)? I Look At Carbonpricingdashboard.worldbank.org But Sweden Carbon Tax Data Is Empty

Hi everyone, as explained in the title I’m curently looking for Sweden carbon tax rate or Sweden carbon price data from 2010-2019. I already tried using this site, but Sweden carbon tax rate is empty. Tried from this reddit post aswell, but still doesn’t find it. Does anyone please help me, where to find this data? (and if any other, could please share other carbon tax rate data other than the world bank one)

Thank you for your help!

submitted by /u/ILoveRice444
[link] [comments]

What Is The Best Dataset/API For Maps Data?

Hello Data Gurus, I came here and I think I will get the best help

I am currently building an app that tells about streets. I need a large dataset that has information about every single street in the world (Description, length, Hotels, etc etc etc)

Is there any API (It’s fine if paid) you recommend for this purpose?

It doesn’t have to be about streets. just information about places in the whole globe

And thank you for reading my question!

submitted by /u/waelnassaf
[link] [comments]

Automatic Subject Image Cropping Solution

Been working on compiling large datasets of characters derived from anime screencaps for the purpose of training as LoRAs to be used with Stable Diffusion. I’ll typically be working with usually about 10,000 images (and up to 80,000 images in some cases) that I will need to manually crop to focus on the intended character. That said, I do use a simple cosine similarity program to remove near-duplicate images along with WD1.4 tagging to divide images into their own character-specific datasets based on appearance, but I may still have to manually crop upwards of 1,000 images. It’s not impossible, but by no means a valuable use of time when there’s likely a way to significantly reduce the menial work.

I’ve seen some solutions with FiftyOne, but I’ve got no idea how to utilize it myself – are there any publicly available solutions anyone can recommend?

submitted by /u/jnslater
[link] [comments]

Instrumental One-Shot Sample Dataset

Hi.

I am looking for/building a dataset comprised of instrumental one-shots (defined as “individual hits, stabs, or sound bites). If any of you have used digital audio workstations, there are usually some preview one-shot samples that come with the software.

For a personal project, I am looking to utilize this dataset for the artificial generation of similar one-shot samples. Particularly, percussion one-shots would be useful for both training and proof-of-concept.

I am fairly new to personally collecting data, and any tips or advice for finding valuable sources would be much appreciated. There are some good instrumental datasets on Kaggle, but nothing that fits the data I’m looking for. Furthermore, if anyone has the ability to share a dataset matching this description, I would be grateful. Cheers!

submitted by /u/Even_Contribution_32
[link] [comments]

Looking For ECommerces With A Specific Provider

If you were building a SaaS for ecommerce stores but you were only able to integrate with one ecommerce provider (WooCommerce) at the moment.

How would you go about finding ecommerces using that provider so you can reach out to them later?

Right now I’m:

Googling: “buy [CATEGORY] online in [REGION/COUNTRY]”. Entering the first 10-15 stores. Using Free StoreLeads extension to see which provider they use. Create my own database on Sheets one by one.

Any ideas? Can’t afford StoreLeads platform rn.

submitted by /u/fgd2398
[link] [comments]

Need A Relatively Large Student Teacher Dialogue Dataset

Hi, as the title suggests I need a dataset that has recorded the interactions between students and teachers in a learning environment. For context, I’m currently working on a project for a university to develop a custom assistant that interacts with students in a tutor-like way using OpenAI’s API, the data will be used for fine-tuning interactions. Thanks in advance.

submitted by /u/AGMcCarron
[link] [comments]