Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, i’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

RSpace Data Management Platform Is Now Open Source

RSpace is an all-in-one ELN, sample manager and Research Data Management (RDM) platform that integrates with many other data tools. RSpace is designed to act as a central data hub and pipeline for large academic institutes who want to support open science and FAIR data principles. RSpace already has good open APIs, but to encourage the data community to build even more integrations to allow better flow of data, RSpace is now fully open source. Learn more here: https://github.com/rspace-os

submitted by /u/invasifspecies
[link] [comments]

Buying Customer Parent-Child Relationship Data

We need to build parent/child relationships between customers in our system.

We have about 300,000 customers—many are mid-sized companies, a few are very large, and a number are very small outfits.

About 3% of our customers have “children,” meaning they own other customers in our database. We are unsure how many customers fall into that ‘child’ category, but we estimate it may be around 10% of the total customer population.

We have an enterprise MDM system connected to our CRM, which can help us manage the data.

Our challenge is finding a reliable source for parent-child relationship data. We are a large national company (USA) and we have a reasonable budget to purchase this data, but I am unsure where to start looking. We currently buy some data from D&B, but their parent-child values are unreliable at best, making it difficult to depend on them.

If anyone has suggestions on where we can obtain accurate parent-child relationship data, please share. It would be much appreciated.

submitted by /u/HuronChief
[link] [comments]

Dataset Of Clinical Indicators For Benign Stomach Tumors

Request:
I’m looking for a dataset that contains clinical indicators for various benign stomach tumors. The dataset should include one or more of the following tumor types: gastric polyps, gastrointestinal stromal tumors (GISTs), lipomas, leiomyomas, schwannomas, neurofibromas, pancreatic heterotopia, hemangiomas, lymphangiomas, glomus tumors, fibromas.

metric of what I’d like about:

age, gender,blood cell count, liver function, kidney function, blood lipids, tumor markers and pathological results

I’m not too picky about the format as long as the diagnosis is separate from the clinical indicators and the formatting is consistent.

Artificial datasets are okay, maybe even preferred, as long as they’re accurate.

submitted by /u/TodayAshamed6095
[link] [comments]

Datasets Of Planetary Positions Over The Last Fifty Years.

I am working on a statistical analysis of gravitational effects on small earthly objects. I have been able to determine some correlations that appear to exist relative to the Earth’s axial tilt toward and away from the sun throughout the years in question.

This seems to be supported by tidal effects recorded across the globe. However this does not account for all the deviations I am seeing in the rest of the data, and I would like to confirm or disprove these potential correlations.

Given the number of deviations it seems evident there are other interplanetary dynamics at play. With a bit of digging, I came across John Henry Nelson’s work for RCA on Radio Wave Propagation as influenced by solar storms and coronal mass ejections.

His work found correlations between planetary alignment, solar flares, and CMEs as they relate to radio wave propagation. The academic paper was insightful but lacked the data I would need to use in my work.

I know I could reasonably approximate these details, but most definitely would prefer to simply grab some existing data and get back to number crunching.

Any help would be appreciated. Cheers!

submitted by /u/1-Awesome-Human
[link] [comments]

Looking For Dataset Of Medical Billing Company That’s Doing Covid Billing Or People With Blue Cross Blue Shield Insurance Patients!

Hey everyone, hope I will get some resources/idea from here. I was looking for the dataset of medical belling company that’s doing Covid billing / people with blue cross blue shield insurance patients. I need name, address, number, and ID that starts with XOF for people who have blue cross blue shield insurance. is it possible or you have any idea please lmk!

submitted by /u/Nandhagopalakrishnan
[link] [comments]

Data On Number Of Congregations By U.S. State

Hello! I would really appreciate some help with finding the number of congregations or churches (over all religious establishments) by state. Doing different searches reveals websites that show percentage of population that are different religions and similar info but not how many “churches” there are. I am assuming there has to be some way to find this info since they need to be registered with the state and federal government for tax purposes.

I assume I am just not using the right keywords. If someone could help me learn what the right thing to search is that would be excellent. TIA!

submitted by /u/herosandwixh
[link] [comments]

Developed A Free Platform To Quickly Create Jsonl Datasets For Gpt Finetuning And Customize Llm Call Functions

While I was working on some other projects I created for myself a platform to quickly create jsonl datasets for gpt finetuning and customize llm call functions. I realized it’s quite useful so I might as well just publish the site just in case it could be useful to any of you guys. All the functionalities are client side so you can check easily that I am not trying to steal your datasets :- )

Of course completely free!

https://finetune-gpt.vercel.app/

submitted by /u/Pleasant_Syllabub591
[link] [comments]

Where To Find Sweden Carbon Tax Rate (carbon Price)? I Look At Carbonpricingdashboard.worldbank.org But Sweden Carbon Tax Data Is Empty

Hi everyone, as explained in the title I’m curently looking for Sweden carbon tax rate or Sweden carbon price data from 2010-2019. I already tried using this site, but Sweden carbon tax rate is empty. Tried from this reddit post aswell, but still doesn’t find it. Does anyone please help me, where to find this data? (and if any other, could please share other carbon tax rate data other than the world bank one)

Thank you for your help!

submitted by /u/ILoveRice444
[link] [comments]

What Is The Best Dataset/API For Maps Data?

Hello Data Gurus, I came here and I think I will get the best help

I am currently building an app that tells about streets. I need a large dataset that has information about every single street in the world (Description, length, Hotels, etc etc etc)

Is there any API (It’s fine if paid) you recommend for this purpose?

It doesn’t have to be about streets. just information about places in the whole globe

And thank you for reading my question!

submitted by /u/waelnassaf
[link] [comments]

Automatic Subject Image Cropping Solution

Been working on compiling large datasets of characters derived from anime screencaps for the purpose of training as LoRAs to be used with Stable Diffusion. I’ll typically be working with usually about 10,000 images (and up to 80,000 images in some cases) that I will need to manually crop to focus on the intended character. That said, I do use a simple cosine similarity program to remove near-duplicate images along with WD1.4 tagging to divide images into their own character-specific datasets based on appearance, but I may still have to manually crop upwards of 1,000 images. It’s not impossible, but by no means a valuable use of time when there’s likely a way to significantly reduce the menial work.

I’ve seen some solutions with FiftyOne, but I’ve got no idea how to utilize it myself – are there any publicly available solutions anyone can recommend?

submitted by /u/jnslater
[link] [comments]

Instrumental One-Shot Sample Dataset

Hi.

I am looking for/building a dataset comprised of instrumental one-shots (defined as “individual hits, stabs, or sound bites). If any of you have used digital audio workstations, there are usually some preview one-shot samples that come with the software.

For a personal project, I am looking to utilize this dataset for the artificial generation of similar one-shot samples. Particularly, percussion one-shots would be useful for both training and proof-of-concept.

I am fairly new to personally collecting data, and any tips or advice for finding valuable sources would be much appreciated. There are some good instrumental datasets on Kaggle, but nothing that fits the data I’m looking for. Furthermore, if anyone has the ability to share a dataset matching this description, I would be grateful. Cheers!

submitted by /u/Even_Contribution_32
[link] [comments]

Looking For ECommerces With A Specific Provider

If you were building a SaaS for ecommerce stores but you were only able to integrate with one ecommerce provider (WooCommerce) at the moment.

How would you go about finding ecommerces using that provider so you can reach out to them later?

Right now I’m:

Googling: “buy [CATEGORY] online in [REGION/COUNTRY]”. Entering the first 10-15 stores. Using Free StoreLeads extension to see which provider they use. Create my own database on Sheets one by one.

Any ideas? Can’t afford StoreLeads platform rn.

submitted by /u/fgd2398
[link] [comments]

Need A Relatively Large Student Teacher Dialogue Dataset

Hi, as the title suggests I need a dataset that has recorded the interactions between students and teachers in a learning environment. For context, I’m currently working on a project for a university to develop a custom assistant that interacts with students in a tutor-like way using OpenAI’s API, the data will be used for fine-tuning interactions. Thanks in advance.

submitted by /u/AGMcCarron
[link] [comments]

How To Legally Find Dataset Of Doctors’ Treatment Histories?

I am currently working on a project which would require having the history of conditions treated by a doctor (easy to quantify) or their qualifications / research contributions (hard to quantify, pain to work with). I looked into things like OpenMRS and EMRBots but am pretty sure that they are simulated.

Where could I find a giant repository of these types of real but anonymised “health records” without committing a crime?

submitted by /u/Ok-Program-3656
[link] [comments]

Help Download The ASVSpoof2017 Dataset

So I live in South East Asia (I am assuming this is the root of the problem) and downloading from the Edinburgh DataShare website is nearly impossible. In my case, there was once where I was able to reach 850mb out of 1000mb on the evaluation dataset. Unfortunately, my internet died for a second and when i resumed the download, it resets back from the beginning. Yes 1000mb is small but the download speed is in kbps. I tried downloading it for the entire day and now it failed.

Here’s the link https://datashare.ed.ac.uk/handle/10283/3055

So i want to know whether someone has a mirror link for it or a way i can download it faster. That’s all from me. Thanks.

Oh and also do tell me if you think that i need to go through a formal procedure for it. I did ask about this to the informarion services of the university of Edinburgh but have yet to get a reply. Once again, thank you.

submitted by /u/Puzzleheaded-Path306
[link] [comments]

259k – LLM Unity3d API Dataset For Fine-Tuning

Hey everyone just uploaded a 259k dataset for unity. It’s not a coding dataset but rather a dataset to teach the model about unity’s API properties. It took me 3 days to create with 6 instances of llama3 8B exl2. I have trained a model on the dataset and works very well. It does cause the model to hallucinate so you might have to play with the fine tuning hyper parameters and possibly align the model after. Enjoy

submitted by /u/Delicious-Farmer-234
[link] [comments]