Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, i’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Datasets Of Planetary Positions Over The Last Fifty Years.

I am working on a statistical analysis of gravitational effects on small earthly objects. I have been able to determine some correlations that appear to exist relative to the Earth’s axial tilt toward and away from the sun throughout the years in question.

This seems to be supported by tidal effects recorded across the globe. However this does not account for all the deviations I am seeing in the rest of the data, and I would like to confirm or disprove these potential correlations.

Given the number of deviations it seems evident there are other interplanetary dynamics at play. With a bit of digging, I came across John Henry Nelson’s work for RCA on Radio Wave Propagation as influenced by solar storms and coronal mass ejections.

His work found correlations between planetary alignment, solar flares, and CMEs as they relate to radio wave propagation. The academic paper was insightful but lacked the data I would need to use in my work.

I know I could reasonably approximate these details, but most definitely would prefer to simply grab some existing data and get back to number crunching.

Any help would be appreciated. Cheers!

submitted by /u/1-Awesome-Human
[link] [comments]

Looking For Dataset Of Medical Billing Company That’s Doing Covid Billing Or People With Blue Cross Blue Shield Insurance Patients!

Hey everyone, hope I will get some resources/idea from here. I was looking for the dataset of medical belling company that’s doing Covid billing / people with blue cross blue shield insurance patients. I need name, address, number, and ID that starts with XOF for people who have blue cross blue shield insurance. is it possible or you have any idea please lmk!

submitted by /u/Nandhagopalakrishnan
[link] [comments]

Data On Number Of Congregations By U.S. State

Hello! I would really appreciate some help with finding the number of congregations or churches (over all religious establishments) by state. Doing different searches reveals websites that show percentage of population that are different religions and similar info but not how many “churches” there are. I am assuming there has to be some way to find this info since they need to be registered with the state and federal government for tax purposes.

I assume I am just not using the right keywords. If someone could help me learn what the right thing to search is that would be excellent. TIA!

submitted by /u/herosandwixh
[link] [comments]

Developed A Free Platform To Quickly Create Jsonl Datasets For Gpt Finetuning And Customize Llm Call Functions

While I was working on some other projects I created for myself a platform to quickly create jsonl datasets for gpt finetuning and customize llm call functions. I realized it’s quite useful so I might as well just publish the site just in case it could be useful to any of you guys. All the functionalities are client side so you can check easily that I am not trying to steal your datasets :- )

Of course completely free!

https://finetune-gpt.vercel.app/

submitted by /u/Pleasant_Syllabub591
[link] [comments]

Where To Find Sweden Carbon Tax Rate (carbon Price)? I Look At Carbonpricingdashboard.worldbank.org But Sweden Carbon Tax Data Is Empty

Hi everyone, as explained in the title I’m curently looking for Sweden carbon tax rate or Sweden carbon price data from 2010-2019. I already tried using this site, but Sweden carbon tax rate is empty. Tried from this reddit post aswell, but still doesn’t find it. Does anyone please help me, where to find this data? (and if any other, could please share other carbon tax rate data other than the world bank one)

Thank you for your help!

submitted by /u/ILoveRice444
[link] [comments]

What Is The Best Dataset/API For Maps Data?

Hello Data Gurus, I came here and I think I will get the best help

I am currently building an app that tells about streets. I need a large dataset that has information about every single street in the world (Description, length, Hotels, etc etc etc)

Is there any API (It’s fine if paid) you recommend for this purpose?

It doesn’t have to be about streets. just information about places in the whole globe

And thank you for reading my question!

submitted by /u/waelnassaf
[link] [comments]

Automatic Subject Image Cropping Solution

Been working on compiling large datasets of characters derived from anime screencaps for the purpose of training as LoRAs to be used with Stable Diffusion. I’ll typically be working with usually about 10,000 images (and up to 80,000 images in some cases) that I will need to manually crop to focus on the intended character. That said, I do use a simple cosine similarity program to remove near-duplicate images along with WD1.4 tagging to divide images into their own character-specific datasets based on appearance, but I may still have to manually crop upwards of 1,000 images. It’s not impossible, but by no means a valuable use of time when there’s likely a way to significantly reduce the menial work.

I’ve seen some solutions with FiftyOne, but I’ve got no idea how to utilize it myself – are there any publicly available solutions anyone can recommend?

submitted by /u/jnslater
[link] [comments]

Instrumental One-Shot Sample Dataset

Hi.

I am looking for/building a dataset comprised of instrumental one-shots (defined as “individual hits, stabs, or sound bites). If any of you have used digital audio workstations, there are usually some preview one-shot samples that come with the software.

For a personal project, I am looking to utilize this dataset for the artificial generation of similar one-shot samples. Particularly, percussion one-shots would be useful for both training and proof-of-concept.

I am fairly new to personally collecting data, and any tips or advice for finding valuable sources would be much appreciated. There are some good instrumental datasets on Kaggle, but nothing that fits the data I’m looking for. Furthermore, if anyone has the ability to share a dataset matching this description, I would be grateful. Cheers!

submitted by /u/Even_Contribution_32
[link] [comments]

Looking For ECommerces With A Specific Provider

If you were building a SaaS for ecommerce stores but you were only able to integrate with one ecommerce provider (WooCommerce) at the moment.

How would you go about finding ecommerces using that provider so you can reach out to them later?

Right now I’m:

Googling: “buy [CATEGORY] online in [REGION/COUNTRY]”. Entering the first 10-15 stores. Using Free StoreLeads extension to see which provider they use. Create my own database on Sheets one by one.

Any ideas? Can’t afford StoreLeads platform rn.

submitted by /u/fgd2398
[link] [comments]

Need A Relatively Large Student Teacher Dialogue Dataset

Hi, as the title suggests I need a dataset that has recorded the interactions between students and teachers in a learning environment. For context, I’m currently working on a project for a university to develop a custom assistant that interacts with students in a tutor-like way using OpenAI’s API, the data will be used for fine-tuning interactions. Thanks in advance.

submitted by /u/AGMcCarron
[link] [comments]

How To Legally Find Dataset Of Doctors’ Treatment Histories?

I am currently working on a project which would require having the history of conditions treated by a doctor (easy to quantify) or their qualifications / research contributions (hard to quantify, pain to work with). I looked into things like OpenMRS and EMRBots but am pretty sure that they are simulated.

Where could I find a giant repository of these types of real but anonymised “health records” without committing a crime?

submitted by /u/Ok-Program-3656
[link] [comments]

Help Download The ASVSpoof2017 Dataset

So I live in South East Asia (I am assuming this is the root of the problem) and downloading from the Edinburgh DataShare website is nearly impossible. In my case, there was once where I was able to reach 850mb out of 1000mb on the evaluation dataset. Unfortunately, my internet died for a second and when i resumed the download, it resets back from the beginning. Yes 1000mb is small but the download speed is in kbps. I tried downloading it for the entire day and now it failed.

Here’s the link https://datashare.ed.ac.uk/handle/10283/3055

So i want to know whether someone has a mirror link for it or a way i can download it faster. That’s all from me. Thanks.

Oh and also do tell me if you think that i need to go through a formal procedure for it. I did ask about this to the informarion services of the university of Edinburgh but have yet to get a reply. Once again, thank you.

submitted by /u/Puzzleheaded-Path306
[link] [comments]

259k – LLM Unity3d API Dataset For Fine-Tuning

Hey everyone just uploaded a 259k dataset for unity. It’s not a coding dataset but rather a dataset to teach the model about unity’s API properties. It took me 3 days to create with 6 instances of llama3 8B exl2. I have trained a model on the dataset and works very well. It does cause the model to hallucinate so you might have to play with the fine tuning hyper parameters and possibly align the model after. Enjoy

submitted by /u/Delicious-Farmer-234
[link] [comments]

I Would Like To Know About Commercials/advertisements On Cable News Prime Time Hours (7p-12a).

One of the things over the last several months that occurred to me was the sheer volume and type of commercials aired during cable news programs between segments. I’d like to know the odds of 1) landing on a commercial/ad, and as a bonus, 2) the odds of that commercial/ad being one of healthcare relevance (prescription meds, supplements, insurance/medicare/medicaid, etc., things targeted at seniors, for the most part).

submitted by /u/johnnybiggles
[link] [comments]

Ensuring Accurate School Account Setup: Resolving Missing Unique Reference Numbers And Preventing Future Errors

Can someone answer this for me, I’m currently learning how to best resolve issues in data-setups: “You have been working through account set up on the platform for a number of schools. After completing the set up for the majority of schools you realise you have forgotten to manually enter each school’s unique reference number. This creates the risk that some school users may have been assigned to the wrong school. What steps would you take to resolve this issue, and ensure it didn’t happen again?”

submitted by /u/OrderOnly8503
[link] [comments]

Language Lists – Blacklisted Words, Male & Female First Names, Common Surnames, & More

List of Vulgarity – each word / term is separated by a newline.

List of First Names – CSV file with fields name, gender, probability where gender is represented with either M or F with respective probability for gender accuracy.

List of Surnames – CSV file with the following fields:

name – surname / last name rank – national rank based on commonality count – number of people with the last name prop100k – proportion per 100,000 population for name cum_prop100k – same as above except cumulative proportion pctwhite – percent white pctblack – percent black or african american pctapi – percent asian, native hawaiian, and pacific islander. pctaian – percent american indian and Alaska native pct2prace – percent mix of two or more races pcthispanic – percent hispanic or latino

submitted by /u/JTrexler
[link] [comments]