Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Data Set Request: CPU Specifications

Hi everyone,

I’m looking for a good CPU dataset. I’m mostly interested in base clock, turbo clock (broken down into single-core and multi-core if possible) and TDP, but the more info the better.

I read that the Intel ARK database has a CSV that can be dumped from the Android app, but I have not managed to find a good source for AMD CPUs yet.

submitted by /u/aw1cks

Data Set Request: Renewable Energy Projects In India

Hey everyone,

I’m looking for detailed datasets on several aspects of renewable energy projects in India:

Investment data in renewable energy projects, especially focusing on foreign investments.
Broader economic impacts of these foreign investments on India’s economy.
The effects of subsidies on renewable energy revenue.

Any pointers or links to these datasets would be greatly appreciated!

submitted by /u/Interesting_Cause826

Where To Find US Trademark Data (have Lookup Database, But Not Sure How To Get Aggregate Data Out)

Hi, I’m looking for granular US trademark data that includes the name of the company that filed each trademark (I’m trying to view summary statistics on trademarks filed per company in the US).

I’ve tried https://tmsearch.uspto.gov/search/search-results but can’t find out how to get the aggregate data out of it.

I’ve been told that this data should be publicly available, but am stumped on where to find it.

Does anyone have a data set that would have this data? Alternatively, does anyone know how to scrape data from this lookup above?
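
Once filings are available as a flat file, the per-company summary itself is a small job. The USPTO does publish bulk trademark data, but its exact layout is not assumed here: the sketch below is a rough illustration that takes a CSV with one row per filing, and the column names `serial_number` and `owner_name` are hypothetical, as are the sample rows.

```python
import csv
import io
from collections import Counter


def filings_per_owner(csv_file, owner_column="owner_name"):
    """Count rows (filings) per owner in a CSV with one row per filing.

    `owner_column` is a hypothetical column name; adjust it to the real file.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Collapse whitespace and upper-case the name so that
        # "Acme Corp" and "ACME CORP " are counted together.
        owner = " ".join(row[owner_column].split()).upper()
        if owner:
            counts[owner] += 1
    return counts


# Made-up sample rows, only to show the expected shape of the input.
sample = io.StringIO(
    "serial_number,owner_name\n"
    "1,Acme Corp\n"
    "2,ACME CORP \n"
    "3,Globex LLC\n"
)
counts = filings_per_owner(sample)
for owner, n in counts.most_common():
    print(owner, n)
```

Real owner names will need more cleaning than this (for example "Inc" versus "Incorporated"), so the counts should be treated as a first pass.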

submitted by /u/Acrobatic_Stay_9221

Help Understanding A Dataset About Pancreatic Ductal Adenocarcinoma!

Hi everyone! I’m trying to understand this dataset: https://www.cancerimagingarchive.net/collection/cptac-pda/. It says that the patients have pancreatic ductal adenocarcinoma, but once I downloaded the dataset, it is a complete MESS. It is not organized at all, there are no annotations, the DICOM files don’t make sense, and all the files say NA (which I’m assuming means negative assessment). I don’t have enough time to sit and try to reconstruct/annotate all these DICOM files, and it’s honestly just not making sense to me. If anyone has any experience with this dataset or understands what is going on in it, help would be greatly appreciated! Thank you so much!

submitted by /u/DiyaRamakrishnan

I’m Seeking A Heart Disease Dataset For Training A Model

I’ve been trying to find a dataset of coronary artery disease patients, or patients with any cardiovascular disease, that contains a few biomarkers in the columns. I tried searching quite a lot of sites like Kaggle, PhysioNet, etc., and it’s either unavailable or locked behind a paywall (I’m a research student). Are there any free medical datasets around that I can dig into? I’d be so grateful for your time and help.

submitted by /u/Linus_sex_tipz

I Can Not Figure Out How To Use Refinitiv With Python

I have a really heavy Excel file (400K rows), and for my thesis I need to download data such as revenues, total assets, market cap, etc. for each row, using the CUSIP or ticker and the date of the shareholder meeting (both vary by row). I tried the Excel =TR(…) formula, but it does not recognise the CUSIP or ticker. Datastream does recognise the ticker or u/ticker, but when I run the formula for all rows the file stops working and Excel crashes. So I tried Python with help from ChatGPT, but it seems there is no way to use Datastream through Python, or at least I don’t have an academic API key. I do have Eikon and a Refinitiv API key. I tried Eikon, but even through Python it can’t find the data: it shows N/A or “ticker not found”, even though the company can be found by searching for the ticker in the desktop app. When a value is found, it differs from the one downloaded in the few rows where Datastream worked in Excel. I don’t know how to populate my heavy database without crashing.
Python gives several errors or no data (N/A).
What do you suggest?
First example:
https://ibb.co/R47Cp56

https://ibb.co/QfXckzT

https://ibb.co/vH7F5ZM

https://ibb.co/S5rKyqn
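
On the crashing part of the problem, one common approach is to stop asking for all 400K rows at once and instead work through the file in small batches, appending each batch to disk so an interrupted run can pick up where it left off. The sketch below assumes the Excel sheet has been exported to CSV with columns named `ticker` and `meeting_date` (both names are assumptions), and `fetch_fundamentals` is a hypothetical placeholder for whatever data call the licence actually allows; it is not a Refinitiv or Eikon function.

```python
import csv
import os
from itertools import islice

FIELDS = ["ticker", "meeting_date", "revenue"]  # assumed column names


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fetch_fundamentals(chunk):
    """HYPOTHETICAL placeholder: replace with the real data request.

    Must return exactly one output row per input row so that resuming works.
    """
    return [
        {"ticker": r["ticker"], "meeting_date": r["meeting_date"], "revenue": ""}
        for r in chunk
    ]


def run(in_path, out_path, size=500):
    """Process the input in batches, appending to out_path; return rows written."""
    # Count lines already on disk so a crashed run can resume.
    # (Assumes no field contains an embedded newline.)
    lines = 0
    if os.path.exists(out_path):
        with open(out_path, newline="") as f:
            lines = sum(1 for _ in f)
    done = max(lines - 1, 0)  # data rows already written, excluding the header

    written = 0
    with open(in_path, newline="") as src, open(out_path, "a", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        if lines == 0:
            writer.writeheader()
        # Skip the rows handled by an earlier run, then continue in batches.
        for chunk in batched(islice(reader, done, None), size):
            writer.writerows(fetch_fundamentals(chunk))
            dst.flush()  # keep finished batches even if a later one fails
            written += len(chunk)
    return written
```

Calling `run("meetings.csv", "fundamentals.csv")` (both file names are made up) a second time after a crash only processes the rows that are still missing. This does not fix the “ticker not found” results, which look like an identifier-format issue rather than a size issue.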

submitted by /u/Low_Seat419

Centrality Measures For Co-authorship And Country Collaboration

Hi guys, I am new to SNA and to using R. Actually, I’m pretty new to research and data analysis in general. I have been trying to figure out the centrality measures for the data I am uploading, specifically for the countries and authors. I want to see which countries and authors are playing the central roles in publishing on this particular topic. I have tried using R to do this, but again, I’m very new to data analysis. I just don’t know how to make an edge list or which packages to use. It’s not like I haven’t tried; I have spent hours trying but am just getting frustrated. Any help would be appreciated! tysm!

Also: when I upload this doc to VOSviewer and Biblioshiny, the graphs look different. Why is that? Which clustering algorithm would you guys recommend?

https://docs.google.com/spreadsheets/d/1iiXfVfuKiOkHwZ2W7Hw4SoY7m2g54iy4pvJtDdeXivI/edit?gid=1561254436#gid=1561254436
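
On the edge-list question: a co-authorship edge list is just every pair of authors who share a paper, and degree centrality is the share of other authors each one is directly linked to. The sketch below is in Python rather than R and uses made-up author names, purely to show the idea; the same steps work for countries if each paper is given as its list of countries. Graph libraries such as igraph (R) or networkx (Python) provide these and other measures, such as betweenness, ready-made.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Map each (author_a, author_b) pair to the number of papers they share."""
    edges = Counter()
    for authors in papers:
        # sorted(set(...)) removes duplicates and gives each pair one fixed order.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to.

    Only nodes that appear in at least one edge are included, so authors
    with no co-authors are left out.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nb) / (n - 1) for node, nb in neighbours.items()}


# Made-up sample: one inner list of author names per paper.
papers = [
    ["Ana", "Ben", "Chen"],
    ["Ana", "Dev"],
    ["Ben", "Chen"],
]
edges = coauthor_edges(papers)
centrality = degree_centrality(edges)
for author, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(author, round(score, 2))
```

The pair counts in `edges` can also serve as edge weights when exporting the list to another tool.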

submitted by /u/StrongVeterinarian33

Is This The Right Place To Ask For Ideas On What To Do With The Data I’m Collecting?

As a hobby, two developer friends and I built a project that collects data about Chicago’s live music industry and showcases it in a useful way.

Right now we have a map of events happening this week, filterable by day, and a landing page displaying just the list of events.

We’re collecting event data, venue data, and artist data.

What else could we do with it?

The site is chicagomusiccompass.com

submitted by /u/sateliteconstelation

Weather Station Location To Zip Code Cross Reference

I’m trying to map zipcodes to their closest weather station (see example station code and name below) but am having trouble finding a source. I’ve been scouring the NOAA website which offers some maps to let you look up one zip code at a time but I can’t locate any sort of tables or similar user-friendly data. The NOAA reports that contain these stations also have latitude and longitude fields but matching to a zipcode on that basis seems pretty tricky. Does anyone know of a data source or have suggestions?

USW00023230 | OAKLAND INTERNATIONAL AIRPORT, CA US
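
Matching on latitude and longitude is less tricky than it sounds if each ZIP code is reduced to a single representative point: compute the great-circle distance from that point to every station and keep the closest. The sketch below assumes a table of ZIP centroids is available (for instance, the Census Bureau’s ZCTA gazetteer files list one point per ZCTA, though ZCTAs only approximate USPS ZIP codes). All coordinates here are rough, illustrative values, and `STATION_B` is a made-up station.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(point, stations):
    """Return the (station_id, lat, lon) tuple closest to point = (lat, lon)."""
    lat, lon = point
    return min(stations, key=lambda s: haversine_km(lat, lon, s[1], s[2]))


# Illustrative data only: approximate coordinates, one made-up station.
stations = [
    ("USW00023230", 37.72, -122.22),  # Oakland International Airport, approx.
    ("STATION_B", 34.05, -118.25),    # hypothetical station
]
zip_centroids = {
    "94621": (37.75, -122.19),  # approximate centroid
    "90012": (34.06, -118.24),  # approximate centroid
}

mapping = {z: nearest_station(c, stations)[0] for z, c in zip_centroids.items()}
print(mapping)
```

This brute-force search compares every ZIP code with every station, which is usually fine at this scale; a spatial index would only matter for much larger inputs.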

submitted by /u/dalberts

Looking For Simple General Questions Dataset

Heya,

I’m working on a little project of mine that I’d like to infuse with some actual life. My idea for making that work was to generate synthetic conversations, but I realized that I can’t seem to find a good dataset that specifically includes questions meant to “learn more about someone”. Most datasets cover general use cases about helping the user, which are good! But common “chat” questions like “What is your favourite meal?” or “Do you listen to rock music?” are usually NOT included.

So now I’m here, in the depths of Reddit, asking for clues and whether someone knows of such datasets, as Hugging Face seems to have none.

Thanks in advance!

submitted by /u/LocalBratEnthusiast

Looking For A Big Data Set For SQL Server

Hi guys, I’m looking for a big data set for SQL Server with at least 10 tables and 40k rows in each. I already looked into the sample databases that Microsoft provides on their site (AdventureWorks, Northwind, Chinook…). I am looking for something simple but big enough to build a dimensional model on later.

submitted by /u/Macandcheeseilf

Data Wrangling Woes: My Experience Working With A Data Analyst

Hey everyone! So, I’m not a data analyst myself, but recently I had the chance to work on a project with a fantastic one. Let’s just say, it opened my eyes to the whole world of data training and modeling, and the crazy challenges they face!

These analysts are basically data wranglers, trying to tame messy datasets and turn them into something useful for the company. They build these models that help us make better decisions, but it seems like there’s a constant battle to find the right data and train the models efficiently.

One thing that really stuck with me was this whole concept of data training. Apparently, it’s all about having high-quality data to feed these algorithms. Everyone’s talking about this new GPT-4 language model, supposedly a game-changer for things like text analysis. But the analyst I worked with mentioned it’s still not magic – even the fanciest AI needs good data to train on.

Look, I may not be a data whiz, but I’m curious to learn more! What are some of the biggest hurdles you analysts face with data training and modeling? Have any of you tried using GPT-4 or similar AI tools?

Let’s turn this into a conversation! Share your experiences, ask questions, and maybe us non-data folks can learn a thing or two from the data wranglers out there.

submitted by /u/xyridfosterlingu9

Request For Cleaned English Slang Definitions Dataset

Has anybody seen a cleaned slang dataset? Urban Dictionary has one with 2.5 million definitions, but the definitions are terrible. I’d rather have a much smaller dataset (<30k slang words) that is 95%+ correct.
I don’t even necessarily need the definitions. I can make do with just the 30k most common slang words/phrases in the English language.

submitted by /u/TerrificMist