Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn't interesting; I'm interested. Maybe they know where the chix are. But what do they need it for? World domination?

[Dataset] 19,762 Garbage Images In 10 Classes For AI And Sustainability

Hi everyone,

I’ve just released a new version of the Garbage Classification V2 Dataset on Kaggle. This dataset contains 19,762 high-quality images categorized into 10 classes of common waste items:

- Metal: 1020
- Glass: 3061
- Biological: 997
- Paper: 1680
- Battery: 944
- Trash: 947
- Cardboard: 1825
- Shoes: 1977
- Clothes: 5327
- Plastic: 1984

Key Features:

- Diverse Categories: Covers common household waste items.
- Balanced Distribution: Suitable for robust ML model training.
- Real-World Applications: Ideal for AI-based waste management, recycling programs, and educational tools.

🔗 Dataset Link: Garbage Classification V2

This dataset has already been featured in the research paper, “Managing Household Waste Through Transfer Learning.” Let me know how you’d use this in your projects or research. Your feedback is always welcome!
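For anyone who wants to get started quickly, here is a minimal PyTorch loading sketch. It assumes the Kaggle archive extracts into one folder per class, which is the usual layout for this kind of dataset but worth verifying after download.

```python
# Minimal loading sketch. Assumes the archive extracts to one folder per class
# (garbage-dataset/metal, garbage-dataset/glass, ...) - verify after download.
import torch
from torchvision import datasets, transforms

DATA_DIR = "garbage-dataset"  # adjust to your extraction path

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder(DATA_DIR, transform=transform)
print(dataset.classes)   # the 10 class names, inferred from folder names
print(len(dataset))      # should be roughly 19,762 images

loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # e.g. [32, 3, 224, 224] and [32]
```

From there the images drop straight into any torchvision classifier or transfer-learning pipeline.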

submitted by /u/Downtown_Bag8166

Need Images Of Human Arms For Dataset

Hey! I am in the process of creating a dataset for detecting human skin/arms from a close range.

I have gathered about 500 images and drawn polygons around the arms. I did this by taking photos of my own arms and asking my friends to take similar pictures, but I think I still need about 500 more. Is there any way I could get more similar images quickly?

I'm open to posting job ads; is there a place to ask for images of this sort?

I have attached an Imgur album of the kind of images I'm looking for. Thanks for reading!

Note: I have already scoured the stock images on Google, as well as gone through every "arm"-related dataset on Roboflow.

https://imgur.com/a/arm-XZGHgTP - here are the reference images

submitted by /u/blur69xd

[Dataset] Testing The “Pinnacle EV Betting” Theory: FanDuel Vs Pinnacle NFL Line Accuracy (2020-2023)

Dataset Referenced: https://github.com/bentodd1/FanDuelVsPinnacle/blob/master/line_comparison.csv

Background: While building smartbet.name, I noticed many betting sites claim you can do +EV (positive expected value) betting by following Pinnacle's lines. I decided to test this by comparing Pinnacle and FanDuel NFL lines, with surprising results.

Key Findings:

- Dataset: 1,039 NFL games (2020-2023)
- Lines from both books captured week before games
- FanDuel showed better predictive accuracy

Results Breakdown:

Line Accuracy:
- Identical predictions: 457 games (43.98%)
- FanDuel more accurate: 302 games (29.07%)
- Pinnacle more accurate: 280 games (26.95%)

Average Absolute Error:
- Pinnacle: 9.51 points
- FanDuel: 9.05 points

Average Hours Before Game:
- Pinnacle: 88.1 hours
- FanDuel: 58.0 hours

Dataset Access:

- Full Dataset: line_comparison.csv
- Analysis Code: Jupyter Notebook

Methodology: The exact analysis can be seen in the Jupyter notebook. I created the database while building smartbet.name.
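For anyone who wants to sanity-check the headline numbers without opening the notebook, a rough sketch against the CSV could look like the following. The column names used here (pinnacle_spread, fanduel_spread, actual_margin) are hypothetical placeholders, so check the real header row and sign conventions first.

```python
# Rough re-computation sketch against line_comparison.csv. Column names are
# hypothetical placeholders - inspect the real file before running.
import pandas as pd

df = pd.read_csv("line_comparison.csv")

df["pinnacle_error"] = (df["pinnacle_spread"] - df["actual_margin"]).abs()
df["fanduel_error"] = (df["fanduel_spread"] - df["actual_margin"]).abs()

print("Average absolute error")
print("  Pinnacle:", round(df["pinnacle_error"].mean(), 2))
print("  FanDuel: ", round(df["fanduel_error"].mean(), 2))

identical = (df["pinnacle_error"] == df["fanduel_error"]).sum()
fanduel_better = (df["fanduel_error"] < df["pinnacle_error"]).sum()
pinnacle_better = (df["pinnacle_error"] < df["fanduel_error"]).sum()
print("Identical / FanDuel better / Pinnacle better:",
      identical, fanduel_better, pinnacle_better)
```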

These findings challenge conventional wisdom about Pinnacle’s supposed edge in market efficiency.

submitted by /u/bentodd1

Help Finding Data: Measure Of Tourism

Hi guys, I'm doing my dissertation on the effect of precipitation on different aspects of tourism within Ireland, and I'm really struggling to find the dataset I need. I'm looking for any measure of tourism, e.g. visitor numbers, hotel occupancy, or estimated tourist expenditure (anything at this point), that spans about 10 years, is monthly, and has a regional scope within Ireland (Dublin, west coast, east coast, etc.). I've been searching for a while now and have a few datasets, but nothing perfect. Please let me know if you have any tips or even know of a dataset which may help. Thanks!

submitted by /u/MessBig6240

Looking For Prescription Data Of Medicine In Different Countries

The Netherlands publishes the amount of each drug prescribed and dispensed in a given time period (https://www.gipdatabank.nl/). For a small comparison of which drugs are used in which country, I need the same data from other countries (at least the G20 countries).

I've had some rough battles with the NHS site, for example, but can't really find the data organized the same way, by ATC code. Any pointers on where to look?

submitted by /u/Koopabro

Choosing One Financial Institution Over Other Ones

Hi! I would appreciate any help in advance! The question we would like to answer is:

Why do consumers choose one financial institution over another for mortgage loans? Factors to consider include interest rates, fees, reputation, trust, loan terms, customer service, approval speed, product offerings, convenience, recommendations, financial stability, and special offers.

Therefore I need datasets that explicitly capture the consumer's side of choosing (or not choosing) an institution. One I found interesting is the HMDA dataset, which includes a class of applicants who were approved for a loan but did not accept it. It's interesting, but it doesn't say much that's new, and the factors aren't significantly different from those for applicants who accepted the loan or were denied. I was wondering if there are other datasets that capture the consumer's point of view and the factors that influence their decisions? Anything that might expand my perspective, basically. Thanks!

submitted by /u/Responsible-Ice-874

Ecommerce Product Dataset With Image URLs

Hey everyone!

I’ve recently put together a free repository of ecommerce product datasets—it’s publicly available at https://github.com/octaprice/ecommerce-product-dataset.

Currently, there are only two datasets (both from Amazon's bird food category, each with around 1,800 products), which include attributes like product category, price, brand name, reviews, and product image URLs.

The information in the dataset can be especially useful for anyone doing machine learning or data science work: price prediction, product categorization, or image analysis.
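As a quick-look sketch (not the repo's official usage), something like the following loads one of the CSVs and pulls a few product images; the filename and the price/image_url column names are assumptions, so check the actual files first.

```python
# Quick-look sketch for one of the CSVs in the repo. The filename and the
# "price" / "image_url" column names are assumptions - check the actual files.
import pandas as pd
import requests

df = pd.read_csv("amazon_bird_food.csv")
print(df.shape)
print(df.columns.tolist())

# Download the first few product images for inspection
for idx, row in df.head(3).iterrows():
    resp = requests.get(row["image_url"], timeout=10)
    resp.raise_for_status()
    fname = f"product_{idx}.jpg"
    with open(fname, "wb") as f:
        f.write(resp.content)
    print("saved", fname, "price:", row["price"])
```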

The plan is to add more datasets on a regular basis.

I’d love to hear your thoughts on which websites or product categories you’d find interesting for the next releases.

I can pretty much collect data from any site (within reason!), so feel free to drop some ideas. Also, let me know if there are any additional fields/attributes you think would be valuable to include for research or analysis.

Thanks in advance for any feedback, and I look forward to hearing your suggestions!

submitted by /u/LessBadger4273

Help Needed To Build A Database Of Attractions Across India 🌏🇮🇳

Hi everyone,

I'm working on a project to create a comprehensive database of tourist attractions across India—everything from iconic landmarks to hidden gems. My goal is to make travel easier and more personalized for travelers. I won't resell the data, but I do plan to use it in trip-planning software for commercial purposes.

I need columns like location details (city, state), coordinates, and images.

My Challenges:

- Scraping data: I've considered scraping websites, but I'm not sure of the legality or technical challenges.
- Using APIs: Google Maps API is great but expensive for the scale I need. Are there any free or low-cost alternatives?
- Collaborative sources: Is there any open-source or community-driven data for Indian attractions?

I've tried scraping OSM but didn't get useful results; a lot of the data needs extensive verification to be usable.
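One possible angle, in case it helps: querying OSM through the Overpass API rather than scraping pages tends to give cleaner, structured results. A rough sketch is below; the tourism=attraction filter and the single country-wide query are assumptions to tune (OSM also tags attractions as historic=*, leisure=park, and so on, and a nationwide query may need to be split by state).

```python
# Rough sketch: pull attraction nodes for India from OpenStreetMap via the
# Overpass API. Free but rate-limited; cache responses and be gentle.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
query = """
[out:json][timeout:180];
area["ISO3166-1"="IN"][admin_level=2]->.india;
node["tourism"="attraction"](area.india);
out body;
"""

resp = requests.post(OVERPASS_URL, data={"data": query}, timeout=300)
resp.raise_for_status()
elements = resp.json()["elements"]

rows = [
    {"name": el.get("tags", {}).get("name"), "lat": el["lat"], "lon": el["lon"]}
    for el in elements
]
print(len(rows), "attraction nodes; first few:", rows[:3])
```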

submitted by /u/Ravishkumar2005

Access To Endometriosis Dataset For My Thesis

Hello everyone,

I'm currently working on my bachelor's thesis, which focuses on the non-invasive diagnosis of endometriosis using biomarkers like microRNAs and machine learning. My goal is to reproduce existing studies and analyze their methodologies.

For this, I am looking for datasets from endometriosis patients (e.g., miRNA sequencing data from blood, saliva, or tissue samples) that are either publicly available or can be accessed upon request. Does anyone have experience with this or know where I could find such datasets? I've checked GEO and reached out to the authors of a relevant paper (still waiting for a response).

If anyone has tips on where to find such datasets or has experience with similar projects, I’d be incredibly grateful for your guidance!

Thank you so much in advance!

submitted by /u/Various-Cry-228

NCAA Tournament Dataset – Worth Anything?

I have a clean dataset with the last 20+ years of NCAA tournament games (round, seeds, result, score), along with ~100 traditional and advanced team stats from multiple public sources, as they stood pre-tournament. I've done a lot of feature engineering and can add those metrics in too (e.g., a team's 3-pt % against the opponent's 3-pt defense, both raw and normalized using different strength-of-schedule approaches).
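To illustrate the kind of matchup feature just described, here is a rough sketch with hypothetical column names (team_3p_pct, opp_def_3p_pct, team_sos, opp_sos); the real dataset's headers will differ.

```python
# Illustration of a matchup-differential feature. Column names are hypothetical
# placeholders, not the dataset's actual headers.
import pandas as pd

games = pd.read_csv("tournament_games.csv")  # assumed: one row per team-game

# Raw edge: a team's 3-pt shooting vs. what its opponent's defense allows
games["shooting_edge_raw"] = games["team_3p_pct"] - games["opp_def_3p_pct"]

# Normalized variant: weight the edge by relative strength of schedule
games["shooting_edge_sos"] = games["shooting_edge_raw"] * (
    games["team_sos"] / games["opp_sos"]
)

print(games[["shooting_edge_raw", "shooting_edge_sos"]].describe())
```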

It’s nothing crazy extensive (no player stats, injuries, trends) but it’s cleaner and more comprehensive than anything I’ve found available for free download / scraping.

I put the scripts together a few years ago with non-trivial code effort and manual QC (name formatting etc). It wouldn’t be particularly difficult to reproduce for a decent programmer. I’m sure AI has made that type of process more accessible but it’d still take some time for most.

Having never sold a dataset, is there any value here? I'm not expecting much, but the work is already done.

I've started the process of including regular-season games (stats as of game time) if that would help, but I probably won't finish without understanding the value. Same for game lines / betting info, but only if the dataset is useless without them; they're messier to pull.

submitted by /u/yourfinepettingduck

Long Shot- Sitemaps For Every Website Out There?

Does anyone know of a dataset (free or paid) which contains the sitemaps of all the websites on the web?

Yes, I know that tens of millions of websites update their sitemaps daily. I know that not every website has a sitemap. I know that a decent chunk (10-20% by volume) will be p*rn. I know that this data takes up a lot of space (250-350 TB based on my calculations).

The closest dataset I'm familiar with is Common Crawl, but they capture 10% of the web at best, and they focus more on full pages and less on sitemaps.

I know the odds of this being available are pretty slim, but I wanted to see if anyone has come across a huge sitemap list like this before.

P.S. I have a 1.5 PB homelab and have the means to store all of this data as well as process it. So it might be a non-standard request, but I'm asking for real reasons, not as a hypothetical.
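For a much smaller do-it-yourself version: most sites that publish a sitemap advertise it in robots.txt, so discovery is a one-request job per domain. A minimal sketch, leaving rate limiting, retries, and the domain list to you:

```python
# Discover sitemap URLs from a domain's robots.txt. Politeness handling and
# the domain list are omitted; this only shows the discovery step.
import requests

def sitemaps_from_robots(domain: str) -> list[str]:
    """Return the Sitemap: URLs declared in a domain's robots.txt, if any."""
    try:
        resp = requests.get(f"https://{domain}/robots.txt", timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        return []
    return [
        line.split(":", 1)[1].strip()
        for line in resp.text.splitlines()
        if line.lower().startswith("sitemap:")
    ]

for domain in ["wikipedia.org", "python.org"]:
    print(domain, sitemaps_from_robots(domain))
```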

submitted by /u/9302462

🚀 Content Extractor With Vision LLM – Open Source Project

I'm excited to share Content Extractor with Vision LLM, an open-source Python tool that extracts content from documents (PDF, DOCX, PPTX), describes embedded images using Vision Language Models, and saves the results in clean Markdown files.

This is an evolving project, and I’d love your feedback, suggestions, and contributions to make it even better!

✨ Key Features

- Multi-format support: Extract text and images from PDF, DOCX, PPTX.
- Advanced image description: Choose from local models (Ollama's llama3.2-vision) or cloud models (OpenAI GPT-4 Vision).
- Two PDF processing modes:
  - Text + Images: Extract text and embedded images.
  - Page as Image: Preserve complex layouts with high-resolution page images.
- Markdown outputs: Text and image descriptions are neatly formatted.
- CLI interface: Simple command-line interface for specifying input/output folders and file types.
- Modular & extensible: Built with SOLID principles for easy customization.
- Detailed logging: Logs all operations with timestamps.

🛠️ Tech Stack

- Programming: Python 3.12
- Document processing: PyMuPDF, python-docx, python-pptx
- Vision Language Models: Ollama llama3.2-vision, OpenAI GPT-4 Vision

📦 Installation

Clone the repo and install dependencies using Poetry. System dependencies like LibreOffice and poppler are required for processing specific file types.

Detailed setup instructions: GitHub Repo

🚀 How to Use

1. Clone the repo and install dependencies.
2. Start the Ollama server: ollama serve
3. Pull the llama3.2-vision model: ollama pull llama3.2-vision
4. Run the tool: poetry run python main.py --source /path/to/source --output /path/to/output --type pdf
5. Review the results in clean Markdown format, including extracted text and image descriptions.

💡 Why Share?

This is a work in progress, and I’d love your input to:

- Improve features and functionality
- Test with different use cases
- Compare image descriptions from models
- Suggest new ideas or report bugs

📂 Repo & Contribution

GitHub: Content Extractor with Vision LLM

Feel free to open issues, create pull requests, or fork the repo for your own projects.

🤝 Let’s Collaborate!

This tool has a lot of potential, and with your help, it can become a robust library for document content extraction and image analysis. Let me know your thoughts, ideas, or any issues you encounter!

Looking forward to your feedback, contributions, and testing results.

submitted by /u/Electrical-Two9833

Data Hunt: Reports Made To California Child Protective Services By Quarter-Year

Greetings.

I’ve been searching for days, seeking high and low, for a dataset matching what I described in the title.

From what I’ve found, there is a wealth of information for counts pertaining to number of children with 1 or more allegations, but not much for counts and/or totals for allegations themselves.

The best resource seems to be the California Child Welfare Indicators Project. In the report index I linked, you’ll see two reports that I found (at first) to be the most promising. Under the Fundamentals heading, there’s Allegations: Child Maltreatment Allegations – Child Count. It’s close, but because they’re again counting children and not allegations, I can’t use it. The other report, under CWS Rates, is Allegation Rates: Child Maltreatment Allegation Rates. It seems so close, but when I look at the options under Report Output, they list the rates (obviously), the total child population, and children with allegations. Looking at the descriptions for the data, it appears I can’t even infer the totals using the incidence rates, but I may be wrong.

Lastly, the report I was most excited about is found under Process Measures; the one labeled 2B. It's titled "Referrals by Time to Investigation," and I thought that, since every report to CPS requires a response, this was what I was looking for. Alas, this report only totals allegations that are deemed worthy of an in-person investigation.

So, here I am seeking the help of the Dataset community. Does anyone have any recommendations where I might look to find total reports made to CPS? Have I already found it among the reports listed at the CCWIP and just don’t realize it?

Should I reach out to them and just ask for the data?

I appreciate any help the community can provide.

Many thanks.

submitted by /u/Wiredawn

Where Can I Get The Employment Dataset By City Worldwide?

Hi, I am searching for open data with which I can analyze what kinds of jobs are more prevalent in each city worldwide (e.g., more software engineer jobs in London than Paris, more cleaner jobs in Seoul than London, etc.). Does anyone have an idea where I can get this type of data? I found a dataset of ~1.3M LinkedIn job openings on Kaggle, but it seems to contain information only from Canada, the United States, and the United Kingdom.

submitted by /u/No-Search4434

2025 NCAA Basketball API Giveaway – Real-time & Post-game Data

Hey Reddit! 👋

Happy New Year! To kick off 2025, we’re giving away 90 days of free access to our NCAA Basketball API to the first 20 people who sign up by Friday, January 10. This isn’t a sales pitch—there’s no commitment, no credit card required—just an opportunity for those of you who love building, experimenting, and exploring with sports data.

Here’s what you’ll get for all conferences:

- Real-time game stats
- Post-game stats
- Season aggregates

Curious about the API? You can check out the full documentation here: API Documentation.

We know there are tons of creative developers, analysts, and data enthusiasts here on Reddit who can do amazing things with access to this kind of data, and we’d love to see what you come up with. Whether you’re building an app, testing a project, or just curious to explore, this is for you.

If you're interested, join our Discord to sign up. Spots are limited to the first 20, so don't wait too long!

We’re really excited to see how you’ll use this. If you have any questions, feel free to ask in the comments or DM us.

submitted by /u/rollinginsights

Do You Have Any Real-world Datasets For Photovoltaic Systems

Hello everyone… May I ask if anyone has any real-world datasets for photovoltaic systems? I am going to use this for a school research project on the effectiveness of machine-learning-based photovoltaic systems for predictive maintenance. I currently use synthetic data; however, I am not that confident in its validity, and it might be the reason we get cooked in our defense…



Need Help And Opinions Regarding The Synthetic Data We Used In A School Research Study

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import (
    classification_report,
    accuracy_score,
    confusion_matrix,
    ConfusionMatrixDisplay,
    precision_recall_curve,
)
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(42)
data = {
    "Temperature (°C)": np.random.uniform(15, 45, 1000),      # Ambient temperature
    "Irradiance (W/m²)": np.random.uniform(100, 1200, 1000),  # Solar irradiance
    "Voltage (V)": np.random.uniform(280, 400, 1000),         # Voltage output
    "Current (A)": np.random.uniform(4, 12, 1000),            # Current output
}

# Create DataFrame
df = pd.DataFrame(data)
df["Power (W)"] = df["Voltage (V)"] * df["Current (A)"]
df["Fault"] = np.where((df["Power (W)"] < 2000) | (df["Voltage (V)"] < 320), 1, 0)  # Fault criteria

# Preprocess data
features = ["Temperature (°C)", "Irradiance (W/m²)", "Voltage (V)", "Current (A)"]
target = "Fault"
X = df[features]
y = df[target]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Build ANN model
model = Sequential([
    Dense(128, input_dim=X_train_scaled.shape[1], activation="relu"),
    Dropout(0.3),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),  # Sigmoid for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping
early_stopping = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

# Train ANN model
history = model.fit(
    X_train_scaled,
    y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=1,
    callbacks=[early_stopping],
)

# Evaluate model
y_pred = (model.predict(X_test_scaled) > 0.5).astype("int32")
print("ANN Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
disp.plot(cmap="Blues")
plt.title("Confusion Matrix (ANN)")
plt.show()

# Precision-recall curve
y_scores = model.predict(X_test_scaled).ravel()
precision, recall, _ = precision_recall_curve(y_test, y_scores)
plt.plot(recall, precision, marker=".", label="ANN")
plt.title("Precision-Recall Curve")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()

# Plot training history
plt.plot(history.history["accuracy"], label="Train Accuracy")
plt.plot(history.history["val_accuracy"], label="Validation Accuracy")
plt.title("Training and Validation Accuracy (ANN)")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```

Does the synthetic data generated in this code, particularly the ranges for temperature, irradiance, voltage, and current, as well as the fault definition criteria, realistically reflect the operational parameters and fault conditions of photovoltaic systems? Could someone with expertise in photovoltaic system analysis validate whether this data and fault classification logic are appropriate and credible for use in a school research project? (Our research studies the effectiveness of machine-learning-based photovoltaic systems for predictive maintenance.)

I tried to use real-world data for this research; however, with limited time and resources, I think synthetic data is the best option.
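One way to make this kind of synthetic data more physically plausible is to derive power from irradiance and temperature instead of sampling voltage and current independently, and to inject faults explicitly. The sketch below does that; the array area, efficiency, temperature coefficient, and NOCT values are assumed, typical numbers rather than measurements from any real system.

```python
# Sketch of a more physically grounded generator: power follows irradiance with
# an efficiency + temperature-derating model, and faults are injected as
# degraded output. All module parameters below are assumed, typical values.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

ambient_temp = rng.uniform(15, 45, n)        # °C
irradiance = rng.uniform(100, 1200, n)       # W/m²

ARRAY_AREA = 15.0       # m² of panel area (assumed)
EFFICIENCY_STC = 0.18   # 18% module efficiency at STC (assumed)
TEMP_COEFF = 0.004      # fractional power loss per °C above 25 °C (assumed)
NOCT = 45.0             # nominal operating cell temperature, °C (assumed)

# Standard NOCT approximation for cell temperature
cell_temp = ambient_temp + irradiance * (NOCT - 20.0) / 800.0

healthy_power = (
    irradiance * ARRAY_AREA * EFFICIENCY_STC
    * (1.0 - TEMP_COEFF * (cell_temp - 25.0))
)

# Inject faults: ~15% of samples produce only a fraction of expected power
fault = rng.random(n) < 0.15
degradation = rng.uniform(0.3, 0.7, n)
power = np.where(fault, healthy_power * degradation, healthy_power)
power += rng.normal(0, 50, n)                # measurement noise

df = pd.DataFrame({
    "Temperature (°C)": ambient_temp,
    "Irradiance (W/m²)": irradiance,
    "Power (W)": power,
    "Fault": fault.astype(int),
})
print(df.head())
print(df["Fault"].mean())  # fraction of faulty samples
```

With this setup the fault label reflects a real deviation from what the operating conditions predict, which is closer to how predictive-maintenance labels arise in practice than an arbitrary power/voltage threshold.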

