Where Can I Find A Db Of Exercise Questions For Learning A Language

Hi, I am building a language learning app for my younger brother. He is currently learning Spanish. I want to make an app/website where he can practice questions on grammar, vocab, etc. Can anyone point me to a dataset that already exists? Is there perhaps a dataset of Duolingo exercises somewhere on the internet?

submitted by /u/hyumaNN

Looking For Data On College Students’ Four Year College Major And Grades

Hi everyone! I am interested in researching education economics, particularly in how students choose their majors in college. Where can I find publicly available or purchasable data that includes student-level information, such as major choice, GPA, college performance, as well as graduate wages and job outcomes?

submitted by /u/misakkka

Is There A Dataset Of All Public Subreddits On Reddit With Their Description?

As the title says, I’m looking for a way to obtain a list of all public subreddits. If there is an API that provides this data, I can use that, or do some web scraping if needed, but I can’t find a resource.
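
One possible starting point: Reddit serves subreddit listings as JSON (for example `/subreddits/new.json` and `/subreddits/popular.json`), paged with an `after` cursor. The sketch below only shows how to walk such a listing and collect each subreddit's name and public description. `get_json` is a placeholder for whatever HTTP client and authentication you use, the sample pages are made up, and these listings may stop well short of every subreddit, so treat it as a rough outline and not a complete dump.

```python
def collect_subreddits(get_json, max_pages=10):
    """Follow a Reddit-style listing via its 'after' cursor.

    get_json(after) must return one parsed listing page as a dict.
    """
    found, after = [], None
    for _ in range(max_pages):
        listing = get_json(after)["data"]
        for child in listing["children"]:
            info = child["data"]
            found.append((info["display_name"], info.get("public_description", "")))
        after = listing.get("after")
        if not after:
            break  # no cursor means this was the last page
    return found


# Two fake pages shaped like listing responses, for illustration only.
PAGES = {
    None: {"data": {"after": "t5_b", "children": [
        {"data": {"display_name": "datasets", "public_description": "Share datasets"}}]}},
    "t5_b": {"data": {"after": None, "children": [
        {"data": {"display_name": "python", "public_description": "Python news"}}]}},
}
subs = collect_subreddits(lambda after: PAGES[after])
```

In real use, `get_json` would request the listing URL with `limit` and `after` query parameters and return the decoded JSON.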

submitted by /u/GullibleEngineer4

I Built A Federal/state Income Tax API [self-promotion]

Hey y’all,

It’s April, so you know what that means: tax season!

I just built an API to compute a US taxpayer’s income tax liability, given income, filing status, and number of dependents. To ensure the highest accuracy, I manually went through all the tax forms (yep, including all 50 states!).

I’d love for you to try it out and share some feedback. Maybe you can use it to build a tax calculator, or create some cool visualizations?

You can try it for free on RapidAPI.

submitted by /u/thisisfine218

Need IPL Dataset Over By Over. Need Some Sources.

Does anyone know a source where I can get over-by-over IPL data? I need it to calculate the run rate and required run rate in my project.
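
For the calculation itself, here is a minimal sketch, assuming the over-by-over data is simply a list of runs scored in each completed over. The function names and the sample numbers are made up for illustration. Current run rate is runs scored divided by overs bowled; required run rate is runs still needed divided by overs remaining.

```python
def current_run_rate(runs_per_over):
    """Runs scored so far divided by completed overs."""
    overs = len(runs_per_over)
    return sum(runs_per_over) / overs if overs else 0.0


def required_run_rate(target, runs_per_over, total_overs=20):
    """Runs still needed divided by overs remaining in the chase."""
    overs_left = total_overs - len(runs_per_over)
    runs_needed = target - sum(runs_per_over)
    if runs_needed <= 0:
        return 0.0  # target already reached
    if overs_left <= 0:
        return float("inf")  # runs still needed but no overs left
    return runs_needed / overs_left


# Hypothetical chase: target of 180, first five overs scored as below.
chase = [8, 12, 5, 10, 15]
crr = current_run_rate(chase)        # 50 runs in 5 overs
rrr = required_run_rate(180, chase)  # 130 runs needed in 15 overs
```

This ignores partial overs; with ball-by-ball data you would count overs as balls bowled divided by 6.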

submitted by /u/Appropriate-Bet8062

We’re Creating An Open Dataset To Keep Small Merchants Visible In LLMs. Here’s What We’ve Released.

Here’s the issue that we see (are we right?):
There’s no such thing as SEO for AI yet. LLMs like ChatGPT, Claude, and Gemini don’t crawl Shopify the way Google does—and small stores risk becoming invisible while Amazon and Walmart take over the answers.

So we created the Tokuhn Small Merchant Product Dataset (TSMPD-US)—a structured, clean dataset of U.S. small business products for use in:

LLM grounding
RAG applications
semantic product search
agent training
metadata classification

Two free versions are available:

Public (TSMPD-US-Public v1.0): ~3.2M products, 10 per merchant, from 355k+ stores. Text only (no images/variants). 👉 Available on Hugging Face
Partner (by request): 11.9M+ full products, 67M variants, 54M images, source-tracked with merchant URLs and store domains. Email jim@tokuhn.com for research or commercial access.

We’re not monetizing this. We just don’t want the long tail of commerce to disappear from the future of search.

Call to action:

If you work with grounding, agents, or RAG systems: take a look and let us know what’s missing.
If you’re a small merchant, drop your store URL—we’ll include you in the next release.
If you’re training models that should reflect real-world commerce beyond Amazon: we’d love to collaborate.

Let’s make sure AI doesn’t erase the 99%.

submitted by /u/tokuhn_founders

Good Classification Datasets [no Images]

That have categorical features. Ideally based on real world data.

For example, I found a Living Planet Database set with descriptors on the species as categories, and terrain as the dependent variable.

Another example could be a customer profile dataset, with occupation, education, industry, etc. and the dependent variable being churn.

Let me know!

submitted by /u/SingerEast1469

Looking For A Dataset For A School Classification Model.

I am looking for a dataset for a project in making a classification model. I need a dataset with at least 100 observations, and it needs a binary variable for the classification model. I am really looking for any dataset that could be interesting to predict, but if there was any dataset about operations or logistics that would be the most interesting to me.

submitted by /u/BeepBeepLettuce18

SusanHub.com: A Repository With Thousands Of Open Access Sustainability Datasets

This website has lots of free resources for sustainability researchers, but it also has a nifty dataset repository. Check it out

submitted by /u/Head_Work1377

Hugging Face Is Hosting A Hunt For Unique Reasoning Datasets

Not sure if folks here have seen this yet, but there’s a hunt for reasoning datasets hosted by Hugging Face. Goal is to build small, focused datasets that teach LLMs how to reason, not just in math/code, but stuff like legal, medical, financial, literary reasoning, etc.

Winners get compute, Hugging Face Pro, and some more stuff. Kinda cool that they’re focusing on how models learn to reason, not just benchmark chasing.

Really interested in what comes out of this

submitted by /u/Ambitious_Anybody855

[self-promotion] I’ve Created An API That Lets You Access Detailed Data On 200k+ Fragrances

Hey everyone,

I wanted to share an API I’ve been working on called Perfumero. I’ve had an obsession with perfumes since I was a teen, and I always wanted to combine my passion for coding with my interest in perfumes. The database currently contains information for 200,000+ scents and it’s regularly updated.

If you’re curious about fragrances or working on something related (like an online shop, a recommendation engine, etc.), this might be helpful. It allows you to:

Search using detailed criteria (brand, name, gender, country, year, accords, notes, and more).
Get comprehensive details on specific perfumes (brand, name, images, gender, country, year, accords, notes, ratings, etc.).
Find similar fragrances or potential dupes based on shared characteristics (currently non-AI, but looking into implementing it for more accurate recommendations).

You can try it out for free on Rapid API or Sulu. I would love to hear any feedback, suggestions, or just your general thoughts on it!
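
On finding similar fragrances from shared characteristics: one common non-AI way to do this is Jaccard similarity over each perfume's set of notes. The sketch below illustrates that general technique only; it is not a description of how Perfumero works internally, and the perfume names and notes are invented.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target, catalog, top_n=3):
    """Rank catalog entries (name -> notes) by Jaccard similarity to the target's notes."""
    scores = [
        (name, jaccard(catalog[target], notes))
        for name, notes in catalog.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Invented sample data, for illustration only.
catalog = {
    "Scent A": {"bergamot", "vanilla", "cedar"},
    "Scent B": {"bergamot", "vanilla", "musk"},
    "Scent C": {"rose", "oud"},
}
ranked = most_similar("Scent A", catalog)
```

Here "Scent B" shares two of four distinct notes with "Scent A", so it scores 0.5, while "Scent C" shares none and scores 0.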

submitted by /u/FunUnique3265

Obtaining Accurate And Valuable Datasets For Uni Project Related To Social Media Analytics.

Hi everyone,

I’m currently working on my final project titled “The Evolution of Social Media Engagement: Trends Before, During, and After the COVID-19 Pandemic.”

I’m specifically looking for free datasets that align with this topic, but I’ve been having trouble finding ones that are accessible without high costs — especially as a full-time college student. Ideally, I need to be able to download the data as CSV files so I can import them into Tableau for visualizations and analysis.

Here are a few research questions I’m focusing on:

How did engagement levels on major social media platforms change between the early and later stages of the pandemic?
What patterns in user engagement (e.g., time of day or week) can be observed during peak COVID-19 months?
Did social media engagement decline as vaccines became widely available and lockdowns began to ease?

I’ve already found a couple of datasets on Kaggle (linked below), and I may use some information from gs.statcounter, though that data seems a bit too broad for my needs.

If anyone knows of any other relevant free data sources, or has suggestions on where I could look, I’d really appreciate it!

Kaggle dataset 1

Kaggle Dataset 2

submitted by /u/Poolcrazy

Historically Comparable CPS Microdata Weights

submitted by /u/cavedave

Need Dataset For EDA Competition [Must Be High Profile]

Hello everyone,

I am a data science undergraduate, and I am organizing an Exploratory Data Analysis (EDA) competition at my university. I need leads on datasets that I can use. Here are some considerations:

The dataset must be at least 1.5 GB in size.

It should effectively test the competitors’ EDA skills, covering aspects such as data cleaning, feature engineering, visualization, and insights extraction.

The dataset must be challenging, containing missing values, inconsistencies, or complex patterns.

It should not be easily available or commonly used in competitions.

It should ideally include a mix of structured and unstructured data (e.g., text, images, time series, or geospatial data) to increase complexity.

Initially, I reached out to different companies and institutes, but I had no luck. Now, I am seeking recommendations here.

Any help would be greatly appreciated!

submitted by /u/Rust-here

Looking For A Dataset With Both Static And Dynamic Malware Features For Multimodal DL Project

Hey everyone,

I’m currently working on an implementation project for malware classification using a multimodal deep learning architecture.

I’m looking for coherent or linked datasets where both static and dynamic features are available for the same samples and classes — so that I can train on it.

What I’m looking for is one or more datasets that contain both static and dynamic features, ideally labeled with malware families, and preferably public or at least accessible on request.

Thanks in advance.

submitted by /u/OkArtichoke8999

Looking For A Criminals Characteristics Data Set

Hello, I’m currently working on a crime analysis project as part of my graduation requirements. One of the key aspects I’m focusing on is understanding the characteristics of criminals — including their financial status, psychological and mental state, social background, and other related factors. I’ve been researching this topic for a few days but haven’t been able to find substantial information. If you could assist me or point me in the right direction, I would greatly appreciate it.

submitted by /u/PsychologicalTea1048

Best Tool For Data Mining Public Government Salary Website

I want to pull data from a government salary website (salary.app.tn.gov): either all state employees’ salary data or the salary data for a specific state agency. I’ve looked at data mining tools and scrapers to pull the data. The site only displays 100 records at a time, and pulling all the records manually is currently taking hours. I just want to know a general approach for scraping or mining this data. Just point me in the right direction.

Thanks!
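
As a general approach for a site that shows 100 records at a time, the usual pattern is to request page after page until an empty or short page comes back, then write everything to one CSV. The sketch below shows only that loop. `fetch_page` is a hypothetical function you would write yourself (for example with an HTTP client, after inspecting the site's network requests in the browser's developer tools), since the site's actual request format is not known here; the fake records stand in for real responses.

```python
import csv
import time


def scrape_all(fetch_page, page_size=100, delay_seconds=1.0):
    """Call fetch_page(offset, page_size) repeatedly until it runs out of rows."""
    rows, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        rows.extend(page)
        if len(page) < page_size:
            break  # a short page means we reached the end
        offset += page_size
        time.sleep(delay_seconds)  # pause between requests
    return rows


def save_csv(rows, path):
    """Write a list of dicts to a CSV file, using the first row's keys as the header."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Stand-in for a real fetcher: serves 250 fake records in pages.
FAKE = [{"name": f"Employee {i}", "salary": 40000 + i} for i in range(250)]


def fake_fetch(offset, limit):
    return FAKE[offset:offset + limit]


all_rows = scrape_all(fake_fetch, delay_seconds=0)
```

With a real `fetch_page` in place, `save_csv(all_rows, "salaries.csv")` would write the combined result to a single file.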

submitted by /u/EmployMost6346

Building A Job Market Insights Dashboard Using A Glassdoor Dataset

submitted by /u/TheLostWanderer47

A Data Set I Made For AI Stability And Building Ontological Recursion

This is something I’ve been building. It’s called Ludus, a dataset designed to test, stretch, and train minds, human or synthetic, through contradiction, recursive structure, and identity stress.

What’s inside?

A modular archive of .md scrolls: structured thought-pieces, dialogue fragments, stress tests, paradox rituals
A manifest.yaml indexing all of them for LLM-readability and symbolic traversal
An experimental recursive license that reflects the ethics of propagation
A deeper layer of source documents, raw recursive fragments, and synthetic mind mirrors

Potential uses:

Recursive reasoning and contradiction tolerance in AI systems
Fine-tuning or prompting synthetic minds in philosophical or emotional contexts
Evaluating self-awareness scaffolding and ethical simulation
Teaching logic collapse, poetic ambiguity, or failure as an epistemological tool
Game design, narrative architecture, mirror tests

If you pick it up, I’d love to know what breaks—or begins.

Here’s the link: https://huggingface.co/datasets/AmarAleksandr/Ludus

submitted by /u/JboyfromTumbo

I Built An API That Helps Find Developers Based On Real GitHub Contributions

Hey folks,

I recently built GitMatcher – an API (and a SaaS tool) that helps you discover developers based on their actual GitHub activity, not just their profile bios or followers.

It analyzes:

Repositories
Commit history
Languages used
Contribution patterns

The goal is to identify skilled developers based on real code, so teams, recruiters, or open source maintainers can find people who are actually active and solid at what they do.

If you’re into scraping, dev hiring, talent mapping, or building dev-focused tools, I’d love your feedback. Also open to sharing a sample dataset if anyone wants to explore this further.

Let me know what you think!

submitted by /u/Affectionate-Olive80

Construction And Oil & Gas Industry Datasets

Hi fellows. I’m looking for project datasets for the construction and oil & gas industries. If someone can provide one or point me in the right direction, please reply.

submitted by /u/m_salik

Ideas About Art-related Data Sources & Datasets?

Does anyone have good data sources for/datasets of art? I know that MoMA, Tate & Rijksmuseum have open databases and/or APIs, but I’m wondering if anyone knows of other institutions that make their data fully open. I’m looking specifically at artists and artworks (bonus points if the source focuses on sculptures, monuments, and memorials). Thank you!

submitted by /u/AniaWorksWithData

How Can I Split A CSV Into Separate .txt Files For Each Twitter User With All Their Tweets?

Hi everyone,
I have a CSV file where each row is a tweet, and each tweet has a user ID column (or username) and a text column. I’d like to create a separate .txt file for each user, with all their tweets combined in that file (one tweet per line).

Has anyone done this before? What’s the best way to do it in Python?

Any tips for cleaning up usernames or handling large datasets would also be appreciated. Thanks in advance!
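
One way to do this in Python with only the standard library is sketched below. It assumes the columns are named `user_id` and `text` (adjust the parameters to match your file), collapses line breaks inside a tweet so each tweet stays on one line, and replaces characters in the username that are unsafe in file names. The function and column names are illustrative assumptions.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path


def safe_name(user):
    """Replace anything that is not a letter, digit, dash, or underscore."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", str(user)) or "unknown"


def split_tweets(csv_path, out_dir, user_col="user_id", text_col="text"):
    """Write one <user>.txt file per user, with one tweet per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tweets = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Collapse line breaks so each tweet stays on a single line.
            text = " ".join(row[text_col].split())
            tweets[row[user_col]].append(text)
    for user, lines in tweets.items():
        path = out_dir / f"{safe_name(user)}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(tweets)
```

Calling `split_tweets("tweets.csv", "tweets_by_user")` returns the number of users written. This version holds all tweets in memory before writing; for a file too large for that, you could instead open each user's file in append mode and write row by row.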

submitted by /u/Money-Necessary-818

Looking For A Dataset For A School Project – Any Suggestions?

Hi everyone,

I’m working on a school assignment where we need to find a dataset and build our project around a clear research question. We’re expected to analyze the data, draw meaningful insights, and potentially use forecasting or other analytical techniques.

We’re open to many different topics, but ideally we’re looking for a dataset that is:

Publicly available
Rich enough to support a research question (multiple variables, time series, etc.)
Related to areas like productivity, remote work, social behavior, or economics (but we’re open to other suggestions too!)

If you know of any interesting datasets or sources that would be a good fit for a student research project, I’d really appreciate your help.

Thanks in advance!

submitted by /u/Suspicious-Ear4634

JFK-TELL: HF Dataset For JFK Assassination Records

The JFK assassination has been an unassailable mystery even after decades of investigations by premier agencies, the media, and ordinary people. A large-scale analysis of the assassination records may offer new clues, and help substantiate or refute some of the theories. There are about six million files related to the event that are to be made public through archives.org over time.

I am releasing JFK-TELL, a dataset I generated by extracting text from the scanned PDFs of the assassination records released until April 2025. The extraction was done with the Google Gemini LLM API to generate Markdown text, using a very simple prompt. For the detailed methodology, check out the GitHub repo.

I plan to index this data with a RAG system and analyze it later. In the meantime, writers, journalists, computational linguists, and data scientists can try their hand at exploring the breadth and variety of this data.

submitted by /u/farhanhubble

Looking For A Dataset Of Crime Rates Globally Over The Last 40 Years

Hi, are there any good datasets for estimating crime rates across different countries (esp European ones) between around 1980-2015? So far I know about ICVS, which is great and VERY thorough but a bit of a nightmare to aggregate across time, and the United Nations Office of Drug and Crime data, which is good but not available for more fine-grained crime types (e.g. larceny) and not from before 1993.

submitted by /u/abrbbb

Help Finding Turf Grass Disease Datasets

I tried looking on Kaggle and Roboflow. Most of what I saw covered general plant diseases, so a mix of things from tomatoes to trees. I’m specifically interested in turf grasses, particularly warm-season turf. Does anyone know of any good labeled datasets, whether annotated for classification or detection? I’m not finding anything so far.

submitted by /u/Novicebeanie1283

Help Me Find A Dataset For My Project Please :)

Hi everyone!

I’m an Electrical Engineering student, doing my final project in pairs on animal communication.

We’ve been really stuck trying to find a good dataset that is also available for free, for students, or similar.

What we need is basically one of these, if possible:

(the most important one) A labeled dataset of some kind of animal, where each entry is an audio recording of a “call” of that animal.

Birds are the obvious choice, but other animals are OK as well.

A dataset of the same animal, but this time with “sentences”, i.e. a few calls in one audio recording.

Thanks a lot in advance!

submitted by /u/ijustwannakms

Looking For Datasets Or Visualizations On Generational Cohorts (Boomers, Gen X, Millennials, Gen Z, Gen Alpha, Etc.)

Hi everyone,

I’m looking for any datasets, charts, or visualizations related to generational cohorts — specifically Boomers, Gen X, Millennials, Gen Z, Gen Alpha, and beyond. I’m interested in data that defines the boundaries of these generations (birth years), as well as comparative data on things like population size, education, income, digital habits, values, etc.

Has anyone here worked on or come across any well-structured data or compelling visualizations related to this? I’d really appreciate any guidance on where to find such data or if someone has already done a project on this.

Thanks in advance!

submitted by /u/karmapoetry
[link] [comments]

0

High Quality-Low Cost: Lead Data Bulk Leads

Bulk names, emails, phone numbers, and websites. Prices as low as .06/lead on orders over 1,000,000 leads.

.10/lead on orders under 500,000 leads.

submitted by /u/Own-Chocolate-8550

Category: Datatards
