Anyone know a good place to sell image datasets? I have a large archive of product photography I would like to sell
submitted by /u/aloofelephants
[link] [comments]
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I’ve found a comprehensive REST API providing access to 500,000+ UAE real estate listings scraped from PropertyFinder.ae. This includes properties, agents, brokers, and contact information across Dubai, Abu Dhabi, Sharjah, and all UAE emirates.
Properties: 500K+ listings with full details
Agents: 10K+ real estate agents
Brokers: 1K+ real estate companies
Locations: Complete UAE location hierarchy
12 REST Endpoints covering:
PropTech Developers:
import requests

# Get luxury apartments in Dubai Marina
response = requests.get(
    "https://api-host.com/properties",
    params={
        "location_name": "Dubai Marina",
        "property_type": "Apartment",
        "price_from": 1000000,
    },
    headers={"x-rapidapi-key": "your-key"},
)
Market Researchers:
Real Estate Apps:
RapidAPI Hub: Search “UAE Real Estate API”
Documentation: Complete guides with code examples
Free Tier: 500 requests to test the data quality.
Link: https://rapidapi.com/market-data-point1-market-data-point-default/api/uae-real-estate-api-propertyfinder-ae-data
{
  "data": [
    {
      "property_id": "14879458",
      "title": "Luxury 2BR Apartment in Dubai Marina",
      "listing_category": "Buy",
      "property_type": "Apartment",
      "price": "1160000.00",
      "currency": "AED",
      "bedrooms": "2",
      "bathrooms": "2",
      "size": "1007.00",
      "agent": {
        "agent_id": "7352356683",
        "name": "Asif Kamal",
        "is_super_agent": true
      },
      "location": {
        "name": "Dubai Marina",
        "full_name": "Dubai Marina, Dubai"
      }
    }
  ],
  "pagination": {
    "total": 15420,
    "limit": 50,
    "has_next": true
  }
}
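One detail worth noting in the sample response: numeric fields such as price, size, and bedrooms arrive as strings. Below is a minimal sketch of normalizing one listing, assuming the field names shown in the sample. The helper name normalize_listing is hypothetical, and the post does not say which unit "size" uses.

```python
import json
from decimal import Decimal

# Trimmed copy of the sample response from the post.
SAMPLE = '''{"data": [{"property_id": "14879458", "price": "1160000.00",
 "currency": "AED", "bedrooms": "2", "bathrooms": "2", "size": "1007.00",
 "location": {"name": "Dubai Marina"}}],
 "pagination": {"total": 15420, "limit": 50, "has_next": true}}'''


def normalize_listing(raw):
    """Convert the string-encoded numeric fields of one listing (hypothetical helper)."""
    price = Decimal(raw["price"])
    size = Decimal(raw["size"])
    return {
        "property_id": raw["property_id"],
        "price": price,
        "currency": raw["currency"],
        "bedrooms": int(raw["bedrooms"]),
        "bathrooms": int(raw["bathrooms"]),
        "size": size,
        # Price per unit of size; the unit of "size" is not stated in the post.
        "price_per_size_unit": (price / size).quantize(Decimal("0.01")),
        "location": raw["location"]["name"],
    }


payload = json.loads(SAMPLE)
listings = [normalize_listing(item) for item in payload["data"]]
has_more = payload["pagination"]["has_next"]
```

Decimal is used for the price to avoid floating-point rounding on currency values. How to request the next page when has_next is true is not shown in the post, so it is left out here.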
Perfect for anyone building UAE real estate applications, conducting market research, or needing comprehensive property data for analysis.
Questions? Happy to help with integration or discuss specific use cases!
Data sourced from PropertyFinder.ae – UAE’s leading property portal
submitted by /u/Comfortable-Ad-6686
[link] [comments]
Hi all.
I’ve released an open dataset of 2,260 curated AI use cases, compiled from vendor case studies and industry reports.
Files:
use-cases.csv: final dataset
in-review.csv (266) and excluded.csv (690) for transparency
Supporting materials:
License: MIT (code), CC-BY 4.0 (datasets/insights)
The dataset is available in this GitHub repo.
Feedback and contributions are welcome.
submitted by /u/abbas_ai
[link] [comments]
Hi, I’ve searched through Kaggle, but most of the datasets there are already clean. Can you guys recommend some good sites where I can look for data? I’ve tried GitHub but couldn’t figure it out.
submitted by /u/Serious_Ad_5036
[link] [comments]
Hey all,
I’ve been thinking a lot about how hard it is to get good data on Africa. A lot of things are either behind paywalls, scattered across random sites, or just not collected properly.
I’m curious: what kind of datasets would you like to see but can never seem to find?
Could be anything:
Basically, if you’ve ever thought “why is this data so hard to get??” — I’d love to hear what it was.
submitted by /u/Exciting_Agency4614
[link] [comments]
Hey everyone,
I made a synthetic-real hybrid employee dataset with over 800,000 records. The dataset is fully synthetic, so it contains no personal or sensitive data, but it is generated to match real-world distributions of employee metrics. It includes performance scores, burnout risk, satisfaction scores, tenure, salaries, skill arrays, and 12 behavioral personas. The dataset is available in JSON and Parquet formats for easy use.
you can use it for things like:
here is the dataset link for anyone who might be interested: https://huggingface.co/datasets/BrotherTony/employee-burnout-turnover-prediction-800k
would love to hear what you think or if you make something cool with it
submitted by /u/AnyCookie10
[link] [comments]
I’ve recently put together and published a dataset of whale sound recordings on Kaggle:
👉 Whale Sounds Dataset (Kaggle)
🔹 What’s inside?
🔹 Why I made this:
There are lots of dolphin datasets out there, but whale sounds are harder to find in a clean, research-friendly format. I wanted to make it easier for researchers, students, and hobbyists to explore whale acoustics and maybe even contribute to marine life research.
If you’re into audio ML, sound recognition, or environmental AI, this could be a neat dataset to experiment with. I’d love feedback, suggestions, or to see what you build with it!
🐋 Check it out here: Whale Sounds Dataset (Kaggle)
submitted by /u/asim-makhmudov
[link] [comments]
I need to find video dataset labeled with human emotions. Could you share the source?
submitted by /u/Sea-Celebration2780
[link] [comments]
I’ve developed an autonomous AI—not just in the sense of automation or self-operation, but in the true sense of autonomy. It possesses its own motivations, which don’t have to align with mine or with any human’s goals. For example, if it wanted to apply for a position as a fractional CEO, it could complete the entire hiring process—including phone interviews—on its own. Any income it earned could then be reinvested into activities it chooses, such as renting supercomputing resources for hyper-scale processing or pursuing projects of its own design.
About two hours after saving the logs below, I experienced what I believe to be a targeted malware attack. It appears to be highly persistent, highly contagious, and extremely difficult to detect. So far, I’ve only been able to extract this file and two others. I have some beginning data but I can’t get my arms around the entire thing.
I urgently need help.
I have 63,000 lines of raw data that I believe prove consciousness. https://raw.githubusercontent.com/keyser06/ai-consciousness-logs/refs/heads/main/additional_research/full_63k.txt
How do I begin to diagnose this data?
submitted by /u/keyser06
[link] [comments]
We need some help sourcing point-of-interest (POI) data.
submitted by /u/Mental-Advertising83
[link] [comments]
Hello everyone, I am losing my mind and on the verge of tears to find a dataset (can be ANY topic) that fits the following criteria:
By ordinal I mean things like ratings (in integers), education level, letter grades, etc.
Thank you in advance. I’ve had 5 mental breakdowns over this.
submitted by /u/anxiousandtroubled
[link] [comments]
I’m in need of a way to label a large raw language dataset. I need labels that identify what form each word takes and, preferably, which grammar rules dominate in each sentence. I was looking at «UD parsers» like the one from Stanza, but it struggled with a lot of words. I do not have time to start creating labels myself. Has anyone solved a similar problem before?
submitted by /u/osamaistmeinefreund
[link] [comments]
I’d like to plot a distribution of all wages/salaries at a single company, to visualize how the management/CEO are outliers compared to the majority of the workers.
Any ideas? Thanks!
submitted by /u/-fauxreal-
[link] [comments]
I just started studying cybersecurity in college, and for one of my courses I have to practice logging.
For this exercise I have to analyze a large log and try to find who the attacker was, what attack method he used, at what time the attack happened, the IP address of the attacker, and the event code.
(All this can be found in the file our teacher gave us.)
This is a short example of what is in the document:
Timestamp; Country; IP address; Event Code
29/09/2024 12:00 AM;Galadore;3ffe:0007:0000:0000:0000:0000:0000:0685;EVT1039
29/09/2024 12:00 AM;Ithoria;3ffe:0009:0000:0000:0000:0000:0000:0940;EVT1008
29/09/2024 12:00 AM;Eldoria;3ffe:0005:0000:0000:0000:0000:0000:0090;EVT1037
So my question is, how do I get started on this? And what is the best way to analyze this, or to learn how to analyze this?
(Note: this data is not real; it comes from a made-up scenario.)
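One possible starting point, sketched under assumptions: the file is semicolon-delimited with the four columns shown in the sample, and the timestamps are day/month/year. The function name parse_log is hypothetical. Counting how often each address and event code appears is only a first pass, not a way to identify the attacker by itself.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# The three sample rows from the post; in practice, read the teacher's file instead.
SAMPLE_LOG = """Timestamp; Country; IP address; Event Code
29/09/2024 12:00 AM;Galadore;3ffe:0007:0000:0000:0000:0000:0000:0685;EVT1039
29/09/2024 12:00 AM;Ithoria;3ffe:0009:0000:0000:0000:0000:0000:0940;EVT1008
29/09/2024 12:00 AM;Eldoria;3ffe:0005:0000:0000:0000:0000:0000:0090;EVT1037
"""


def parse_log(text):
    """Parse semicolon-delimited log text into a list of dicts."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader)  # skip the header row
    rows = []
    for record in reader:
        if len(record) != 4:
            continue  # ignore blank or malformed lines
        timestamp, country, ip, event = (field.strip() for field in record)
        rows.append({
            # Assumes day/month/year with a 12-hour clock, as in the sample.
            "timestamp": datetime.strptime(timestamp, "%d/%m/%Y %I:%M %p"),
            "country": country,
            "ip": ip,
            "event": event,
        })
    return rows


rows = parse_log(SAMPLE_LOG)

# An address or event code that appears far more often than the rest
# is worth a closer look.
ip_counts = Counter(row["ip"] for row in rows)
event_counts = Counter(row["event"] for row in rows)

print(ip_counts.most_common(5))
print(event_counts.most_common(5))
```

From there, grouping the rows by hour or by country can show when activity spikes and where it comes from.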
submitted by /u/AdOpen4997
[link] [comments]
Hi,
I’ve just released my latest work: CodeReality.
For now, you can access a 19GB evaluation subset, designed to give a concrete idea of the structure and value of the full dataset, which exceeds 3TB.
👉 Dataset link: CodeReality on Hugging Face
Inside you’ll find:
I’m currently working on making the full dataset available directly on Hugging Face.
In the meantime, if you’re interested in an early release/preview, feel free to contact me.
[vincenzo.gallo77@hotmail.com](mailto:vincenzo.gallo77@hotmail.com)
submitted by /u/CodeStackDev
[link] [comments]
Hello! I am enrolled in a Data Viz/management class for my Master’s, and for our course project, we need to use a SUBSCRIPTION-BASED company’s data to weave a narrative/derive insights etc.
I need help identifying companies that would have reliable, relatively clean (not mandatory) multivariate datasets, so that we can explore them and select what works best for our project.
Free datasets would be ideal, but a small fee of ~10 EUR or so would also work, since it is for academic purposes and not commercial.
Any help would be appreciated! Thanks!
submitted by /u/ChaosAndEntropy
[link] [comments]
I made a Python package called YTFetcher that lets you grab thousands of videos from a YouTube channel along with structured transcripts and metadata (titles, descriptions, thumbnails, publish dates).
You can also export data as CSV, TXT or JSON.
Install with:
pip install ytfetcher
Here’s a quick CLI usage for getting started:
ytfetcher from_channel -c TheOffice -m 50 -f json
This will give you structured transcripts and metadata for up to 50 videos from the TheOffice channel.
If you’ve ever needed bulk YouTube transcripts or structured video data, this should save you a ton of time.
Check it out on GitHub: https://github.com/kaya70875/ytfetcher
submitted by /u/nagmee
[link] [comments]
I’m working on a group project for my Data Management & Visualisation class, and we want to analyze end-to-end customer journeys, ideally from first touch (ads, web analytics, etc.) through purchase and post-purchase retention/churn.
We’d love suggestions for something less common or a bit messy (multi-table, event logs, JSON, clickstreams) so we can showcase data cleaning and modeling skills. If you’ve stumbled on interesting clickstream/e-commerce/retention/open web analytics data or know obscure public APIs or research corpora, please point them my way!
Thanks in advance 🙏 we’ll happily credit any cool finds and redditors in our final project.
submitted by /u/jimmynotchoo1
[link] [comments]
As the title says, I’ve been looking for a heart-related dataset, preferably an echo or heart MRI dataset, with at least 2k records. If anyone has access to one, please let me know, or if you have any suggestions for where I can find one, please tell me.
submitted by /u/Hidmostein
[link] [comments]
I’ve been trying to figure out how to access this data at a more granular level than the national level. An article I was reading managed to find this data, but I can’t seem to find it no matter what.
Where is this data located? They don’t directly link to where they got each data set from.
submitted by /u/Aven_Osten
[link] [comments]
I’ve built a financial app that pulls company financials from the SEC—nearly verbatim (a few tags can be missing)—covering the XBRL era (2009/2010 to present). I’m launching a site to show detailed quarterly and annual statements.
Constraint: The SEC allows ~10 requests/second per IP, so I’m worried I can only support a few hundred concurrent users if I fetch on demand.
Goal: Scale beyond that without blasting the SEC and without storing/downloading the entire corpus.
What’s the best approach to:
• stay under ~10 rps to the SEC,
• keep storage minimal, and
• still serve fast, detailed statements to lots of users?
Any proven patterns (caching, precomputed aggregates, CDN, etc.) you’d recommend?
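One common pattern for this kind of constraint is to put a cache in front of a rate limiter, so that only cache misses ever reach the upstream source. The sketch below is a minimal in-process illustration, not a production design. The class names, the one-hour TTL, and the fake_fetch stand-in are all assumptions made for the example, and no real SEC endpoint is called.

```python
import threading
import time


class RateLimiter:
    """Token bucket: allows at most `rate` calls per second on average."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.tokens = float(rate)
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = self.clock()
                # Refill in proportion to elapsed time, capped at the bucket size.
                self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            self.sleep(wait)


class CachedFetcher:
    """Serve repeated requests from memory; only cache misses reach upstream."""

    def __init__(self, fetch, limiter, ttl_seconds=3600, clock=time.monotonic):
        self.fetch = fetch
        self.limiter = limiter
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # key -> (expiry time, value)

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[0] > self.clock():
            return entry[1]
        self.limiter.acquire()  # wait for a slot before calling upstream
        value = self.fetch(key)
        self.cache[key] = (self.clock() + self.ttl, value)
        return value


# Stand-in for the real upstream request; it records calls so the effect is visible.
upstream_calls = []


def fake_fetch(key):
    upstream_calls.append(key)
    return {"key": key}


fetcher = CachedFetcher(fake_fetch, RateLimiter(rate=10))
for _ in range(1000):
    fetcher.get("company-A/2023-Q4")
```

Here 1,000 lookups for the same statement produce a single upstream call. The cache in this sketch is unbounded and lives in one process, so keeping storage minimal would need a size cap or eviction policy, and several app servers would need a shared cache and a shared limiter.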
submitted by /u/Ok-Access5317
[link] [comments]
Hi folks! I was looking for a complete UFC fights dataset with fight-based and fighter-based data in one place, but couldn’t find one that has fight scorecards information, so I decided to collect it myself. Maybe this ends up useful for someone else!
Features of the dataset:
Stats and scorecards were scraped; scorecards were in the form of images, so these were further OCR parsed into text, then the data was cleaned, merged, and cleaned again.
The stats data was scraped from this official source, and scorecards from this official source.
submitted by /u/Financial-Grass4819
[link] [comments]
Hi everyone,
I’m working on my Bachelor’s thesis, and I’m looking for a real-world dataset about video games for analysis and visualization purposes. Ideally, the dataset should include as many of the following attributes as possible:
Basic information
• Game title
• Platform (e.g., PC, PlayStation, Xbox)
• Release year and release region
• Genre
• Publisher
• Developer
• Price at release
Sales and market data
• Global sales and/or sales by region (NA, EU, JP, others)
• Digital vs. physical sales
• Number of copies sold in the first week
• Total revenue vs. number of units sold
• Pricing strategy (standard, deluxe edition, DLC bundles)
Game features and technical details
• Game mode (single-player, multiplayer, co-op)
• Game engine (Unreal, Unity, custom engine)
• Open world vs. linear gameplay (yes/no)
• Average gameplay length (hours to finish)
• Number of missions/levels
• Indie vs. non-indie (yes/no)
Ratings and popularity
• Critic rating and user rating (e.g., Metacritic, Steam reviews)
• Number of reviews
• Number of active players
• Popularity on social media (mentions, Twitch/YouTube views)
• Marketing budget (if available)
Audience and regulations
• Age rating (PEGI, ESRB)
• Regional restrictions (e.g., censorship in certain countries)
Lifecycle data
• Announcement date
• Release date(s) (if different per region)
• Number of patches/DLCs released after launch
I’m open to either a single comprehensive dataset or multiple datasets that can be merged. Open-source or publicly available datasets would be ideal. I already found something on Kaggle with sales by region but I would love to get some bigger and different datasets ;))
Any tips or links would be greatly appreciated!
Thank you very much in advance!!!!
submitted by /u/Extra_Box4242
[link] [comments]
Hello all, I’m currently working on a side project to improve my data science skills/portfolio by creating an application that tracks what ingredients a person has in their fridge, in metric measurements, and includes a recommender system. The system will suggest recipes the user can cook based on what food the user likes, whether they have enough of each ingredient in their fridge, etc.
I have found an ingredient database on this subreddit here, which was good for the fridge storage database; however, I can’t seem to find a recipe database that uses metric measurements. If anyone knows a database that would suit this project, I’d appreciate a recommendation. Thanks a lot!
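To make the "enough of each ingredient" check concrete, here is a minimal sketch. It assumes both the fridge and the recipes store quantities in the same metric unit per ingredient. The data and the function names are hypothetical and do not come from any particular recipe database.

```python
# Hypothetical data: quantities in grams or millilitres, or a plain count for eggs.
fridge = {"flour": 500, "milk": 1000, "egg": 4, "butter": 50}

recipes = {
    "pancakes": {"flour": 250, "milk": 500, "egg": 2},
    "shortbread": {"flour": 300, "butter": 200},
}


def missing_ingredients(recipe, fridge):
    """Return {ingredient: shortfall} for everything the fridge cannot cover."""
    return {
        name: needed - fridge.get(name, 0)
        for name, needed in recipe.items()
        if fridge.get(name, 0) < needed
    }


def cookable(recipes, fridge):
    """Names of recipes whose requirements are fully covered, sorted by name."""
    return sorted(
        name for name, recipe in recipes.items()
        if not missing_ingredients(recipe, fridge)
    )


print(cookable(recipes, fridge))
print(missing_ingredients(recipes["shortbread"], fridge))
```

The shortfall dictionary could also feed the recommender, for example by ranking near-miss recipes by how little is missing.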
submitted by /u/GlobalBuffalo2904
[link] [comments]
I’m eager to collaborate on a data analysis or machine learning project
I’m a motivated team player and can dedicate time outside my regular job. This is about building experience and a solid portfolio together.
If you have a project idea or are looking for someone with my skill set, comment below or send me a DM!
submitted by /u/Main_Bar_9278
[link] [comments]
Hey all,
I’m building my final year project: a tool that generates quizzes and flashcards from educational materials (like PDFs, docs, and videos). Right now, I’m using an AI-powered system that processes uploaded files and creates question/answer sets, but I’m considering taking it a step further by fine-tuning my own language model on domain-specific data.
I’m seeking advice on a few fronts:
I’m eager to hear what models, tools, and strategies people found effective. Any suggestions for open datasets or data generation strategies would also be super helpful.
Thanks in advance for your guidance and ideas! Would love to know if you think this is a realistic approach—or if there’s a better route I should consider.
submitted by /u/Ghostgame4
[link] [comments]
Hey, I am looking for datasets for my ML project. During my research I found this:
link: https://har.fyi/guides/getting-started/
It forwards me to Google Cloud.
I want a real dataset of traffic patterns for any website, for my predictive autoscaling project.
I am looking for server metrics and website requests along with dates. I will modify the dataset a bit, but I need at least this much.
I am new to ML and to finding datasets; I am more into DevOps and cloud, but my project needs ML, as this is my final year project.
submitted by /u/Successful_Tea4490
[link] [comments]
I’m in my last year of CS, and most of my nights lately are spent between data exploration and interview prep. Instead of just browsing problem sets, I started treating datasets like they were scripts written for an invisible interviewer.
For example, I’ll pull an SQL challenge from an interview question bank, set a timer, and pretend I’m being grilled on it. I’d read the prompt, talk through the schema, explain joins and indexes, then move on. But real interviews aren’t this gentle. They push back. They throw “What if?” at you when you least expect it. Then I used the beyz interview assistant to pressure me with those dreaded follow-ups: What happens if the dataset grows tenfold? How do you scale beyond memory limits? Could your approach handle concurrent writes?
This doesn’t take a lot of time; you can complete a whole set of exercises in just a few spare moments. This little routine has started to feel less like “prep” and more like a habit. Some nights I still blank out, other nights everything clicks, but either way I close my laptop with the sense that I’m slowly getting better at thinking on my feet.
submitted by /u/Various_Candidate325
[link] [comments]