submitted by /u/phicreative1997
Category: Datatards
Here you can observe the biggest nerds in the world in their natural habitat, longing for datasets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
A Python Package for Alibaba Data Extraction
I’m excited to share my recently developed Python package, aba-cli-scrapper (https://github.com/poneoneo/Alibaba-CLI-Scrapper), designed to facilitate data extraction from Alibaba. This command-line tool enables users to build a comprehensive dataset containing valuable information on products and suppliers associated with the platform. The extracted data can be stored in either a MySQL or SQLite database, with the option to convert it into CSV files from the SQLite file.
Key Features:
Asynchronous mode for faster scraping of page results using Bright-Data API key (configuration required)
Synchronous mode available for users without an API key (note: proxy limitations may apply)
Supports data storage in MySQL or SQLite databases
Converts data to CSV files from SQLite database
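The SQLite-to-CSV conversion the package advertises can also be done by hand if you just want the raw tables out. A minimal sketch using only the Python standard library (table and file names here are hypothetical, not the package’s actual schema):

```python
import csv
import sqlite3

def sqlite_table_to_csv(db_path: str, table: str, csv_path: str) -> int:
    """Dump one SQLite table to a CSV file; returns the number of rows written."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(f'SELECT * FROM "{table}"')
        headers = [col[0] for col in cur.description]  # column names from the cursor
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(headers)
            rows = cur.fetchall()
            writer.writerows(rows)
        return len(rows)
    finally:
        conn.close()
```

Handy as a fallback if you want CSVs for tables the CLI doesn’t export directly.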
Seeking Feedback and Contributions:
I’d love to hear your thoughts on this project and encourage you to test it out. Your feedback and suggestions on the package’s usefulness and potential evolution are invaluable. Future plans include adding a RAG (Retrieval-Augmented Generation) feature to enhance database interactions.
Feel free to try out aba-cli-scrapper and share your experiences.
submitted by /u/7_hole
Hi! Just like the title says, I would love to find some big datasets of images of different kinds of road signs. Google Images takes way too long.
submitted by /u/dinno8
Hi, we’re Datagen (https://datagen.dev/), a dataset engine designed to simplify your dataset creation process. We’re currently in an early phase, primarily using only open web sources, but we’re continuously expanding our data sources. We want to grow alongside the community by understanding which data collection problems are most pressing.
Creating a dataset with Datagen is a simple two-step process:
1. Define the data you want to find.
2. Provide details of the data you want to include in the dataset.
Datagen then handles the extraction and preparation of all necessary data for you.
It’s totally free to use right now with data row limitations while we are in beta. We’re all about making Datagen the tool that helps, and that means listening to what you need. So, if you’ve ever struggled to build a dataset, or if you have any ideas on how we can improve, we’d love to hear from you!
Disclaimer: I am the creator of Datagen. Feel free to ask me anything about Datagen!
submitted by /u/AccurateSuggestion54
I could only get 5K pics from Kaggle, but most of those pictures are of cars; I need pictures of two-wheelers.
submitted by /u/rszdev
Hi there, I have been searching Google for a ZIP code database for the US, but I’m not sure which one to go with. Any suggestions?
Thx
submitted by /u/OddNMacabre
My guess has been that people are answering the survey question with multiple ranked answers, but I’m second-guessing this. If this is the case, how would I word a summary of such information? E.g. “40% of people learn about new destinations from travel websites, 27% from YouTube, and 27% from TripAdvisor.”
Source preview: https://tgmresearch.com/travel-survey-insights-in-spain.html
submitted by /u/_pieman
I am looking for a good dataset that provides user subscription data for forecasting. Ideally something with more than 20K users with 3+ years of data if monthly subscriptions or 4+ years of data if annual subscriptions. Could be a mix of both too in the dataset.
submitted by /u/get_ekeD
Reomnify is a cloud-based data platform that empowers businesses with high-quality, curated datasets across various industries. We leverage cutting-edge AI to transform fragmented data sources into clean, actionable insights. Our platform offers unparalleled speed, scale, and accuracy, enabling you to make data-driven decisions with confidence.
Key Features of Reomnify
Data Aggregation: Reomnify collects data from tens of thousands of online and offline sources, enabling it to create comprehensive datasets. This process includes cleaning, deduplication, and standardization to ensure data quality.
Customizable Datasets: The platform allows for bespoke dataset creation tailored to specific client needs, ensuring maximum value with minimal integration effort. Clients can specify data attributes, enhancements, and formats.
Speed and Flexibility: Built on Google Cloud, Reomnify’s agile platform can deliver customized datasets within days or weeks, depending on client requirements.
Cost Efficiency: Reomnify aims to provide affordable data solutions, offering significant savings in both time and costs compared to traditional data sourcing methods. Clients can save up to 89% in time and 61% in costs.
Monthly Updates: The platform offers regularly updated data, particularly useful for businesses that require the latest information for decision-making.
Types of Property Data Offered by Reomnify
Reomnify provides a variety of property-related datasets, which include:
Retail Location Data: Information on over 1,000 high-street brands, including detailed store locations and categories, useful for competitor analysis and trade area assessments.
Shopping Center Data: Tenant lists and dynamics of shopping centers, updated monthly to assist in leasing strategies and market analysis.
Restaurant and Cafe Data: Monthly updates on restaurant locations, competitor analysis, and neighborhood insights, enabling businesses to stay competitive in the food service industry.
Geospatial Data: Comprehensive datasets that support various analyses, including residential real estate strategies, pricing strategies, and marketing insights.
Alternative Data: Unique datasets that can provide additional context and insights for businesses looking to enhance their data-driven decisions.
Overall, Reomnify’s platform is designed to empower businesses by providing reliable, high-quality data that facilitates informed decision-making in a rapidly changing market environment.
submitted by /u/Cultural-Antelope758
Does anybody know if there exists any dataset that contains full HTML pages with elements (such as header, sidebar, footer, home button, etc) labelled? Or maybe just the element labelled and not the full HTML?
Worst case scenario, I have to scrape HTML pages and manually label all the elements myself, but I can’t even imagine how much time it would take to get something like 10,000 examples of that.
Tysm in advance!
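Until a ready-made dataset turns up, one way to bootstrap labels is to exploit the fact that HTML5 already names several structural roles (`header`, `nav`, `main`, `aside`, `footer`). A sketch using only the Python standard library; the tag set is just an illustrative starting point, and real pages often use `div`s with classes instead:

```python
from html.parser import HTMLParser

STRUCTURAL_TAGS = {"header", "nav", "main", "aside", "footer"}

class ElementLabeler(HTMLParser):
    """Collect (label, line, column) tuples for structural elements in a page."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            line, col = self.getpos()  # position of the opening tag in the source
            self.labels.append((tag, line, col))

def label_page(html: str):
    parser = ElementLabeler()
    parser.feed(html)
    return parser.labels
```

Pages that use semantic tags label themselves for free this way; the hand-labeling effort then shrinks to the pages that don’t.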
submitted by /u/Personal_Concept8169
Webz.io created the free News API Lite so students, developers, and researchers could easily incorporate high-quality, relevant news information into their non-commercial projects. The API gives you limited access to Webz.io’s vast repository of global news content, including up to 30 days of historical news data. It also includes advanced search capabilities so you can quickly refine and target your news data searches. With access to relevant and timely news data, you can discover trends, analyze sentiment, and build innovative applications and dashboards powered by news data.
submitted by /u/rangeva
Hi, I’m trying to get historical data on the Olympics (Not just medals. I’d like data from Round of 16/32, qualifying rounds etc. for specific sports). I tried looking at the Olympic Data Feed, but all I see is the data dictionary. Any idea how I can get the actual data?
Also open to alternate suggestions on how to get my hands on the Olympics dataset. Thanks everyone!
submitted by /u/thevarunfactor
Is there a klimadashboard.org style data visualization for deportation data?
submitted by /u/ippon1
Help decoding file names. I want to see if the file names align with the time/date the photos were taken, to find out if they were sent just after they were taken. Generally a device labels photos in a sequence, e.g. MMYYDDHM.JPG.
The metadata from these files is stripped. We only have the names to go off of. The photos were taken on a 2015-2017 LG model Android phone with Metro PCS. Maybe a G70.
10206299612608799.jpg, 10206299612768803.jpg, 10206299612888806.jpg
Some context, the photos are all of the same object at what appears to be taken in a sequence.
The last part of the file name is the only part that changes.
The only data I have is the date that they were potentially taken to compare. Date: 09/24/17.
Other files I have for comparison:
10219120178074923.jpg was taken on or around June 9, 2017
10219114070362234.jpg was taken on or around May 17, 2017
10219138304288067.jpg was taken on or around August 13, 2017
10219137616550874.jpg was taken on or around August 5, 2017
Anyone able to determine when the three I listed above were taken?
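Those file names resemble the long opaque numeric IDs some platforms (e.g. Facebook) assign to downloaded photos rather than encoded timestamps. Still, two quick sanity checks are possible with the data in the post: do the known IDs sort in the same order as their dates, and how far apart are the three unknown IDs from each other?

```python
# Known ID -> approximate date pairs, taken from the post above.
known = {
    10219120178074923: "2017-06-09",
    10219114070362234: "2017-05-17",
    10219138304288067: "2017-08-13",
    10219137616550874: "2017-08-05",
}
unknown = [10206299612608799, 10206299612768803, 10206299612888806]

# Do the known IDs sort in the same order as their dates?
by_id = [d for _, d in sorted(known.items())]
print("dates in ID order:", by_id)

# Gaps between the three unknown IDs (tiny gaps suggest a rapid burst).
gaps = [b - a for a, b in zip(unknown, unknown[1:])]
print("gaps:", gaps)
```

On these four known files the dates do come out in ID order, and the three unknown IDs are all far smaller than every known one, which under a monotonic-ID assumption would place them earlier than May 2017; but ID schemes are not guaranteed monotonic, so treat this as a hint, not a timestamp.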
submitted by /u/Upsidedown_Desk82920
Hello, world! I’m trying to get the NYC subway origin/destination datasets (https://data.ny.gov/Transportation/MTA-Subway-Origin-Destination-Ridership-Estimate-2/uhf3-t34z/about_data) for what they have available, which is 2023 and up to the previous month in this current year. I’m having a heck of a time trying to download it so I can play with it, though. Exporting the whole thing to CSV seems to take forever, errors out often, and when I do get a file, it ends with an error part of the way through. Anyone have any ideas on how I can get at the raw dataset in a better way?
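data.ny.gov is a Socrata portal, so instead of the portal’s one-shot Export button you can page through the SODA endpoint for dataset `uhf3-t34z` in resumable chunks. A sketch (the chunk size is arbitrary, and registering an app token with Socrata helps avoid throttling):

```python
import urllib.parse
import urllib.request

BASE = "https://data.ny.gov/resource/uhf3-t34z.csv"

def page_url(offset: int, limit: int = 50000) -> str:
    """Build one paged SODA query URL for the origin/destination dataset."""
    qs = urllib.parse.urlencode({"$limit": limit, "$offset": offset})
    return f"{BASE}?{qs}"

def download_all(out_path: str, limit: int = 50000) -> None:
    """Append pages to out_path until an empty page comes back."""
    offset = 0
    with open(out_path, "wb") as fh:
        while True:
            with urllib.request.urlopen(page_url(offset, limit)) as resp:
                body = resp.read()
            # Each CSV page repeats the header line; keep it only once.
            if offset > 0:
                body = body.split(b"\n", 1)[1] if b"\n" in body else b""
            if not body.strip():
                break
            fh.write(body)
            offset += limit
```

If one chunk fails you retry just that offset rather than restarting a multi-gigabyte export, which is exactly the failure mode described above.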
submitted by /u/Witty_Garlic_1591
I am working on my final year project and am in huge need of a recipe and food image dataset. If anyone has any information please help your pal out!
submitted by /u/Ok_Professional9230
Hey all,
I am trying to do some type of end-to-end machine learning project where I use a cloud platform to schedule model retraining, use MLflow to keep track of the retrained models, and build a dashboard that shows how the model is performing, updated each time the model is retrained. I’ve been trying to find a dataset that would be good for this, but I’ve been having a hard time finding one that isn’t too complex yet is still understandable and interesting. I’m trying to do it on tabular data, and I’ve checked places like the AWS Open Data Registry, but a lot of them seem potentially tough to work with. Any recommendations? Thanks in advance!
submitted by /u/RimzTV
I’m super excited to share my first R package I’ve developed! It uses data from the ME_DEM project, and allows you to easily access geospatial data for mapping Tolkien’s Middle Earth and bringing it to life!
You can download the package here:
https://github.com/austinw8/MiddleEarth
In the future, I plan to add some functions that allow you to input names or regions and have it instantly mapped for you. Stay tuned 😄
Also, a huge thank you to Andrew Heiss and his blog for helping me put this together.
submitted by /u/austinw_8
Hello, I want to generate 1 million SMS text messages for testing purposes:
OTP / non-OTP (60/40 split respectively)
Mix of languages (up to 40% of the total can be English)
I’m thinking of using the OpenAI API here, probably a combination of assistants. Can someone help me figure out how I should approach this?
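One way to keep the 60/40 and language quotas exact is to precompute a generation plan (one (kind, language) pair per message) and only then batch prompts to the API. A sketch of the planning step; the language list and prompt wording are placeholder assumptions, and the actual LLM call is deliberately left out:

```python
import random

def build_plan(total: int, otp_share: float = 0.6, english_cap: float = 0.4,
               other_langs=("es", "fr", "de", "ar", "hi"), seed: int = 0):
    """Return a list of (kind, lang) pairs honoring the OTP and English quotas."""
    rng = random.Random(seed)
    n_otp = round(total * otp_share)
    kinds = ["otp"] * n_otp + ["non-otp"] * (total - n_otp)
    n_en = round(total * english_cap)  # cap English at 40% of the total
    langs = ["en"] * n_en + [rng.choice(other_langs) for _ in range(total - n_en)]
    rng.shuffle(kinds)
    rng.shuffle(langs)
    return list(zip(kinds, langs))

def prompt_for(kind: str, lang: str) -> str:
    """Placeholder prompt; each one becomes a single request to the LLM."""
    return f"Write one realistic {kind} SMS message in language code '{lang}'."
```

For a million messages, turning each planned pair into one line of a JSONL file for the OpenAI Batch API is likely cheaper and more robust than a million synchronous calls; the quotas are then guaranteed by the plan rather than by hoping the model self-balances.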
submitted by /u/ExpressionNo2778
I’m aware of websites which provide this data, I want to get it in a dataset.
submitted by /u/The_ZMD
Here is a fun one I collected. This is poker data from every property in Las Vegas that ran a poker tournament series during the World Series of Poker: Aria, Wynn, MGM, Venetian, Orleans, Golden Nugget, Caesars, and Resorts World. The data is fun to play around with if you know a bit about poker. I believe the rake (what the casino takes from the buy-in to help pay for everything) was actually a lower percentage this year. How do entries in regular old No Limit Hold’em events compare to last year? Was there a rise in mixed-game attendance?
Have fun with it.
submitted by /u/thriftbin
I’ve just launched an exciting 6-week challenge that gives you access to real social media datasets. It’s a great opportunity to work with interesting data and potentially win big!
What’s involved:
Access and analyze real social media datasets
Use professional tools: Paradime (SQL/dbt™), MotherDuck (data warehouse), Hex (visualization)
Chance to win: $3,000 (1st), $2,000 (2nd), $1,000 (3rd) in Amazon gift cards
My partners and I have invested in creating a valuable learning experience with industry-standard tools and real-world datasets. You’ll get hands-on practice with professional technologies and interesting data. Rest assured, your work remains your own – we won’t be using your code, selling your information, or contacting you without consent. This competition is all about giving you a chance to work with and derive insights from real social media datasets.
Concerned about time? No worries, the challenge submissions aren’t due until September 9th. Even 5 hours of your time could put you in the running, but feel free to dive deeper!
Interested? Register here: https://www.paradime.io/dbt-data-modeling-challenge
submitted by /u/JParkerRogers
Looking for ISO certifications by company as well as more specific certifications for aerospace or the like (AS9100, 9110, 9120, etc). Some .org’s exist but wondering if there’s more of a public-facing database that has most of them.
submitted by /u/Ben2ek
I just need to check the source of the following Statista report to figure out if it’s actually worth my money. Can someone please just tell me that?
https://www.statista.com/statistics/1227458/coffee-consumption-india/
submitted by /u/vihitk
For an analysis of the pattern of penalties awarded in each Gameweek of the English Premier League over the years (ideally at least 10 years), I am interested in match data that at least includes the number of penalties awarded. Please suggest sources. I checked Kaggle etc. but cannot find penalty info.
submitted by /u/voidwithAface
Right now I have been training on foot ulcer images, which are the only wound images I have been able to find on the internet. So far I have around 3,000 training examples, but I need many more if I want my model to perform at its best.
submitted by /u/Resident_Ebb6083
Anonymous Risk-Free Survey Link: https://uky.az1.qualtrics.com/jfe/form/SV_dmB7vD4HQzuRgIC?Q_CHL=qr
As someone in recovery myself, I am pursuing a cognitive neuroscience PhD and I want to discover if there are familial patterns of substance use/addictive behaviors and if there is intergenerational concordance regarding substance/activity preference, age at onset, treatment-seeking, etc.
Please share your experiences to help us improve addiction prevention and intervention methods! Every response, every share, and every tag propels us closer to groundbreaking discoveries. You’re not just filling out an anonymous survey—you’re fueling a recovery revolution!
Remember: Your experience is powerful. Your voice matters. Your participation saves lives.
Thank you so much for your commitment to helping others!
submitted by /u/di6duthfiyd75w
I am looking for either a database or, even better, an API that gives me a dataset of fitness/gym exercises. The more flexible the better. For example, if it were grouped by categories like “chest”, “back”, etc., or by “equipment”, “body”, etc., that would be fantastic. If it includes images as well, that would be even better.
submitted by /u/MarionberryLess652