Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Looking For A Dataset With Task Descriptions, Time, And Seniority Levels – Any Suggestions?

Hi everyone,

I’m currently working on a project that requires a specific dataset type, and I’d like someone here to point me in the right direction or offer some advice.

What I need:

Task descriptions: a list of tasks or activities with explanations.

Seniority levels: the seniority level (Junior, Mid, Senior) of the person who performed each task.

Time taken: the actual amount of time it took to complete each task.

Where I’ve looked:

I’ve checked platforms like Kaggle, Google Datasets and some project management tools, but I haven’t found exactly what I’m looking for. I’ve also considered synthetic data generation, but I hope to find a real dataset.

Does anyone know of a dataset that fits this description? If not, any suggestions on where I might find this kind of data? Lastly, if finding a dataset is challenging, do you think web scraping could be a viable option? If so, from where?

Thanks in advance for any help or suggestions!

submitted by /u/Pretend_Cartoonist27

Just Launched: AI-Powered FragranceFinder API 🌸✨

Hi everyone,

I’m excited to share something I’ve been working on—a new AI-powered API called FragranceFinder API! 🎉

For all the data enthusiasts and developers out there, this API allows you to search through thousands of fragrances effortlessly.

Whether you’re building an app, exploring scent data, or just curious about different perfumes, this tool can help you find what you’re looking for.

Here’s what you can do with it:

Search by name, notes, or brand: Quickly locate specific fragrances or discover new ones. Get detailed information: Includes fragrance names, brands, scent notes, and even images. (The image URLs use a prefix of —just add

I’d love to hear your thoughts or feedback! If you have any questions or need help with integration, feel free to ask.

Happy scent hunting!

Best,

submitted by /u/Affectionate-Olive80

Request Your Own Data Sets From UK Supermarket Loyalty Cards

Hi guys, I developed a tool that allows you to request your data from various UK retailers. Thought you guys would appreciate being able to generate your own retailer data sets from UK retailers like Waitrose, Boots, Tesco etc.

Full disclosure: I own the site, but I don’t make money off it, and we won’t share your data with anyone. In fact, we delete all the personal data as soon as we receive it, because for us it’s all about improving our request process. The more users we make requests for, the better our relationship with the retailers’ data teams will be.

supermarketer.co.uk/beta

submitted by /u/SuperMarketerUK

Online Tools For Image Labeling (online Hosted Gradio)

Hi, I need to host a little site so that people from my team could all connect and label the data: more precisely, choose from two shown pictures: first picture, second picture, draw or skip. I have a vague idea of how to do this on my own PC but was wondering if there’s already an online tool for simplifying something like this. If anyone has some tips on the subject, I’d be very thankful!
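Whatever front end ends up showing the two pictures, the labels themselves are simple to store. The following is only a minimal sketch of the record-keeping side, assuming every labeler's answer is appended to one shared CSV file; the function names, the column names, and the file layout are all hypothetical and not tied to Gradio or any other tool.

```python
import csv
from pathlib import Path

# The four outcomes described in the post.
CHOICES = ("first", "second", "draw", "skip")


def record_vote(log_path, labeler, first_image, second_image, choice):
    """Append one pairwise judgement to a shared CSV log (hypothetical layout)."""
    if choice not in CHOICES:
        raise ValueError(f"choice must be one of {CHOICES}, got {choice!r}")
    path = Path(log_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            # Write the header only when the log is first created.
            writer.writerow(["labeler", "first_image", "second_image", "choice"])
        writer.writerow([labeler, first_image, second_image, choice])


def load_votes(log_path):
    """Read the log back as a list of dicts, one per judgement."""
    with open(log_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Keeping "draw" and "skip" as separate values means a tie can later be told apart from a pair nobody wanted to judge.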

submitted by /u/speedmotel

Datasets With Physical Exercises, Focused On Involved Muscles.

I’m looking for a dataset of weightlifting exercises with a focus on the muscles involved. I don’t care about GIFs, pictures, or training plans.

I’ve found https://github.com/yuhonas/free-exercise-db, but it’s rather limited in terms of muscles involved. I’m aware of exrx.net, which is quite… unfriendly license-wise or paid, although it’s pretty much perfect in terms of content quality. I found a few other sources that were generally worse on both dimensions, often because they focus on visual content.

submitted by /u/teleoflexuous

Seeking Real-estate Developer Contacts

Hi all,

I’m a retail real estate investor looking to compile a list of small to mid-size retail real estate developers, specifically focused on FL, NY, NJ, TX, and GA. Ideally, I’d like to find developers with contact info like a phone number or email. Does anyone know of good databases, startups, or resources that might help? Any tips on where to look or how to go about finding this information would be greatly appreciated!

Thanks in advance!

submitted by /u/No_Way_1569

Looking For Datasets On Companies That Changed Their Logos During Pride Month

Hi all! So I’m playing around with a project on rainbow washing and need a dataset on companies that changed their logos online during Pride Month. It would pretty much be [company name] [yes/no] [year]. I’ve found one, linked below, as an example. I’m curious whether the community knows of other sources. If not, is there a manual way to hunt it down myself? Because Pride Month is over, all companies have already reverted their logos on social media, so I won’t be able to tell. I’ve tried using the Wayback Machine to check their social media pages during June, but nothing is showing (unless I’m doing something wrong). Thanks! https://dongou.notion.site/1f26ed07c9c84bc69c56447b9d989115?v=d8cb928e5791411cb5b86f39833d0b6d
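One way to do the manual hunt a bit more systematically is to ask the Wayback Machine's CDX index which captures exist for a page in June, rather than browsing the calendar view. The sketch below only builds the query URLs and makes no network calls; the helper names are hypothetical, and whether a given social media page was captured at all is not something the code can guarantee. A company's own homepage may be a more reliable target than its social profiles.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def june_snapshot_query(page_url, year):
    """Build a CDX query listing captures of page_url made during June of year."""
    params = {
        "url": page_url,
        "from": f"{year}0601",
        "to": f"{year}0630",
        "output": "json",
        "filter": "statuscode:200",   # skip redirects and error pages
        "collapse": "timestamp:8",    # at most one capture per day
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_url(timestamp, original_url):
    """URL of one archived capture, given a timestamp returned by the CDX query."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(june_snapshot_query("example.com", 2023))
```

Each row returned by the query carries a timestamp; feeding it to `snapshot_url` gives the archived page to inspect by eye for a changed logo.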

submitted by /u/silverdrgn

A Python Package For Alibaba Data Extraction

I’m excited to share my recently developed Python package, aba-cli-scrapper (https://github.com/poneoneo/Alibaba-CLI-Scrapper), designed to facilitate data extraction from Alibaba. This command-line tool enables users to build a comprehensive dataset containing valuable information on products and suppliers associated with the platform. The extracted data can be stored in either a MySQL or SQLite database, with the option to convert it into CSV files from the SQLite file.

Key Features:

Asynchronous mode for faster scraping of page results using Bright-Data API key (configuration required)

Synchronous mode available for users without an API key (note: proxy limitations may apply)

Supports data storage in MySQL or SQLite databases

Converts data to CSV files from SQLite database
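For readers who end up with the SQLite file and want the CSV step done by hand, the conversion can be sketched with the Python standard library alone. This is not the package's own implementation, just an illustration under assumptions; the function name is made up, and the table name passed in is whatever the database actually contains.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of one SQLite table to a CSV file with a header row."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as query parameters, so check the name
        # against the database's own catalogue before interpolating it.
        known = {row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}
        if table not in known:
            raise ValueError(f"no such table: {table!r}")
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        header = [col[0] for col in cursor.description]
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cursor)  # streams rows without loading them all
    finally:
        conn.close()
```

Calling `export_table_to_csv("scraped.db", "products", "products.csv")` would write one file per table, where both the file names and the `products` table are placeholders.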

Seeking Feedback and Contributions:

I’d love to hear your thoughts on this project and encourage you to test it out. Your feedback and suggestions on the package’s usefulness and potential evolution are invaluable. Future plans include adding a RAG (Red, Amber, Green) feature to enhance database interactions.

Feel free to try out aba-cli-scrapper and share your experiences.

submitted by /u/7_hole

Datagen — A New Dataset Creation Engine

Hi, we’re Datagen (https://datagen.dev/), a dataset engine designed to simplify your dataset creation process. We’re currently in an early phase, primarily using only open web sources, but we’re continuously expanding our data sources. We want to grow alongside the community by understanding which data collection problems are most pressing.

Creating a dataset with Datagen is a simple two-step process:

Define the data you want to find

Provide details of the data you want to include in the dataset

Datagen then handles the extraction and preparation of all necessary data for you.

It’s totally free to use right now with data row limitations while we are in beta. We’re all about making Datagen the tool that helps, and that means listening to what you need. So, if you’ve ever struggled to build a dataset, or if you have any ideas on how we can improve, we’d love to hear from you!

Disclaimer: I am the creator of Datagen. Feel free to ask me anything about Datagen!

submitted by /u/AccurateSuggestion54

Please Help Me Read This Survey Data Accurately. I Would Like To Understand Why The Percentages Don’t Add To 100%.

My guess has been that people are answering the survey question with multiple ranked answers, but I’m second-guessing this. If this is the case, how would I word a summary of such information? For example: “40% of people learn about new destinations from travel websites, 27% from YouTube, and 27% from TripAdvisor.”

Source preview: https://tgmresearch.com/travel-survey-insights-in-spain.html
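If the question let respondents pick more than one source (an assumption; the survey's methodology notes would have to confirm it), then each percentage is the share of respondents who picked that option, and the shares have no reason to sum to 100%. A small sketch with made-up responses shows the effect; none of these numbers come from the survey.

```python
from collections import Counter

# Hypothetical answers: each respondent may select several sources.
responses = [
    {"Travel websites", "YouTube"},
    {"Travel websites", "TripAdvisor"},
    {"YouTube"},
    {"TripAdvisor", "Travel websites", "YouTube"},
    {"Friends and family"},
]


def share_selecting(responses):
    """Percentage of respondents who selected each option."""
    counts = Counter(option for picked in responses for option in picked)
    n = len(responses)
    return {option: 100 * count / n for option, count in counts.items()}


shares = share_selecting(responses)
for option, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{option}: {pct:.0f}%")
print("sum of shares:", sum(shares.values()))
```

Under that reading, a safe wording is "40% of respondents named travel websites among their sources", which avoids implying that the categories are mutually exclusive.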

submitted by /u/_pieman

Introduction To Reomnify {reomnify.com} And Its Use Cases {self-promotion}

Reomnify is a cloud-based data platform that empowers businesses with high-quality, curated datasets across various industries. We leverage cutting-edge AI to transform fragmented data sources into clean, actionable insights. Our platform offers unparalleled speed, scale, and accuracy, enabling you to make data-driven decisions with confidence.

Key Features of Reomnify

Data Aggregation: Reomnify collects data from tens of thousands of online and offline sources, enabling it to create comprehensive datasets. This process includes cleaning, deduplication, and standardization to ensure data quality.

Customizable Datasets: The platform allows for bespoke dataset creation tailored to specific client needs, ensuring maximum value with minimal integration effort. Clients can specify data attributes, enhancements, and formats.

Speed and Flexibility: Built on Google Cloud, Reomnify’s agile platform can deliver customized datasets within days or weeks, depending on client requirements.

Cost Efficiency: Reomnify aims to provide affordable data solutions, offering significant savings in both time and costs compared to traditional data sourcing methods. Clients can save up to 89% in time and 61% in costs.

Monthly Updates: The platform offers regularly updated data, particularly useful for businesses that require the latest information for decision-making.

Types of Property Data Offered by Reomnify

Reomnify provides a variety of property-related datasets, which include:

Retail Location Data: Information on over 1,000 high-street brands, including detailed store locations and categories, useful for competitor analysis and trade area assessments.

Shopping Center Data: Tenant lists and dynamics of shopping centers, updated monthly to assist in leasing strategies and market analysis.

Restaurant and Cafe Data: Monthly updates on restaurant locations, competitor analysis, and neighborhood insights, enabling businesses to stay competitive in the food service industry.

Geospatial Data: Comprehensive datasets that support various analyses, including residential real estate strategies, pricing strategies, and marketing insights.

Alternative Data: Unique datasets that can provide additional context and insights for businesses looking to enhance their data-driven decisions.

Overall, Reomnify’s platform is designed to empower businesses by providing reliable, high-quality data that facilitates informed decision-making in a rapidly changing market environment.

submitted by /u/Cultural-Antelope758

Looking For Labelled HTML Element Dataset

Does anybody know if there exists any dataset that contains full HTML pages with elements (such as header, sidebar, footer, home button, etc) labelled? Or maybe just the element labelled and not the full HTML?

Worst case scenario, I have to scrape HTML pages and manually label all the elements myself, but I can’t even imagine how much time it would take to get something like 10,000 examples of that.

Tysm in advance!
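Short of a ready-made dataset, one partial shortcut is to let the markup label itself: pages written with HTML5 sectioning tags already say which element is the header, navigation, sidebar, or footer. The sketch below is only a weak-labelling illustration under that assumption; the class name and the label mapping are hypothetical, and pages built purely from `div` elements would yield nothing.

```python
from html.parser import HTMLParser

# HTML5 sectioning tags mapped to the kinds of labels named in the post.
SEMANTIC_LABELS = {
    "header": "header",
    "nav": "navigation",
    "aside": "sidebar",
    "main": "main content",
    "footer": "footer",
}


class WeakLabeler(HTMLParser):
    """Collect (label, tag, attributes) for every semantic element seen."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_LABELS:
            self.labels.append((SEMANTIC_LABELS[tag], tag, dict(attrs)))


def weak_labels(html_text):
    """Return the weak labels found in one HTML document."""
    parser = WeakLabeler()
    parser.feed(html_text)
    return parser.labels


page = "<html><body><header id='top'>Hi</header><nav></nav><div></div><footer></footer></body></html>"
print(weak_labels(page))
```

Labels harvested this way would still need spot checks by hand, but they could cut down how many of the 10,000 examples have to be labelled from scratch.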

submitted by /u/Personal_Concept8169

Free Access To Global News API By Webz.io

Webz.io created the free News API Lite so students, developers, and researchers could easily incorporate high-quality, relevant news information into their non-commercial projects. The API gives you limited access to Webz.io’s vast repository of global news content, including up to 30 days of historical news data. It also includes advanced search capabilities so you can quickly refine and target your news data searches. With access to relevant and timely news data, you can discover trends, analyze sentiment, and build applications and dashboards powered by news data.

submitted by /u/rangeva

Help With Android File Naming, Odd Issue.

Help decoding file names. I want to see whether a file name aligns with the time and date at which the photos were taken, to find out if they were sent just after they were taken. Generally a device names files in a sequence, with a pattern like MMYYDDHM.JPG.

The metadata from these files is stripped. We only have the names to go off of. The photos were taken on a 2015-2017 LG model Android phone with Metro PCS. Maybe a g70.

10206299612608799.jpg, 10206299612768803.jpg, 10206299612888806.jpg

Some context: the photos are all of the same object and appear to have been taken in sequence.

The last part of the file name is the only part that changes.

The only data I have is the date that they were potentially taken to compare. Date: 09/24/17.

Other files I have for comparison:

10219120178074923.jpg was taken on or around June 9, 2017

10219114070362234.jpg was taken on or around May 17, 2017

10219138304288067.jpg was taken on or around August 13, 2017

10219137616550874.jpg was taken on or around August 5, 2017

Is anyone able to determine when the three I listed above were taken?
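The names do not obviously follow a date pattern such as MMYYDDHM, but one thing can be checked from the numbers in the post alone: whether the numeric part increases with the date. The sketch below does only that. It assumes the names are drawn from a single increasing counter, which is a guess, and it cannot produce an actual date.

```python
from datetime import date

# File numbers and approximate dates exactly as given in the post.
known = [
    (10219120178074923, date(2017, 6, 9)),
    (10219114070362234, date(2017, 5, 17)),
    (10219138304288067, date(2017, 8, 13)),
    (10219137616550874, date(2017, 8, 5)),
]
unknown = [10206299612608799, 10206299612768803, 10206299612888806]


def ids_follow_dates(pairs):
    """True if sorting by number also puts the dates in non-decreasing order."""
    dates = [d for _, d in sorted(pairs)]
    return all(a <= b for a, b in zip(dates, dates[1:]))


print(ids_follow_dates(known))                         # True
print(max(unknown) < min(n for n, _ in known))         # True
```

The four comparison files are consistent with numbers that grow over time, and all three unknown files have smaller numbers than the earliest comparison file (May 17, 2017). If the single-counter assumption holds, that ordering would be hard to reconcile with a date of 09/24/17, but the assumption itself is unverified.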

submitted by /u/Upsidedown_Desk82920

NYC MTA Origin/Destination Dataset Download Issues

Hello, world! I’m trying to get the NYC subway origin/destination datasets (https://data.ny.gov/Transportation/MTA-Subway-Origin-Destination-Ridership-Estimate-2/uhf3-t34z/about_data) for what they have available, which is 2023 and up to the previous month in this current year. I’m having a heck of a time trying to download it so I can play with it, though. Exporting the whole thing to CSV seems to take forever, errors out often, and when I do get a file, it ends with an error part of the way through. Anyone have any ideas on how I can get at the raw dataset in a better way?
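One alternative to the full export is to pull the data in pages through the portal's query API, so that a failure only costs one page. data.ny.gov runs on Socrata, whose SODA API accepts `$limit`, `$offset`, and `$order` parameters. The sketch below assumes the usual `/resource/<id>.csv` endpoint for the dataset ID in the link above; the page size and function names are arbitrary choices, and an app token may be needed to avoid throttling.

```python
import urllib.request
from urllib.parse import urlencode

# Dataset ID taken from the URL in the post; the /resource/ path follows
# the usual Socrata (SODA) convention and is an assumption here.
BASE = "https://data.ny.gov/resource/uhf3-t34z.csv"


def page_url(offset, limit=50000, order=":id"):
    """URL for one page of rows; a stable $order keeps pages from overlapping."""
    query = urlencode({"$limit": limit, "$offset": offset, "$order": order},
                      safe="$:")
    return f"{BASE}?{query}"


def download_pages(out_path, limit=50000, max_pages=None):
    """Append pages to out_path until a page comes back with no data rows.

    Assumes no CSV field contains an embedded newline.
    """
    page = 0
    with open(out_path, "wb") as out:
        while max_pages is None or page < max_pages:
            with urllib.request.urlopen(page_url(page * limit, limit)) as resp:
                lines = resp.read().splitlines(keepends=True)
            if len(lines) <= 1:  # header only: nothing left to fetch
                break
            # Keep the header line from the first page only.
            out.writelines(lines if page == 0 else lines[1:])
            page += 1
    return page
```

Running `download_pages("od_sample.csv", max_pages=2)` first is a cheap way to confirm the endpoint responds before committing to the whole dataset.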

submitted by /u/Witty_Garlic_1591

Updating Tabular Data For ML Project

Hey all,

I’m trying to build an end-to-end machine learning project in which I use a cloud platform to schedule model retraining, MLflow to keep track of the retrained models, and a dashboard that shows how the model is performing and updates each time the model is retrained. I’ve been trying to find a dataset that would be good for this, but I’m having a hard time finding one that is understandable and interesting without being too complex. I want to work with tabular data, and I’ve checked places like the AWS Open Data Registry, but a lot of the datasets there look like they would be tough to work with. Any recommendations? Thanks in advance!

submitted by /u/RimzTV