Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

A Python Package For Alibaba Data Extraction

I’m excited to share my recently developed Python package, aba-cli-scrapper (https://github.com/poneoneo/Alibaba-CLI-Scrapper), designed to facilitate data extraction from Alibaba. This command-line tool enables users to build a comprehensive dataset containing valuable information on products and suppliers associated with the platform. The extracted data can be stored in either a MySQL or SQLite database, with the option to convert it into CSV files from the SQLite file.

Key Features:

Asynchronous mode for faster scraping of page results using Bright-Data API key (configuration required)

Synchronous mode available for users without an API key (note: proxy limitations may apply)

Supports data storage in MySQL or SQLite databases

Converts data to CSV files from SQLite database
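
For readers curious what the SQLite-to-CSV step amounts to, here is a minimal sketch using only the Python standard library. It is not the package's own code; the function name `export_table_to_csv` and the table name in the usage line are hypothetical.

```python
import csv
import sqlite3
from contextlib import closing


def export_table_to_csv(db_path, table, csv_path):
    """Dump every row of one SQLite table into a CSV file with a header row."""
    with closing(sqlite3.connect(db_path)) as conn:
        # The table name is interpolated, so only pass trusted names here.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)  # number of data rows written
```

Usage would look like `export_table_to_csv("alibaba.db", "products", "products.csv")`, where both file names and the `products` table are placeholders.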

Seeking Feedback and Contributions:

I’d love to hear your thoughts on this project and encourage you to test it out. Your feedback and suggestions on the package’s usefulness and potential evolution are invaluable. Future plans include adding a RAG (retrieval-augmented generation) feature to enhance database interactions.

Feel free to try out aba-cli-scrapper and share your experiences.

submitted by /u/7_hole

Datagen — A New Dataset Creation Engine

Hi, we’re Datagen (https://datagen.dev/), a dataset engine designed to simplify your dataset creation process. We’re currently in an early phase, primarily using only open web sources, but we’re continuously expanding our data sources. We want to grow alongside the community by understanding which data collection problems are most pressing.

Creating a dataset with Datagen is a simple two-step process:

Define the data you want to find

Provide details of the data you want to include in the dataset

Datagen then handles the extraction and preparation of all necessary data for you.

It’s totally free to use right now with data row limitations while we are in beta. We’re all about making Datagen the tool that helps, and that means listening to what you need. So, if you’ve ever struggled to build a dataset, or if you have any ideas on how we can improve, we’d love to hear from you!

Disclaimer: I am the creator of Datagen. Feel free to ask me anything about Datagen!

submitted by /u/AccurateSuggestion54

Please Help Me Read This Survey Data Accurately. I Would Like To Understand Why The Percentages Don’t Add To 100%.

My guess has been that people are answering the survey question with multiple ranked answers, but I’m second-guessing this. If this is the case, how would I word a summary of such information? Ex. “40% of people learn about new destinations from travel websites, 27% from YouTube, and 27% from TripAdvisor.”

Source preview: https://tgmresearch.com/travel-survey-insights-in-spain.html
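
To illustrate the guess above, here is a small sketch of how multi-select answers push the column total past 100%. It assumes the question let each respondent pick several options, which the post does not confirm; the five responses are made-up toy data, not figures from the survey.

```python
from collections import Counter

# Hypothetical toy data: each set is one respondent's selected options.
responses = [
    {"travel websites", "YouTube"},
    {"travel websites"},
    {"YouTube", "TripAdvisor"},
    {"TripAdvisor", "travel websites"},
    {"friends and family"},
]


def option_shares(responses):
    """Percentage of respondents who selected each option."""
    total = len(responses)
    counts = Counter(option for picked in responses for option in picked)
    return {option: 100 * n / total for option, n in counts.items()}


shares = option_shares(responses)
# Each share is out of all respondents, so the shares need not sum to 100.
print(shares, sum(shares.values()))
```

If the survey works this way, a summary could say "40% of respondents named travel websites as one of their sources" and note that respondents could choose more than one answer.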

submitted by /u/_pieman

Introduction To Reomnify {reomnify.com} And Its Use Cases {self-promotion}

Reomnify is a cloud-based data platform that empowers businesses with high-quality, curated datasets across various industries. We leverage cutting-edge AI to transform fragmented data sources into clean, actionable insights. Our platform offers unparalleled speed, scale, and accuracy, enabling you to make data-driven decisions with confidence.

Key Features of Reomnify

Data Aggregation: Reomnify collects data from tens of thousands of online and offline sources, enabling it to create comprehensive datasets. This process includes cleaning, deduplication, and standardization to ensure data quality.

Customizable Datasets: The platform allows for bespoke dataset creation tailored to specific client needs, ensuring maximum value with minimal integration effort. Clients can specify data attributes, enhancements, and formats.

Speed and Flexibility: Built on Google Cloud, Reomnify’s agile platform can deliver customized datasets within days or weeks, depending on client requirements.

Cost Efficiency: Reomnify aims to provide affordable data solutions, offering significant savings in both time and costs compared to traditional data sourcing methods. Clients can save up to 89% in time and 61% in costs.

Monthly Updates: The platform offers regularly updated data, particularly useful for businesses that require the latest information for decision-making.

Types of Property Data Offered by Reomnify

Reomnify provides a variety of property-related datasets, which include:

Retail Location Data: Information on over 1,000 high-street brands, including detailed store locations and categories, useful for competitor analysis and trade area assessments.

Shopping Center Data: Tenant lists and dynamics of shopping centers, updated monthly to assist in leasing strategies and market analysis.

Restaurant and Cafe Data: Monthly updates on restaurant locations, competitor analysis, and neighborhood insights, enabling businesses to stay competitive in the food service industry.

Geospatial Data: Comprehensive datasets that support various analyses, including residential real estate strategies, pricing strategies, and marketing insights.

Alternative Data: Unique datasets that can provide additional context and insights for businesses looking to enhance their data-driven decisions.

Overall, Reomnify’s platform is designed to empower businesses by providing reliable, high-quality data that facilitates informed decision-making in a rapidly changing market environment.

submitted by /u/Cultural-Antelope758

Looking For Labelled HTML Element Dataset

Does anybody know if there exists any dataset that contains full HTML pages with elements (such as header, sidebar, footer, home button, etc) labelled? Or maybe just the element labelled and not the full HTML?

Worst case scenario, I have to scrape HTML pages and manually label all the elements myself, but I can’t even imagine how much time it would take to get something like 10,000 examples of that.
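
One way to cut down the manual work, sketched here as an assumption rather than a known dataset recipe, is to collect weak labels from pages that already use semantic HTML5 tags. The tag set and the `label_page` helper below are illustrative choices; pages built from plain `div` elements would still need hand labelling.

```python
from html.parser import HTMLParser

# Semantic tags treated as ready-made labels (aside roughly maps to a sidebar).
SEMANTIC_TAGS = {"header", "nav", "main", "aside", "footer"}


class SemanticLabeler(HTMLParser):
    """Collect (tag, id) pairs for every semantic element in a page."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_TAGS:
            self.labels.append((tag, dict(attrs).get("id")))


def label_page(html):
    parser = SemanticLabeler()
    parser.feed(html)
    parser.close()
    return parser.labels
```

Calling `label_page` on each scraped page gives a list of candidate labels that can then be spot-checked instead of created from scratch.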

Tysm in advance!

submitted by /u/Personal_Concept8169

Free Access To Global News API By Webz.io

Webz.io created the free News API Lite so students, developers, and researchers could easily incorporate high-quality, relevant news information into their non-commercial projects. The API gives you limited access to Webz.io’s vast repository of global news content, including up to 30 days of historical news data. It also includes advanced search capabilities so you can quickly refine and target your news data searches. With access to relevant and timely news data, you can discover trends, analyze sentiment, and build applications and dashboards powered by news data.

submitted by /u/rangeva

Help With Android File Naming, Odd Issue.

I need help decoding file names. I want to see whether a file name aligns with the time and date the photos were taken, to find out whether they were sent just after they were taken. Generally a device labels files in a sequence like MMYYDDHM.JPG.

The metadata from these files is stripped. We only have the names to go off of. The photos were taken on a 2015-2017 LG model Android phone with Metro PCS. Maybe a g70.

10206299612608799.jpg, 10206299612768803.jpg, 10206299612888806.jpg

Some context: the photos are all of the same object and appear to have been taken in sequence.

The last part of the file name is the only part that changes.

The only data I have to compare against is the date they were potentially taken: 09/24/17.

Other files I have for comparison:

10219120178074923.jpg was taken on or around June 9, 2017

10219114070362234.jpg was taken on or around May 17, 2017

10219138304288067.jpg was taken on or around Aug 13, 2017

10219137616550874.jpg was taken on or around Aug 5, 2017

Is anyone able to determine when the three I listed above were taken?
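
A quick sanity check on the names is possible with the numbers given in the post. The sketch below only assumes that the numeric part of each name can be compared as an integer; it does not decode the naming scheme or recover any dates.

```python
from datetime import date

# Comparison files and their approximate dates, as given in the post.
known = {
    "10219114070362234.jpg": date(2017, 5, 17),
    "10219120178074923.jpg": date(2017, 6, 9),
    "10219137616550874.jpg": date(2017, 8, 5),
    "10219138304288067.jpg": date(2017, 8, 13),
}
# The three files in question (suspected date: 09/24/17).
unknown = [
    "10206299612608799.jpg",
    "10206299612768803.jpg",
    "10206299612888806.jpg",
]


def file_id(name):
    """Numeric part of the file name as an integer."""
    return int(name.split(".")[0])


# Do the numbers of the dated files rise together with their dates?
dates_in_id_order = [known[name] for name in sorted(known, key=file_id)]
ids_follow_dates = dates_in_id_order == sorted(dates_in_id_order)

# How far apart are the three undated files, and where do they sit?
unknown_ids = [file_id(name) for name in unknown]
gaps = [b - a for a, b in zip(unknown_ids, unknown_ids[1:])]
unknown_below_known = max(unknown_ids) < min(file_id(name) for name in known)

print(ids_follow_dates, gaps, unknown_below_known)
```

For the dated files the numbers do increase with the dates, but all three undated names are smaller than every dated one even though their suspected date is later. So the two groups may not share one counter, and a simple "bigger number means later photo" reading does not settle when the three were taken.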

submitted by /u/Upsidedown_Desk82920

Nyc Mta Origin/destination Dataset Download Issues

Hello, world! I’m trying to get the NYC subway origin/destination datasets (https://data.ny.gov/Transportation/MTA-Subway-Origin-Destination-Ridership-Estimate-2/uhf3-t34z/about_data) for what they have available, which is 2023 and up to the previous month in this current year. I’m having a heck of a time trying to download it so I can play with it, though. Exporting the whole thing to CSV seems to take forever, errors out often, and when I do get a file, it ends with an error part of the way through. Anyone have any ideas on how I can get at the raw dataset in a better way?
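
One option, sketched under assumptions, is to skip the one-shot export and page through the data instead. The portal appears to be Socrata-based, so the sketch assumes the dataset is reachable at the usual `/resource/<id>.csv` endpoint with the `$limit`, `$offset`, and `$order` query parameters; the endpoint URL is inferred from the link in the post and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint, built from the dataset id in the post's link.
BASE_URL = "https://data.ny.gov/resource/uhf3-t34z.csv"


def page_url(offset, limit=50000):
    """URL for one page of rows, ordered by row id so pages do not overlap."""
    params = {"$limit": limit, "$offset": offset, "$order": ":id"}
    return f"{BASE_URL}?{urlencode(params, safe='$:')}"


def fetch_page(url):
    with urlopen(url) as response:
        return response.read()


def download_csv(path, limit=50000, fetch=fetch_page):
    """Append pages to one CSV file until a page comes back with no rows."""
    offset, pages = 0, 0
    with open(path, "wb") as out:
        while True:
            lines = fetch(page_url(offset, limit)).splitlines(keepends=True)
            if len(lines) <= 1:  # header only (or empty): nothing left
                break
            if not lines[-1].endswith(b"\n"):
                lines[-1] += b"\n"
            # Keep the header from the first page only.
            out.writelines(lines if offset == 0 else lines[1:])
            offset += limit
            pages += 1
    return pages
```

Because each request is small, a failure costs one page rather than the whole export, and the loop can be restarted from the last good offset. Filtering by month or year in the query would shrink the download further, if the dataset exposes such columns.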

submitted by /u/Witty_Garlic_1591

Updating Tabular Data For ML Project

Hey all,

I am trying to build an end-to-end machine learning project where I use a cloud platform to schedule model retraining, MLflow to keep track of the retrained models, and a dashboard that shows how the model is performing and updates each time the model is retrained. I’ve been trying to find a dataset that would be good for this, but I’ve been having a hard time finding one that isn’t too complex yet is understandable and interesting. I want to work with tabular data, and I’ve checked places like the AWS open data registry, but a lot of the datasets there seem like they would be tough to work with. Any recommendations? Thanks in advance!

submitted by /u/RimzTV

Mapping Tolkien’s Middle Earth With MiddleEarth R Package

I’m super excited to share my first R package I’ve developed! It uses data from the ME_DEM project, and allows you to easily access geospatial data for mapping Tolkien’s Middle Earth and bringing it to life!

You can download the package here:
https://github.com/austinw8/MiddleEarth

In the future, I plan to add some functions that allow you to input names or regions and have it instantly mapped for you. Stay tuned 😄

Also, a huge thank you to Andrew Heiss and his blog for helping me put this together.

submitted by /u/austinw_8

Summer Tournament Poker Data Around The WSOP 2023 And 2024

Here is a fun one I collected. This is poker data from every property in Las Vegas that ran a poker tournament series during the World Series of Poker: Aria, Wynn, MGM, Venetian, Orleans, Golden Nugget, Caesars, and Resorts World. The data is fun to play around with if you know a bit about poker. I believe rake (what the casino takes from the buy-in to help pay for everything) was actually a lower percentage this year. How do entries in regular old No Limit Hold’em events compare to last year? Was there a rise in mixed game attendance?

Have fun with it.

https://github.com/rcs1978/summerpokerLV

submitted by /u/thriftbin

6-Week Social Media Data Challenge: Work With Real Datasets, Win Up To $3000!

I’ve just launched an exciting 6-week challenge that gives you access to real social media datasets. It’s a great opportunity to work with interesting data and potentially win big!

What’s involved:

Access and analyze real social media datasets

Use professional tools: Paradime (SQL/dbt™), MotherDuck (data warehouse), Hex (visualization)

Chance to win: $3000 (1st), $2000 (2nd), $1000 (3rd) in Amazon gift cards

My partners and I have invested in creating a valuable learning experience with industry-standard tools and real-world datasets. You’ll get hands-on practice with professional technologies and interesting data. Rest assured, your work remains your own – we won’t be using your code, selling your information, or contacting you without consent. This competition is all about giving you a chance to work with and derive insights from real social media datasets.

Concerned about time? No worries, the challenge submissions aren’t due until September 9th. Even 5 hours of your time could put you in the running, but feel free to dive deeper!

Interested? Register here: https://www.paradime.io/dbt-data-modeling-challenge

submitted by /u/JParkerRogers

Have You Experienced Addiction? Do You Have Knowledge Of Your Family History Of Addiction? Share Your Experiences! [Approved Anonymous Survey] (Everyone 18+)

Anonymous Risk-Free Survey Link: https://uky.az1.qualtrics.com/jfe/form/SV_dmB7vD4HQzuRgIC?Q_CHL=qr

As someone in recovery myself, I am pursuing a cognitive neuroscience PhD and I want to discover if there are familial patterns of substance use/addictive behaviors and if there is intergenerational concordance regarding substance/activity preference, age at onset, treatment-seeking, etc.

Please share your experiences to help us improve addiction prevention and intervention methods! Every response, every share, and every tag propels us closer to groundbreaking discoveries. You’re not just filling out an anonymous survey—you’re fueling a recovery revolution!

Remember: Your experience is powerful. Your voice matters. Your participation saves lives.

Thank you so much for your commitment to helping others!

submitted by /u/di6duthfiyd75w