Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn't interesting; I'm interested. Maybe they know where the chix are. But what do they need it all for? World domination?

I Have Real Estate Data For Sale – Buyers' And Sellers' Data, 200k+ Lines, Including Names And Telephone/Email

Hi there, people. I'm wondering if anyone can point me in the right direction as to where I can sell data I have obtained.

I have a comprehensive set of data for sale but don't know where to sell it.

I have data relating to the purchase and sale of real estate in Dubai: a buyer and seller database including names of buyers and sellers and all the details needed to prospect leads.

The data contains area, property/building name, seller's/buyer's name, unit number, sub-region, listing/purchase price, date of purchase/sale, seller's/buyer's contact number, email address, and ID details.

Data available: 200k+ total lines, in Excel or Google Sheets format, covering:

ALL DUBAI MARINA

JVC

JVT

BUSINESS BAY

PALM JUMEIRAH

All DAMAC

SPRINGS

ALL MAJOR APARTMENT/VILLA COMPLEXES, INCLUDING SIGNATURE VILLAS

Up to date as of July 2023.

Regards

submitted by /u/naughtynatasha93

Large Retail Or Manufacturing Datasets

Does anyone here know of any large datasets containing mostly transactional retail or manufacturing data? Preferably multiple tables that are related to each other by primary and foreign keys.

I'm assuming there must be companies that sell this kind of data to market research firms, so if nothing is available for free, could we buy it from one of them?
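For anyone unsure what "multiple tables related by primary and foreign keys" looks like in a transactional retail dataset, here is a minimal sketch. All table and column names are illustrative inventions, not drawn from any real dataset:

```python
import sqlite3

# Toy relational retail schema: customers, products, orders, and order lines
# linked by primary/foreign keys, as described in the request above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT
    );
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT,
        unit_price  REAL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        order_date  TEXT
    );
    CREATE TABLE order_lines (
        order_id    INTEGER REFERENCES orders(order_id),
        product_id  INTEGER REFERENCES products(product_id),
        quantity    INTEGER
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO products VALUES (10, 'Widget', 2.50)")
conn.execute("INSERT INTO orders VALUES (100, 1, '2023-07-01')")
conn.execute("INSERT INTO order_lines VALUES (100, 10, 4)")

# Join across the foreign keys to reconstruct one transaction line.
row = conn.execute("""
    SELECT c.name, p.name, ol.quantity * p.unit_price
    FROM order_lines ol
    JOIN orders    o ON o.order_id    = ol.order_id
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = ol.product_id
""").fetchone()
print(row)  # ('Acme Corp', 'Widget', 10.0)
```

A dataset shaped like this (rather than one flat denormalized table) is what makes join, aggregation, and data-modeling exercises realistic.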

submitted by /u/khaili109

Can I Do An Analysis For You For Free?

Does anyone have data they would like a free Power BI report built from? One stipulation: I want to record a tutorial during the process, so you shouldn't mind the data being shown publicly. I'd also like the dataset to come with a statement along the lines of "The question I am trying to answer, or the insight I hope to gain, is…"

submitted by /u/Bombdigitdy

Global Dataset For Air Quality Index And Pollutant By Country (and City/state If Possible) Over The Years

Hi! I’m trying to look for a dataset for my university assignment.

I'd prefer a dataset that contains the individual pollutants such as PM2.5, PM10, O3, NO2, SO2, CO, etc. The ones I've found usually contain either pollutants or AQI only, and come in different formats, so I can't combine them easily.

(Optional) Would also be great if the dataset includes contextual data like Temperature, Wind Speed, Humidity, Source of Pollution etc
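If the only option turns out to be separate pollutant and AQI tables in different formats, combining them is usually a matter of normalizing the join keys (country, date) and merging. A sketch with pandas, using made-up fragments standing in for the two sources:

```python
import pandas as pd

# Hypothetical fragments standing in for two differently formatted sources:
# one with pollutant concentrations, one with AQI values.
pollutants = pd.DataFrame({
    "Country": ["India", "India", "Japan"],
    "date": ["2022-01-01", "2022-01-02", "2022-01-01"],
    "pm25": [180.0, 165.0, 12.0],
    "no2": [40.0, 38.0, 15.0],
})
aqi = pd.DataFrame({
    "country": ["india", "india", "japan"],
    "Date": ["2022-01-01", "2022-01-02", "2022-01-01"],
    "aqi": [290, 270, 48],
})

# Normalize the join keys so the two formats line up.
pollutants["country"] = pollutants.pop("Country").str.lower()
aqi = aqi.rename(columns={"Date": "date"})

# Inner merge keeps only (country, date) pairs present in both sources.
merged = pollutants.merge(aqi, on=["country", "date"], how="inner")
print(merged[["country", "date", "pm25", "aqi"]])
```

The same pattern extends to the optional contextual data (temperature, wind speed, humidity): each extra source becomes another merge on the same keys.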

This would be a great help, thank you so much!!

submitted by /u/jyvenyu

How To Create An Image Dataset For Indian Railways Signals?

Hi everyone, I am working on a project that involves machine learning and computer vision. I want to train a model that can recognize and classify different types of signals used by the Indian railways. For this, I need a large and diverse image dataset of railway signals from various locations, angles, lighting conditions, etc.
I have searched online for existing datasets, but I could not find any that suit my needs. So I wish to create my own dataset from scratch. However, I am not sure how to go about it. What are the best practices and tools for creating an image dataset? How do I collect, label, and organize the images? How do I ensure the quality and consistency of the data?
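One common organization scheme for a classification dataset like this is a folder per class plus a CSV manifest, so labels survive file moves and train/val splits are reproducible. A sketch using only the standard library (all paths and class names below are illustrative, not a real taxonomy of Indian railway signals):

```python
import csv
import tempfile
from pathlib import Path

# One folder per signal class, plus a CSV manifest of path/label/split.
root = Path(tempfile.mkdtemp()) / "railway_signals"
for cls in ["red", "yellow", "green", "double_yellow"]:
    (root / cls).mkdir(parents=True, exist_ok=True)

# Pretend a few images were collected (empty files stand in for photos).
(root / "red" / "img_0001.jpg").touch()
(root / "green" / "img_0002.jpg").touch()

# Build the manifest: relative path, label (from the parent folder),
# and a deterministic train/val split column.
rows = []
for i, img in enumerate(sorted(root.rglob("*.jpg"))):
    rows.append({
        "path": str(img.relative_to(root)),
        "label": img.parent.name,
        "split": "train" if i % 5 else "val",  # crude 80/20 split
    })

with open(root / "manifest.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["path", "label", "split"])
    writer.writeheader()
    writer.writerows(rows)

print(rows)
```

For the labeling itself, annotation tools such as Label Studio or CVAT are commonly used; whatever tool you pick, exporting to a simple manifest like the one above keeps the dataset portable between training frameworks.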

submitted by /u/Responsible-Diver226

PubMed Papers & Annotated MESH Terms Dataset?

I'm interested in working on PubMed/NIH data. I'm looking for a dataset of the Medical Subject Headings (MeSH) terms associated with each PubMed article, over all available articles (or at least the past few decades of indexed citations), at the level of individual articles. Is this available? Preferably without needing to download and write parsing code for the full PubMed XML dump, which is huge and complex to parse; querying the API per article or term would take forever and be incredibly inefficient.

The ideal would be a CSV file or DB dump with the associated terms, article ID, and publication date. Large-scale coverage is crucial.

Bonus points if it includes other structured ontology sources per paper, e.g. the associated GO terms.
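If no prebuilt dump turns up, extracting just PMID plus MeSH descriptors from the baseline XML is a fairly small parse. The element names below (MedlineCitation, MeshHeadingList, DescriptorName with a UI attribute) reflect the PubMed XML as I understand it; verify against the current NLM DTD before relying on this:

```python
import xml.etree.ElementTree as ET

# A tiny inline sample mimicking the PubMed XML structure.
sample = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345678</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName UI="D000818">Animals</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName UI="D006801">Humans</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""

# Flatten to (pmid, descriptor_ui, descriptor_name) rows, ready for CSV.
rows = []
for article in ET.fromstring(sample).iter("PubmedArticle"):
    pmid = article.findtext(".//PMID")
    for desc in article.iter("DescriptorName"):
        rows.append((pmid, desc.get("UI"), desc.text))

print(rows)
```

For the real multi-gigabyte baseline files you would stream with `ET.iterparse` and clear elements as you go rather than loading whole files, but the per-article extraction logic stays this small.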

Thanks very much!

submitted by /u/ddofer

Exploring Opportunities: How To Utilize A 25 Million-Product E-commerce Dataset For Tools And Dashboards?

As a back-end developer, I've scraped a dataset of 25 million products, with no duplicates, from the largest e-commerce websites in the Middle East. It includes basic information on each product: price history, descriptions, specifications, image links, category and breadcrumbs, recommended products, and more. How can I leverage this data, and what tools and dashboards could I develop and potentially offer to other e-commerce websites?
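One concrete tool idea that the price-history field enables is a price-drop detector, the core of a deals dashboard or price-alert API. The field names in this sketch are my assumptions about the dataset's shape, not its actual schema:

```python
# Flag products whose latest price sits well below their historical average.
def price_drop_alerts(products, threshold=0.2):
    """Return (name, drop_fraction) for products whose latest price is at
    least `threshold` below the average of their earlier prices."""
    alerts = []
    for p in products:
        history = p["price_history"]  # assumed ordered oldest -> newest
        if len(history) < 2:
            continue
        avg = sum(history[:-1]) / len(history[:-1])
        latest = history[-1]
        if latest <= avg * (1 - threshold):
            alerts.append((p["name"], round(1 - latest / avg, 2)))
    return alerts

catalog = [
    {"name": "Phone X", "price_history": [1000, 980, 1020, 700]},
    {"name": "Cable",   "price_history": [10, 10, 9]},
]
print(price_drop_alerts(catalog))  # [('Phone X', 0.3)]
```

Other obvious directions on the same data: cross-site price comparison, category-level price indices over time, and assortment-gap reports for merchants.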

submitted by /u/HajiIman

[Request] Big Dataset Of Fiction With Titles?

I'm looking for a dataset of the full texts of short stories or novellas together with their titles (clearly delimited, and everything in English) to train a model for title generation by abstractive summarization. The bigger the better.

Preferably erotica, thriller, or drama, but anything that isn't sci-fi would work. Any ideas of where I could find that?

submitted by /u/SCP_radiantpoison

[self-promotion] Company Index Mapped To Public Identifiers (CIKs, LEIs, EINs) And Identifiers From Market Data Providers (PermID, OpenFIGI)

Cybersyn is building a Company Index (“security master” for finance nerds) to support joining companies, subsidiaries, and their brands together in a hierarchy. This is a persistent problem across companies and a major missing join key.

Our recent SEC Filings release on Snowflake Marketplace marks a first, small step towards building a reference spine, which we refer to as our Company Index. We map our Company Index to public identifiers (e.g. CIKs, LEIs, EINs) and identifiers from market data providers (PermID, OpenFIGI).

To start, we’re working with public companies but this will soon extend.

submitted by /u/aiatco2

Seeking Dataset: NAICS Codes Vs. Business Descriptions

I’m in search of a dataset that pairs NAICS codes with business descriptions, but not the standard generic descriptions. I’m interested in how businesses describe themselves in relation to NAICS codes. Ideally, I’d like around 500 descriptions for each NAICS code. I’ve scoured various sources without success. Does anyone know where I can find such a dataset? Any leads or suggestions would be greatly appreciated!

submitted by /u/coder903

I Have A Massive Dataset Of Flirting / Dating-app Messages. What To Do?

Without going into specifics, my company has legally, internally (through our app) acquired a massive dataset of millions of flirting-related conversations through dating apps / Instagram DMs / text messages.

How much do you think these transcripts are worth? What interesting projects / AI models could I train with this data? Let me know if you have any other recommendations about what to do with this dataset!

***not interested in any nefarious, illegal, or immoral recommendations***

Thanks!

submitted by /u/Blake_CS_Fit

I Built A Free Tool That Auto-generates Scrapers For Any Website With AI

I got frustrated with the time and effort required to code and maintain custom web scrapers for collecting data, so my friends and I built an LLM-based solution for data extraction from websites. AI should automate tedious and uncreative work, and web scraping definitely fits that description.

Try it out for free on our playground https://kadoa.com/playground and let me know what you think!

We’re leveraging LLMs to understand the website structure and generate the DOM selectors for it. Using LLMs for every data extraction, as most comparable tools do, would be way too expensive and very slow, but using LLMs to generate the scraper code and subsequently adapt it to website modifications is highly efficient and maintenance-free.

How it works (the playground uses a simplified version of this):

1. Loading the website: automatically decide what kind of proxy and browser we need
2. Analyzing network calls: try to find the desired data in the network calls
3. Preprocessing the DOM: remove all unnecessary elements and compress it into a structure that GPT can understand
4. Selector generation: use an LLM to find the desired information with the corresponding selectors
5. Data extraction in the desired format
6. Validation: hallucination checks and verification that the data is actually on the website and in the right format
7. Data transformation: clean and map the data (e.g. if we need to aggregate data from multiple sources into the same format). LLMs are great at this task too
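To make the DOM-preprocessing step concrete, here is a toy version of the idea using only the standard library: drop script/style noise and compress the page into a compact text outline an LLM could read. This is my own sketch of the general technique, not Kadoa's actual implementation:

```python
from html.parser import HTMLParser

class DomCompressor(HTMLParser):
    """Strip script/style subtrees and collect the remaining visible text."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth_skip = 0  # >0 while inside a skipped subtree
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth_skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth_skip:
            self.depth_skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.depth_skip:
            self.out.append(text)

html = """
<html><head><style>.x{color:red}</style></head>
<body><script>track();</script>
<h1>Acme Widget</h1><span class="price">$19.99</span></body></html>
"""
c = DomCompressor()
c.feed(html)
print(c.out)  # ['Acme Widget', '$19.99']
```

The compressed outline is what gets handed to the LLM for selector generation; the generated selectors are then applied cheaply on every subsequent extraction run without further LLM calls.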

The vision is fully autonomous and maintenance-free data processing from sources like websites or PDFs, basically “prompt-to-data” 🙂 It’s far from perfect yet, but we’ll get there.

submitted by /u/madredditscientist

Spanish LaLiga And Premier League Historical Dataset

Is anyone aware of places that have a complete dataset of matches, players, and their in-match actions, such as goal kicks, shots that resulted in goals, yellow and red cards, etc.?

It can be websites where the data is readily available, APIs, or blogs. I would prefer La Liga over the Premier League.

I've been searching around but could only reliably find Sofascore and Marca as sources of information.

Thanks!

submitted by /u/Technopulse