I Built A Free Tool That Auto-generates Scrapers For Any Website With AI

I got frustrated with the time and effort required to code and maintain custom web scrapers for collecting data, so my friends and I built an LLM-based solution for extracting data from websites. AI should automate tedious and uncreative work, and web scraping definitely fits this description.

Try it out for free on our playground https://kadoa.com/playground and let me know what you think!

We’re leveraging LLMs to understand the website structure and generate the DOM selectors for it. Calling an LLM for every single extraction, as most comparable tools do, would be way too expensive and very slow. Using LLMs to generate the scraper code once, and then to adapt it when the website changes, is highly efficient and maintenance-free.
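
To make the cost argument concrete, here is a minimal sketch of the "generate once, regenerate only when it breaks" idea. It is not Kadoa's actual code: `SelectorCache`, `fake_generate`, and `fake_extract` are hypothetical names, and the LLM call and CSS engine are replaced by tiny stand-ins so the example runs on its own.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical type: maps a field name to a CSS selector.
SelectorMap = Dict[str, str]


class SelectorCache:
    """Reuse generated selectors; regenerate only when they stop matching."""

    def __init__(self,
                 generate: Callable[[str], SelectorMap],
                 extract: Callable[[str, SelectorMap], Optional[dict]]):
        self.generate = generate   # expensive step (assumed to be an LLM call)
        self.extract = extract     # cheap step: apply selectors to the HTML
        self.selectors: Dict[str, SelectorMap] = {}
        self.llm_calls = 0

    def scrape(self, site: str, html: str) -> Optional[dict]:
        if site in self.selectors:
            data = self.extract(html, self.selectors[site])
            if data is not None:
                return data        # fast path: no LLM involved
        # First visit, or the cached selectors no longer match: regenerate.
        self.llm_calls += 1
        self.selectors[site] = self.generate(html)
        return self.extract(html, self.selectors[site])


def fake_generate(html: str) -> SelectorMap:
    # Stand-in for the LLM: picks whichever class name the page uses.
    return {"price": ".price" if 'class="price"' in html else ".cost"}


def fake_extract(html: str, selectors: SelectorMap) -> Optional[dict]:
    # Stand-in for a real CSS engine: only handles ".classname" on <span>.
    result = {}
    for field, selector in selectors.items():
        pattern = r'<span class="%s">([^<]*)</span>' % re.escape(selector[1:])
        match = re.search(pattern, html)
        if match is None:
            return None            # selector broke, e.g. after a redesign
        result[field] = match.group(1)
    return result
```

With this shape, the expensive step runs on the first visit and again only after a layout change breaks the cached selectors. Every other scrape is a plain selector lookup.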

How it works (the playground uses a simplified version of this):

1. Loading the website: Automatically decide what kind of proxy and browser we need.
2. Analyzing network calls: Try to find the desired data in the network calls.
3. Preprocessing the DOM: Remove all unnecessary elements and compress it into a structure that GPT can understand.
4. Selector generation: Use an LLM to find the desired information with the corresponding selectors.
5. Data extraction in the desired format.
6. Validation: Hallucination checks and verification that the data is actually on the website and in the right format.
7. Data transformation: Clean and map the data (e.g. if we need to aggregate data from multiple sources into the same format). LLMs are great at this task too.
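
The DOM preprocessing and validation steps above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the tool's real implementation: which tags and attributes count as "unnecessary" (`SKIP_TAGS`, `KEEP_ATTRS`) is a guess, and the hallucination check is reduced to "does the extracted value literally appear in the page text".

```python
from html.parser import HTMLParser
from typing import Dict, Tuple

SKIP_TAGS = {"script", "style", "noscript", "svg"}   # assumed to be noise
KEEP_ATTRS = {"id", "class"}                          # assumed useful for selectors
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class DomCompressor(HTMLParser):
    """Drops noisy elements and attributes, keeping a compact DOM and its text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # compact markup to hand to the LLM
        self.text = []       # visible text, used later for validation
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in VOID_TAGS:
            return
        kept = "".join(f' {k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        collapsed = " ".join(data.split())   # collapse runs of whitespace
        if collapsed and not self.skip_depth:
            self.out.append(collapsed)
            self.text.append(collapsed)


def compress_dom(html: str) -> Tuple[str, str]:
    """Return (compact DOM, visible text) for a page."""
    parser = DomCompressor()
    parser.feed(html)
    parser.close()
    return "".join(parser.out), " ".join(parser.text)


def validate(record: Dict[str, str], page_text: str) -> Dict[str, bool]:
    """Hallucination check: mark each value True only if it occurs in the page text."""
    haystack = " ".join(page_text.split()).lower()
    return {field: str(value).strip().lower() in haystack
            for field, value in record.items()}
```

A typical use would be `dom, text = compress_dom(html)`, sending `dom` to the model and then running `validate(extracted_record, text)` on whatever comes back. A substring check like this only catches values that are absent from the page; format checks (dates, prices, and so on) would need separate rules.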

The vision is fully autonomous and maintenance-free data processing from sources like websites or PDFs, basically “prompt-to-data” 🙂 It’s far from perfect yet, but we’ll get there.

submitted by /u/madredditscientist