I’ve been working on an API that takes a topic, finds and crawls relevant web pages, and returns a structured research dataset.
You get the synthesized summary, the source excerpts it pulled from, and the crawl logs.
Basically, it’s a small pipeline that turns a topic into a verifiable mini dataset you can reuse or analyze.
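To make the shape of that output concrete, here is a rough sketch of how the three parts (summary, source excerpts, crawl logs) could fit together in one record. This is not the API's real schema: every class and field name below (`ResearchDataset`, `SourceExcerpt`, `CrawlLogEntry`, `unverified_excerpts`) is a hypothetical stand-in, and the URLs and text are placeholders.

```python
from dataclasses import dataclass, field, asdict
import json


# All names and fields here are assumptions for illustration,
# not the actual schema returned by the API.
@dataclass
class SourceExcerpt:
    url: str   # page the excerpt was taken from
    text: str  # passage the summary relied on


@dataclass
class CrawlLogEntry:
    url: str
    status: int  # HTTP status code recorded during the crawl


@dataclass
class ResearchDataset:
    topic: str
    summary: str
    excerpts: list[SourceExcerpt] = field(default_factory=list)
    crawl_log: list[CrawlLogEntry] = field(default_factory=list)

    def unverified_excerpts(self) -> list[SourceExcerpt]:
        """Excerpts whose URL has no successful (2xx) fetch in the crawl log."""
        fetched = {e.url for e in self.crawl_log if 200 <= e.status < 300}
        return [x for x in self.excerpts if x.url not in fetched]


# Placeholder data, only to show the structure.
dataset = ResearchDataset(
    topic="example topic",
    summary="placeholder summary",
    excerpts=[
        SourceExcerpt("https://example.com/a", "excerpt A"),
        SourceExcerpt("https://example.com/b", "excerpt B"),
    ],
    crawl_log=[
        CrawlLogEntry("https://example.com/a", 200),
        CrawlLogEntry("https://example.com/b", 404),
    ],
)

# Serialize the whole record, then list excerpts the crawl log cannot back up.
print(json.dumps(asdict(dataset), indent=2))
print([x.url for x in dataset.unverified_excerpts()])
```

The idea behind the check is that "verifiable" can be tested mechanically: an excerpt only counts if the crawl log shows its page was actually fetched.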
I’m sharing it here because a few people told me the output is more useful than the “AI search” tools that hide their sources.
If anyone here works with web-derived datasets, I’d like honest feedback on the structure, fields, or anything that’s missing.
submitted by /u/Affectionate-Olive80