Each dataset includes:
- What technologies were detected (e.g. WordPress 4.5.3)
- The domain it was found on
- The page it was found on
- The IP address associated with the page
- Who owns the IP address
- The geolocation for that IP address
- The URLs found on the page
- The meta description tags for that page
- The size of the HTTP response
- What protocol was used to fulfill the HTTP request
- The date the page was crawled
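To give a rough idea of how one row might be handled in code, here is a minimal Python sketch. Everything in it is an assumption for illustration: the class name `CrawlRecord`, the field names, the field types, and the helper `count_technologies` are hypothetical and are not taken from the dataset's actual schema, so check the real column names before relying on any of it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CrawlRecord:
    """One crawled page. Field names and types are hypothetical,
    chosen to mirror the list above, not the dataset's real schema."""
    technologies: List[str]   # e.g. ["WordPress 4.5.3"]
    domain: str               # domain the technology was found on
    page: str                 # page the technology was found on
    ip: str                   # IP address associated with the page
    ip_owner: str             # who owns the IP address
    geolocation: str          # geolocation for that IP address
    urls: List[str]           # URLs found on the page
    meta_description: str     # meta description for the page
    response_size: int        # size of the HTTP response (unit assumed: bytes)
    protocol: str             # protocol used to fulfill the request
    crawl_date: str           # date the page was crawled (format assumed)


def count_technologies(records: Iterable[CrawlRecord]) -> Counter:
    """Count how many records mention each detected technology."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.technologies)
    return counts


# Made-up example rows, only to show the shape of the data.
sample = [
    CrawlRecord(["WordPress 4.5.3"], "example.com", "https://example.com/",
                "203.0.113.10", "Example Hosting", "US", [], "", 1024,
                "HTTP/1.1", "2025-09-01"),
    CrawlRecord(["WordPress 4.5.3", "nginx"], "example.org",
                "https://example.org/", "203.0.113.11", "Example Hosting",
                "US", [], "", 2048, "HTTP/2", "2025-09-02"),
]
technology_counts = count_technologies(sample)
```

Once the real rows are loaded into something like this, questions such as "which technologies show up most often" come down to a single `Counter` lookup.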
September 2025 sample: https://www.dropbox.com/scl/fi/0zsph3y6xnfgcibizjos1/sept_2025_jumbo_sample.zip?rlkey=ozmekjx1klshfp8r1y66xdtvx&e=2&st=izkt62t6&dl=0
You can find the full version of the October 2025 dataset here: https://versiondb.io
I hope you guys like it.
submitted by /u/Upper-Character-6743