[self-promotion] CRED-1: Open Dataset Of 2,672 Domains Scored For Credibility (CC BY 4.0, Zenodo DOI)

We just released CRED-1, an open dataset scoring 2,672 domains for credibility. It combines two established media watchdog sources (OpenSources.co and Iffy.news) and enriches them with four automated signals:

  • Tranco web rank (popularity/reach)
  • RDAP domain age
  • Google Fact Check Tools API (claim counts)
  • Google Safe Browsing API (malware/phishing flags)

Each domain gets a composite credibility score (0-1) based on a weighted model. The dataset is available as both a compact JSON and a full CSV with all enrichment fields.
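
For readers wondering what a weighted composite like this could look like, here is a minimal sketch. The post does not state the actual weights or normalization, so the weights, the signal names, and the log-scaled `rank_signal` helper below are all assumptions for illustration, not the CRED-1 model itself.

```python
import math

# Hypothetical weights: the post does not publish the real values.
WEIGHTS = {
    "source": 0.50,         # rating derived from the watchdog category
    "rank": 0.20,           # Tranco popularity
    "age": 0.15,            # RDAP domain age
    "fact_check": 0.10,     # Google Fact Check claim signal
    "safe_browsing": 0.05,  # Safe Browsing flag signal
}


def rank_signal(tranco_rank, list_size=1_000_000):
    """Map a Tranco rank to [0, 1] on a log scale (assumed normalization).

    Rank 1 maps to 1.0, the last rank in the list maps to 0.0,
    and unranked domains (None) get 0.0.
    """
    if tranco_rank is None:
        return 0.0
    return 1.0 - math.log10(tranco_rank) / math.log10(list_size)


def composite_score(signals):
    """Weighted average of per-signal values, each clamped to [0, 1].

    Signals missing from the input count as 0.0 (an assumption).
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(1.0, max(0.0, signals.get(name, 0.0)))
        total += weight * value
    return total / sum(WEIGHTS.values())
```

Dividing by the sum of the weights keeps the result in the 0-1 range even if the weights are later changed and no longer add up to 1.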

Use cases: misinformation research, browser extensions, content moderation, media literacy tools, training data for credibility classifiers.
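
For the browser extension or moderation use cases, a lookup could be sketched roughly like this. It assumes the scores have already been loaded into a plain dict mapping domain to score; the real JSON layout may differ, and the `SCORES` entries below are made-up placeholders, not values from the dataset.

```python
from urllib.parse import urlsplit

# Placeholder data for illustration only; not taken from CRED-1.
SCORES = {"example.com": 0.12, "example.org": 0.80}


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    # urlsplit only fills in the hostname when a scheme or '//' is present,
    # so bare inputs like 'example.com/path' get a '//' prefix first.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")  # requires Python 3.9+


def lookup(url, scores):
    """Return the score for the URL's host or a parent domain, else None."""
    parts = host_of(url).split(".")
    # Try the full host first, then strip subdomain labels one at a time:
    # news.example.org -> example.org (a bare TLD is never tried).
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in scores:
            return scores[candidate]
    return None
```

Returning `None` for unknown domains keeps "not in the dataset" distinct from "scored 0.0", which matters if the list is used to flag or block content.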

Key stats:

  • 2,672 domains across 5 categories (fake, unreliable, conspiracy, satire, other)
  • 704 matched in Tranco Top 1M
  • 67 domains with Google Fact Check claims
  • Score range: 0.000 to 0.962

License: CC BY 4.0
DOI: 10.5281/zenodo.18769460
GitHub: https://github.com/aloth/cred-1

The paper has been submitted to Data in Brief (Elsevier) and is available on arXiv.

Happy to answer questions about the methodology or scoring model.

submitted by /u/bit3py