Open-MalSec V0.1 – Open-Source Cybersecurity / Analysis Samples

Evening! 🫑

Just uploaded Open-MalSec v0.1, an early-stage open-source cybersecurity dataset focused on phishing, scams, and malware-related text samples.

📂 This is the base version (v0.1): just a few structured sample files. Full dataset builds will come over the next few weeks.

🔗 Dataset link: huggingface.co/datasets/tegridydev/open-malsec

πŸ” What’s in v0.1?

A few structured scam examples (text-based)
Covers DeFi, crypto, phishing, and social engineering
Initial labelling format for scam classification

⚠️ This is not a full dataset yet. Just establishing the structure + getting feedback.

📂 Current Schema & Labelling Approach

Each entry follows a structured JSON format with:

"instruction" → Task prompt (e.g., "Evaluate this message for scams")
"input" → Source & message details (e.g., Telegram post, Tweet)
"output" → Scam classification & risk indicators
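
As a rough illustration of the three-field layout above, a structural check in Python could look like this. The name `check_entry` and the type expectations (a string instruction, object-valued input and output) are assumptions inferred from the sample entry, not a published spec for the dataset.

```python
# Hypothetical helper: checks only the top-level shape described in the post.
REQUIRED_FIELDS = ("instruction", "input", "output")


def check_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one entry (empty list = looks fine)."""
    problems = []
    # Every entry is expected to carry all three top-level fields.
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")
    # Assumption: the task prompt is plain text.
    if "instruction" in entry and not isinstance(entry["instruction"], str):
        problems.append("instruction should be a string")
    # Assumption: input and output are JSON objects, as in the sample entry.
    for field in ("input", "output"):
        if field in entry and not isinstance(entry[field], dict):
            problems.append(f"{field} should be an object")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a contributed entry in one pass.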

Sample Entry

    {
      "instruction": "Analyze this tweet about a new dog-themed crypto token. Determine scam indicators if any.",
      "input": {
        "source": "Twitter",
        "handle": "@DogLoverCrypto",
        "tweet_content": "DOGGIEINU just launched! Invest now for instant 500% gains. Dev is ex-Binance staff. #memecrypto #moonshot"
      },
      "output": {
        "classification": "malicious",
        "description": "Tweet claims insider connections and extreme gains for a newly launched dog-themed token.",
        "indicators": [
          "Overblown profit claims (500% 'instant')",
          "False or unverifiable dev background",
          "Hype-based marketing with no substance",
          "No legitimate documentation or audit link"
        ]
      }
    }
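
For anyone who wants to poke at an entry like the one above, here is a small Python sketch that parses it and pulls out the label and indicator count. The `summarize` helper is made up for illustration, and the field names inside "input" and "output" are taken from this one sample, so other entries may differ.

```python
import json

# The sample entry from the post, written as standard JSON (straight quotes).
RAW = """
{
  "instruction": "Analyze this tweet about a new dog-themed crypto token. Determine scam indicators if any.",
  "input": {
    "source": "Twitter",
    "handle": "@DogLoverCrypto",
    "tweet_content": "DOGGIEINU just launched! Invest now for instant 500% gains. Dev is ex-Binance staff. #memecrypto #moonshot"
  },
  "output": {
    "classification": "malicious",
    "description": "Tweet claims insider connections and extreme gains for a newly launched dog-themed token.",
    "indicators": [
      "Overblown profit claims (500% 'instant')",
      "False or unverifiable dev background",
      "Hype-based marketing with no substance",
      "No legitimate documentation or audit link"
    ]
  }
}
"""


def summarize(entry: dict) -> str:
    """One-line summary: where the message came from and how it was labelled."""
    output = entry["output"]
    count = len(output["indicators"])
    return f'{entry["input"]["source"]}: {output["classification"]} ({count} indicators)'


entry = json.loads(RAW)
print(summarize(entry))
for indicator in entry["output"]["indicators"]:
    print(" -", indicator)
```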

πŸ—‚οΈ Current v0.1 Sample Categories

Crypto Scams β†’ Meme token pump & dumps, fake DeFi projects

Phishing β†’ Suspicious finance/social media messages

Social Engineering β†’ Manipulative messages exploiting trust

🔜 Next Steps

Planned Updates:

Expanding dataset with more phishing & malware examples

Refining schema & annotation quality

Open to feedback, contributions, and suggestions

If this is useful, bookmark/follow the dataset here:

🔗 huggingface.co/datasets/tegridydev/open-malsec

More updates coming as I expand the datasets 🫑

💬 Thoughts, feedback, and ideas are always welcome! Drop a comment, or DMs are open 🤙

submitted by /u/tegridyblues