The Big Porn Dataset is the largest and most comprehensive collection of adult content available on the web. With 23,686,411 video URLs, it possibly exceeds every other porn dataset.
I got quite a lot of feedback. I’ve removed unnecessary tags (some I couldn’t include due to the size of the dataset) and added others.
Use Cases
Since many people called my previous dataset “useless”, I will include use cases for each column.
Website – Analyze which website has the most videos, or analyze trends by website.
URL – Scrape the URLs to obtain metadata about the models, or scrape comments (“https://pornhub.com/comment/show?id={video_id}&limit=10&popular=1&what=video”). 😉
Title – Train an LLM to generate your own titles. See below.
Tags – Analyze the tags by platform, which ones appear the most, etc.
Upload Date – Analyze preferences based on upload date.
Video ID – Useful for scraping comments, etc.
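To make the Website, Tags, and Upload Date use cases concrete, here is a minimal Python sketch that counts rows per website, per tag, and per upload year. The column names (`website`, `video_id`, `tags`, `upload_date`), the CSV layout, the comma-separated tag format, and the sample rows are all assumptions made up for illustration; the actual dataset may be structured differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in an assumed CSV layout; the real column names,
# delimiter, and tag separator may differ.
SAMPLE = """website,video_id,tags,upload_date
site-a,101,"tag1,tag2",2021-03-14
site-a,102,"tag2,tag3",2022-07-01
site-b,201,"tag1",2022-11-30
"""


def summarize(rows):
    """Count videos per website, per tag, and per upload year."""
    per_site, per_tag, per_year = Counter(), Counter(), Counter()
    for row in rows:
        per_site[row["website"]] += 1
        # Assumes tags are stored as one comma-separated string.
        per_tag.update(t.strip() for t in row["tags"].split(",") if t.strip())
        # Assumes ISO dates (YYYY-MM-DD); the first four characters are the year.
        per_year[row["upload_date"][:4]] += 1
    return per_site, per_tag, per_year


rows = csv.DictReader(io.StringIO(SAMPLE))
per_site, per_tag, per_year = summarize(rows)

print(per_site.most_common(1))    # website with the most videos
print(per_tag.most_common(3))     # most frequent tags
print(sorted(per_year.items()))   # uploads per year
```

For the full dataset you would probably stream the file row by row in the same way rather than load it all at once, since 23 million rows may not fit comfortably in memory.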
Large Language Model
I have trained a Large Language Model on all English titles. I won’t publish it, but I’ll show you examples of what you can do with The Big Porn Dataset.
Generated titles:
F…ing My Stepmom While She Talks Dirty Ho.ny Latina Slu..y Girl Wants Ha..core An.l S.x Solo teen p…y play B.g t.t teen gets f….d hard S.xy E..ny Girlfriend
(I censored them because… no.)
Note: This dataset contains sensitive content and is intended solely for research and educational purposes. 😉 Please ensure compliance with all relevant regulations and guidelines when using this data. Use responsibly. 😊
More information on Hugging Face and Twitter:
submitted by /u/itsnikity