Hello, I think it was around February 2020 that someone uploaded an amazing IMDb dataset titled “IMDb movies extensive dataset”. I still have the archive file, but I wanted to find a more recent one. I tried building it myself, but IMDb doesn't provide its complete data for free: you can get the basic info, but what's really interesting to me is the rating breakdown data. Here are the columns from the “IMDB ratings.csv” file:
imdb_title_id,weighted_average_vote,total_votes,mean_vote,median_vote,votes_10,votes_9,votes_8,votes_7,votes_6,votes_5,votes_4,votes_3,votes_2,votes_1,allgenders_0age_avg_vote,allgenders_0age_votes,allgenders_18age_avg_vote,allgenders_18age_votes,allgenders_30age_avg_vote,allgenders_30age_votes,allgenders_45age_avg_vote,allgenders_45age_votes,males_allages_avg_vote,males_allages_votes,males_0age_avg_vote,males_0age_votes,males_18age_avg_vote,males_18age_votes,males_30age_avg_vote,males_30age_votes,males_45age_avg_vote,males_45age_votes,females_allages_avg_vote,females_allages_votes,females_0age_avg_vote,females_0age_votes,females_18age_avg_vote,females_18age_votes,females_30age_avg_vote,females_30age_votes,females_45age_avg_vote,females_45age_votes,top1000_voters_rating,top1000_voters_votes,us_voters_rating,us_voters_votes,non_us_voters_rating,non_us_voters_votes
As you can see, it has some juicy information, such as breakdowns by age and gender, and, most importantly for me, the top1000_voters columns. I think they are an extremely underrated data point that is rarely mentioned. They are very useful when you want to determine whether a movie's rating is biased: I have noticed that a lot of highly rated Turkish and Indian movies in particular have very biased ratings, and using top1000_voters you can find which ones.
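For anyone wanting to try this, here is a minimal sketch of one way to do that check, assuming "biased" means the overall weighted_average_vote sits well above top1000_voters_rating. It only uses the column names from the header above and Python's standard csv module. The 2.0-point threshold and the 20-vote minimum are arbitrary, hypothetical cutoffs, not anything from the dataset itself.

```python
import csv


def load_ratings(path="IMDB ratings.csv"):
    """Read the ratings CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def top1000_gaps(rows, min_top_votes=20):
    """Yield (imdb_title_id, gap), where gap = weighted_average_vote - top1000_voters_rating.

    Rows with an empty/missing value, or fewer than min_top_votes top-1000
    voters (min_top_votes is a hypothetical cutoff), are skipped.
    """
    for row in rows:
        try:
            overall = float(row["weighted_average_vote"])
            top = float(row["top1000_voters_rating"])
            top_votes = float(row["top1000_voters_votes"])
        except (KeyError, ValueError):
            continue  # empty cell or missing column
        if top_votes < min_top_votes:
            continue  # too few top-1000 voters to trust their average
        yield row["imdb_title_id"], overall - top


def flag_inflated(rows, threshold=2.0, min_top_votes=20):
    """Titles whose overall rating beats the top-1000 rating by >= threshold, largest gap first."""
    flagged = [
        (title_id, gap)
        for title_id, gap in top1000_gaps(rows, min_top_votes)
        if gap >= threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Usage would be something like `flag_inflated(load_ratings())[:20]` to list the 20 titles with the largest gap. A big positive gap doesn't prove vote manipulation by itself; it just points at titles worth a closer look.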
I was also able to find interesting things, such as which movies females prefer more than males do, and which genres as well (males lean more towards westerns, while females lean more towards the family genre).
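A rough sketch of the gender comparison, assuming it is done as "female average minus male average" per title (using the females_allages_* and males_allages_* columns above) and then averaged per genre. The ratings file has no genre column, so this assumes a separate movies file mapping imdb_title_id to a comma-separated genre string; the file name "IMDb movies.csv" and the "genre" column in `load_genres` are assumptions, as is the 50-vote minimum.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_genres(path="IMDb movies.csv"):
    """Hypothetical loader: map imdb_title_id -> comma-separated genre string."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["imdb_title_id"]: row.get("genre", "") for row in csv.DictReader(f)}


def gender_gap(row, min_votes=50):
    """Female avg minus male avg for one title, or None if data is missing or too thin."""
    try:
        f_avg = float(row["females_allages_avg_vote"])
        m_avg = float(row["males_allages_avg_vote"])
        f_n = float(row["females_allages_votes"])
        m_n = float(row["males_allages_votes"])
    except (KeyError, ValueError):
        return None  # empty cell or missing column
    if f_n < min_votes or m_n < min_votes:
        return None  # hypothetical minimum so a handful of votes can't dominate
    return f_avg - m_avg


def genre_gaps(rating_rows, genres_by_id, min_votes=50):
    """Mean per-title gender gap for each genre (positive = females rate it higher)."""
    buckets = defaultdict(list)
    for row in rating_rows:
        gap = gender_gap(row, min_votes)
        if gap is None:
            continue
        for genre in genres_by_id.get(row["imdb_title_id"], "").split(","):
            genre = genre.strip()
            if genre:
                buckets[genre].append(gap)
    return {genre: mean(gaps) for genre, gaps in buckets.items()}
```

This is an unweighted mean over titles, so a genre with many small films counts each of them as much as a blockbuster; weighting by vote counts would be another reasonable choice.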
So my question is: is it possible to get this info from IMDb without paying? I live in a third-world country and have no credit card in my name. I love doing this kind of exploratory analysis as a hobby, but I can't pay IMDb the thousands they are asking for, and for the life of me I can't figure out how to web-scrape the data past IMDb's anti-scraping systems.
Also, on a side note, it appears they have removed the detailed rating breakdown from their website: you can only see how many people voted for each score, not the breakdown by gender, age, or even the top 1000 voters that was there before.
submitted by /u/NoHetro