I’m building a film recommendation system. I have a large CSV file of film data scraped from the IMDB dataset, which I plan to use to build the machine learning model. At the same time, I’m using the TMDB (The Movie Database) API to get some extra film details, like plot summaries.
I’m using around 300,000 films from IMDB, and some records are missing certain fields, like editor or cinematographer. I’m not sure how much more data each dataset has on a given film compared to the other.
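One way I could quantify this is to measure per-field coverage in each source. Below is a rough sketch, assuming both sources are loaded as lists of dicts. The field names and sample rows are made up for illustration, and treating `\N` as "missing" is an assumption about how the CSV marks empty values.

```python
from collections import Counter

# Values treated as "missing". "\\N" (backslash-N) is an assumed
# placeholder for empty cells in the IMDB export.
MISSING = (None, "", "\\N")

def field_coverage(records, fields):
    """Return the fraction of records with a non-missing value per field."""
    if not records:
        return {f: 0.0 for f in fields}
    filled = Counter()
    for rec in records:
        for f in fields:
            if rec.get(f) not in MISSING:
                filled[f] += 1
    return {f: filled[f] / len(records) for f in fields}

# Hypothetical sample rows, not real data.
imdb_rows = [
    {"title": "Film A", "editor": "Jane Doe", "cinematographer": ""},
    {"title": "Film B", "editor": "\\N", "cinematographer": "John Roe"},
]
tmdb_rows = [
    {"title": "Film A", "editor": "Jane Doe", "cinematographer": "Ann Poe"},
    {"title": "Film B", "editor": None, "cinematographer": "John Roe"},
]

fields = ["editor", "cinematographer"]
imdb_cov = field_coverage(imdb_rows, fields)
tmdb_cov = field_coverage(tmdb_rows, fields)
for f in fields:
    print(f"{f}: IMDB {imdb_cov[f]:.0%}, TMDB {tmdb_cov[f]:.0%}")
```

Running this over a sample of films that appear in both sources should show which one is more complete for each field.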
Would it be better to consistently use the TMDB API to display film data on the frontend and only use IMDB to build the ML model, or to use the IMDB CSV throughout my system, both for the model and for displaying film details? Alternatively, I could cross-reference both sources, but I’m wary of conflicting data between the two datasets.
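To make the cross-referencing option concrete, here is a minimal sketch of what I have in mind: pick one source as primary, fill gaps from the other, and record any fields where the two disagree instead of silently overwriting. The function name `merge_film`, the field names, and the sample records are all hypothetical, and it assumes the two records have already been matched to the same film (for example by IMDB ID).

```python
# Values treated as "missing". "\\N" (backslash-N) is an assumed
# placeholder for empty cells in the IMDB export.
MISSING = (None, "", "\\N")

def merge_film(primary, secondary):
    """Merge two records for the same film.

    Values from `primary` win. Gaps are filled from `secondary`.
    Fields where both sources have different values are reported
    in `conflicts` as (primary_value, secondary_value).
    """
    merged = {}
    conflicts = {}
    for key in primary.keys() | secondary.keys():
        a, b = primary.get(key), secondary.get(key)
        if a in MISSING:
            # Primary has nothing: fall back to secondary if it has a value.
            merged[key] = None if b in MISSING else b
        else:
            merged[key] = a
            if b not in MISSING and a != b:
                conflicts[key] = (a, b)
    return merged, conflicts

# Hypothetical records for one film, not real data.
tmdb_record = {"title": "Film A", "overview": "A plot.", "editor": ""}
imdb_record = {"title": "Film A (1999)", "editor": "Jane Doe"}

merged, conflicts = merge_film(tmdb_record, imdb_record)
print(merged)
print(conflicts)
```

With TMDB as primary, the display data stays consistent with one source, IMDB only fills the holes, and the `conflicts` dict gives me something to log and review.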
Any advice is appreciated.
submitted by /u/wobowizard