
IMDB Rating Prediction
A machine learning project that predicts IMDB movie ratings using regression models trained on a rich feature set including genre, director, cast, budget, and release metadata. Explores feature engineering, model comparison, and evaluation metrics to find the best predictor of audience reception.
Tech Stack
The Problem
Predicting how audiences will rate a movie before release is valuable for studios, distributors, and recommendation systems, but requires extracting meaningful signals from heterogeneous metadata.
Approach
Performed extensive exploratory data analysis (EDA) and feature engineering on IMDB datasets. Trained multiple regression models (Linear Regression, Random Forest, Gradient Boosting) and compared them using cross-validation. Evaluated each model with RMSE, MAE, and R² metrics.
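For reference, the three evaluation metrics can be computed directly from their definitions. The snippet below is a minimal sketch in plain Python, not the project's actual evaluation code. The function name `regression_metrics` and the sample ratings are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average absolute error, in rating points.
    mae = sum(abs(e) for e in errors) / n
    # RMSE: square root of the mean squared error; penalizes large misses more.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R^2: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical ratings, for illustration only.
actual = [7.0, 6.0, 8.0, 5.0]
predicted = [6.5, 6.5, 7.5, 5.5]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAE are both expressed in rating points, so they can be read against the 1 to 10 IMDB scale. R² is unitless and measures how much of the variance in ratings the model explains relative to always predicting the mean.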
Key Challenges
- Handling high-cardinality categorical features like cast and director names
- Dealing with missing data across multiple columns
- Avoiding data leakage from features that correlate with release timing
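One common way to approach these three challenges together is sketched below, under stated assumptions: a chronological train/test split, smoothed target encoding for director names, and median imputation with a missing-value flag. This is an illustration, not necessarily the method used in this project. The row layout, the sample data, and all function names are hypothetical.

```python
from statistics import median

# Hypothetical rows: (release_year, director, budget_in_millions_or_None, rating)
movies = [
    (2010, "Director A", 50.0, 7.0),
    (2011, "Director A", None, 8.0),
    (2012, "Director B", 20.0, 5.0),
    (2013, "Director B", 30.0, 6.0),
    (2014, "Director A", 40.0, 7.5),
    (2015, "Director C", None, 6.5),
]


def chronological_split(rows, cutoff_year):
    """Train on movies released before the cutoff, test on the rest."""
    train = [r for r in rows if r[0] < cutoff_year]
    test = [r for r in rows if r[0] >= cutoff_year]
    return train, test


def fit_target_encoding(train_rows, smoothing=2.0):
    """Map each director to a smoothed mean rating, using training rows only."""
    global_mean = sum(r[3] for r in train_rows) / len(train_rows)
    sums, counts = {}, {}
    for _, director, _, rating in train_rows:
        sums[director] = sums.get(director, 0.0) + rating
        counts[director] = counts.get(director, 0) + 1
    # Shrink each director's mean toward the global mean; directors with
    # few movies are pulled more strongly toward it.
    encoding = {
        d: (sums[d] + smoothing * global_mean) / (counts[d] + smoothing)
        for d in sums
    }
    return encoding, global_mean


def fit_budget_median(train_rows):
    """Median of the known training budgets, used to fill missing values."""
    return median(r[2] for r in train_rows if r[2] is not None)


def transform(rows, encoding, global_mean, budget_median):
    """Build [director_score, budget, budget_missing_flag] per row."""
    features = []
    for _, director, budget, _ in rows:
        # Directors unseen in training fall back to the global mean.
        director_score = encoding.get(director, global_mean)
        budget_missing = 1.0 if budget is None else 0.0
        budget_value = budget_median if budget is None else budget
        features.append([director_score, budget_value, budget_missing])
    return features


train, test = chronological_split(movies, cutoff_year=2014)
encoding, global_mean = fit_target_encoding(train)
budget_median = fit_budget_median(train)
X_train = transform(train, encoding, global_mean, budget_median)
X_test = transform(test, encoding, global_mean, budget_median)
print(X_test)
```

Every statistic here (director means, global mean, budget median) is fitted on the training rows only and then applied to the test rows, so no information from later releases reaches the model. Note that encoding a training row with a mean that includes its own rating is still a mild form of leakage; computing the encoding out-of-fold is a common refinement.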
Results
Built a predictive pipeline that identifies the most influential factors in movie ratings and achieves competitive prediction accuracy on held-out test sets.