Shipping Container Delay Prediction

Developed a Random Forest regression model to predict shipping container ETA delays for CargoProbe, outperforming the company's existing heuristics.

More About the Project

Python | Scikit-learn | Random Forest | Data Analysis | Data Processing

This project for CargoProbe addressed supply chain disruptions by predicting shipping container delays. The provided dataset was large and messy, with over 330k entries across 23 columns and substantial missing values (for example, port data was 98% missing). After extensive data cleaning, feature engineering (creating 'time_left', 'time_spent', and 'delay'), and imputation analysis, we compared several regression models. The final Random Forest model outperformed CargoProbe's existing heuristics, reducing the Mean Absolute Error (MAE) from 3.58 days to 1.20 days and the Mean Squared Error (MSE) from 54.75 to 7.92.
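The workflow described above can be illustrated with a minimal sketch. It is not the project's actual code. The data is synthetic, the columns `departure`, `eta`, `observed_at`, and `port` are hypothetical, the definitions used for 'time_spent' and 'time_left' are assumptions, and a constant mean-delay predictor stands in for CargoProbe's heuristics, which are not described here. Dropping the mostly empty column is only one possible way to handle it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for shipment records (hypothetical column names).
departure = pd.Timestamp("2023-01-01") + pd.to_timedelta(rng.integers(0, 90, n), unit="D")
planned_days = rng.integers(10, 40, n)
elapsed_days = np.round(planned_days * rng.uniform(0.1, 0.9, n))
df = pd.DataFrame({
    "departure": departure,
    "eta": departure + pd.to_timedelta(planned_days, unit="D"),
    "observed_at": departure + pd.to_timedelta(elapsed_days, unit="D"),
    # Mostly missing column, mimicking the sparse port data.
    "port": np.where(rng.random(n) < 0.98, None, "PORT_A"),
})

# Assumed handling of sparse columns: drop any column that is over 90% missing.
missing_share = df.isna().mean()
df = df.drop(columns=missing_share[missing_share > 0.9].index)

# Feature engineering (assumed definitions, in whole days).
df["time_spent"] = (df["observed_at"] - df["departure"]).dt.days
df["time_left"] = (df["eta"] - df["observed_at"]).dt.days

# Synthetic target: delay in days, loosely tied to the features plus noise.
df["delay"] = 0.3 * df["time_left"] + 0.1 * df["time_spent"] + rng.normal(0, 1, n)

X = df[["time_spent", "time_left"]]
y = df["delay"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Stand-in baseline: always predict the mean delay seen in training.
baseline_pred = np.full(len(y_test), y_train.mean())

model = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
model.fit(X_train, y_train)
rf_pred = model.predict(X_test)

baseline_mae = mean_absolute_error(y_test, baseline_pred)
baseline_mse = mean_squared_error(y_test, baseline_pred)
rf_mae = mean_absolute_error(y_test, rf_pred)
rf_mse = mean_squared_error(y_test, rf_pred)

print(f"Baseline      MAE={baseline_mae:.2f}  MSE={baseline_mse:.2f}")
print(f"Random Forest MAE={rf_mae:.2f}  MSE={rf_mse:.2f}")
```

Reporting MAE and MSE together is useful because MAE reads directly in days, while MSE penalizes large misses more heavily, so the two can move differently when a model mainly removes extreme errors.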

Technical Report

Download PDF

My Contributions

Data Cleaning and Preprocessing

Feature Engineering and Data Analysis

Model Development and Evaluation