Revolutionizing Baseball Strategy: Validating Moneyball Predictions through Machine Learning Models

After diving into regression analysis, I couldn't wait to test my newfound skills on some real-world data. Luckily, Kaggle has just the thing: a dataset called 'History of Baseball'.

I had a blast analyzing Paul DePodesta's predictions and statistical findings, using linear regression to put them under the microscope. For a full breakdown of my analysis, hit this link: Moneyball Predictions Kaggle Script

But if you just want the juicy bits, here's a quick summary:

--> Lo and behold, there's a linear relationship between the number of wins and the run difference (runs scored minus runs allowed)!

Wins vs Runs Difference
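If you want to see what that kind of fit looks like in code, here's a minimal sketch. It is not the Kaggle script itself: the numbers below are toy values I made up so that they sit exactly on a straight line, and the helper names `predict_wins` and `run_diff_needed` are purely hypothetical. The real analysis would feed in team seasons from the dataset instead.

```python
import numpy as np

# Toy (run difference, wins) pairs, invented for illustration only.
# They are constructed to lie exactly on a line; real seasons would scatter around it.
run_diff = np.array([-150.0, -80.0, -20.0, 0.0, 40.0, 100.0, 160.0])
wins = np.array([66.0, 73.0, 79.0, 81.0, 85.0, 91.0, 97.0])

# Fit wins = intercept + slope * run_diff by least squares.
slope, intercept = np.polyfit(run_diff, wins, deg=1)


def predict_wins(rd):
    """Predicted wins for a given run difference under the fitted line."""
    return intercept + slope * rd


def run_diff_needed(target_wins):
    """Invert the fitted line: run difference needed to reach target_wins."""
    return (target_wins - intercept) / slope


print(f"wins = {intercept:.2f} + {slope:.4f} * run_diff")
print(f"run difference needed for 95 wins: {run_diff_needed(95):.1f}")
```

Once the line is fitted, a win target like 95 turns into a run-difference target simply by inverting the equation, which is what `run_diff_needed` does.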

--> W = the magic number of wins the Oakland Athletics needed to qualify for the playoffs.
Paul DePodesta guessed 95, our model said 93, and the actual cutoff? A mere 92 wins.

--> R = Runs scored by the Oakland Athletics in 2002.
We predicted a solid 800 runs, and would you believe it? They actually scored 800 runs!

--> RA = Runs allowed by the Oakland Athletics in 2002.
We estimated 671 runs, and they only let in 654. Not too shabby, if I do say so myself.

OBP SLG importance over BA

--> We backed up Paul DePodesta's hunch that OBP (on-base percentage) and SLG (slugging percentage) matter more than BA (batting average) when it comes to baseball statistics.
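One way to check a claim like this is to compare how much of the variation in runs each set of statistics explains. Here's a rough sketch of that idea, not the method from the Kaggle script. Everything in it is an assumption: the data is synthetic, the coefficients are invented, and BA is deliberately generated with no effect of its own, so the outcome is baked in. With real team seasons you would pass the actual columns to the hypothetical `r_squared` helper and see what comes out.

```python
import numpy as np


def r_squared(features, target):
    """Fit ordinary least squares with an intercept and return R^2."""
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((target - target.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Synthetic team seasons (NOT real data); all coefficients are made up.
rng = np.random.default_rng(0)
n = 200
obp = rng.normal(0.330, 0.015, n)
slg = rng.normal(0.400, 0.030, n)
# BA is correlated with OBP and SLG but has no direct effect on runs here.
ba = 0.5 * obp + 0.25 * slg + rng.normal(0.0, 0.005, n)
runs = -800 + 2700 * obp + 1600 * slg + rng.normal(0.0, 20.0, n)

r2_obp_slg = r_squared(np.column_stack([obp, slg]), runs)
r2_all = r_squared(np.column_stack([obp, slg, ba]), runs)
r2_ba = r_squared(ba, runs)

print(f"R^2 with OBP + SLG:      {r2_obp_slg:.3f}")
print(f"R^2 with OBP + SLG + BA: {r2_all:.3f}")
print(f"R^2 with BA alone:       {r2_ba:.3f}")
```

If adding BA to a model that already has OBP and SLG barely moves R^2, and BA alone explains less than the pair does, that supports the hunch.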

Oakland's Performance during 2002-2015