Document Type



Master of Science (MS)



First Advisor's Name

Sneh Gulati

First Advisor's Committee Title

Committee Chair

Second Advisor's Name

Zhenmin Chen

Second Advisor's Committee Title

Committee Member

Third Advisor's Name

Jie Mi

Third Advisor's Committee Title

Committee Member


Sabermetrics, Statistics, Baseball, Runs, Sports, Offense, Defense, Regression, Estimation

Date of Defense



This thesis investigated which baseball metrics contribute most to run creation and run prevention. Stepwise regression and Liu estimation were used to formulate two models, one for each dependent variable, and were also used for cross-validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning.
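As an illustration of the final step, the classic Pythagorean Expectation estimates a win ratio as RS^x / (RS^x + RA^x), where RS is runs scored and RA is runs allowed. The sketch below assumes the traditional exponent x = 2; the abstract does not state which exponent the thesis used. The function name, the run totals, and the 162-game season length are illustrative assumptions, not values from the thesis.

```python
def pythagorean_win_ratio(runs_scored: float, runs_allowed: float,
                          exponent: float = 2.0) -> float:
    """Estimate a team's win ratio from runs scored and runs allowed.

    The exponent defaults to 2, the classic choice; this is an assumption,
    since the abstract does not say which exponent the thesis used.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    scored_term = runs_scored ** exponent
    allowed_term = runs_allowed ** exponent
    return scored_term / (scored_term + allowed_term)


# Hypothetical predicted season totals (not taken from the thesis).
predicted_runs_scored = 750.0
predicted_runs_allowed = 700.0

win_ratio = pythagorean_win_ratio(predicted_runs_scored, predicted_runs_allowed)
# Assumes a 162-game regular season.
expected_wins = 162 * win_ratio
print(f"win ratio: {win_ratio:.3f}, expected wins: {expected_wins:.1f}")
```

In this arrangement the two regression models would supply the predicted runs scored and runs allowed, and the formula turns that pair into a single win estimate.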

Each model fit the data well, and collinearity among the offensive predictors was assessed using variance inflation factors. The significant defensive predictors were hits allowed, walks allowed, home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per nine innings. The significant offensive regressors were doubles, home runs, walks, batting average, and runners left on base. Both models produced error rates below 3% for run prediction, and together they estimated a team’s per-season win ratio accurately.
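For readers unfamiliar with variance inflation factors, the following is a minimal sketch of how they can be computed: each predictor is regressed on the remaining predictors, and VIF = 1 / (1 - R^2). It assumes NumPy is available and uses synthetic data; the function name and the data are hypothetical and are not the thesis's code or its baseball statistics.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the other columns plus an intercept.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res.
        vifs.append(ss_tot / ss_res if ss_res > 0 else float("inf"))
    return vifs


# Synthetic predictors (illustrative only): c is nearly collinear with a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)

demo_vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print([round(v, 1) for v in demo_vifs])
```

A VIF near 1 indicates little collinearity, while large values flag predictors that are nearly linear combinations of the others. Which cutoff the thesis applied is not stated in the abstract.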





Rights Statement

In Copyright. URI:
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).