Statistical learning on NBA shot data

Posted on Sun 11 October 2015 in blog • Tagged with python, NBA, api, machine learning, regression, logistic regression, regularization

In the last post, I pulled some NBA shot data for Andrew Wiggins and loaded it into a dataframe. Here, we will apply some supervised learning techniques from sklearn to build predictive models, then use visualizations to better understand the data.

Some topics we'll explore are prediction error, regularization, and the tradeoff between prediction accuracy and model interpretability.
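To make the regularization idea concrete, here is a minimal sketch of how an L2 penalty shrinks logistic regression coefficients. It is not the code from the post: the data is synthetic (a made-up, standardized "shot distance" feature plus a pure-noise feature), and `fit_logistic`, the learning rate, and the penalty strength are all hypothetical choices for illustration. It uses only the standard library; in practice sklearn's `LogisticRegression` would do this work.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, l2=0.0, lr=0.5, steps=300):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2.

    The intercept b is left unpenalized. Returns (weights, intercept).
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * row[j]
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

random.seed(1)

# Synthetic stand-in for shot data (NOT real NBA data): the chance of a
# make falls with standardized distance; the second feature is pure noise.
X, y = [], []
for _ in range(1000):
    dist = random.gauss(0, 1)
    noise = random.gauss(0, 1)
    p_make = sigmoid(-0.2 - 1.0 * dist)
    X.append([dist, noise])
    y.append(1 if random.random() < p_make else 0)

w_plain, _ = fit_logistic(X, y, l2=0.0)  # no penalty
w_ridge, _ = fit_logistic(X, y, l2=1.0)  # strong L2 penalty

print("unpenalized weights:", w_plain)
print("L2-penalized weights:", w_ridge)
```

The penalized fit pulls the distance coefficient toward zero. That is the tradeoff in miniature: a more constrained model that may generalize better, at the cost of coefficients that no longer estimate the raw effect sizes.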



Using python to get an intuition for multiple regression

Posted on Fri 02 October 2015 in blog • Tagged with python, data science, regression, stats

I want to get some intuition about regression models using multiple independent variables. More precisely, I am unsure if the relevant predictors would be better uncovered by multiple regression, or by pairwise analysis of all predictors against the response variable. So I'd like to use a dataset where I know the precise contribution of each predictor to the response variable.
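As a rough sketch of that experiment (not the code from the post), the example below builds a synthetic dataset where the true contribution of each predictor is known, then compares the slopes from one multiple regression against the slopes from separate pairwise fits. The coefficients, sample size, and the helpers `solve` and `ols` are assumptions made for illustration. It uses only the standard library; numpy or sklearn would be the usual choice.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares fit with an intercept via the normal equations.

    Returns [intercept, slope_1, slope_2, ...].
    """
    X = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

random.seed(0)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in x1]  # correlated with x1
x3 = [random.gauss(0, 1) for _ in range(n)]            # independent

# Known contributions: x1 -> 3.0, x2 -> 0.0, x3 -> 1.5, plus noise.
y = [3.0 * a + 1.5 * c + random.gauss(0, 0.5) for a, c in zip(x1, x3)]

multi = ols([x1, x2, x3], y)                      # one joint fit
pairwise = [ols([x], y)[1] for x in (x1, x2, x3)]  # three separate fits

print("multiple regression slopes:", multi[1:])
print("pairwise slopes:           ", pairwise)
```

Here `x2` contributes nothing to `y`, yet its pairwise slope is large because it is correlated with `x1`. The joint fit assigns it a slope near zero, which is the kind of difference between the two approaches this post sets out to examine.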



Variable selection for multiple regression models

Posted on Tue 08 September 2015 in blog • Tagged with R, jupyter, regression, stats

Here, I want to look at using R to perform variable selection for a linear model. Let's consider forward and backward selection, stepwise techniques that add or drop predictors one at a time so that only the variables that meaningfully increase the variance explained are kept. The dataset I'm using is the Boston housing price dataset from the MASS library.

Note that there are some drawbacks and limitations to consider when using variable selection: http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/