Principal Component Analysis for a five year old

Posted on Thu 15 October 2015 in blog • Tagged with R, data science, stats, PCA

I went to a talk a couple of weeks ago at Stanford on using machine learning to understand complex biological data. At one point in the talk the speaker made an offhand comment about data so simple "that a five year old could cluster it". Wow, were you that smart at five?



Measuring cancer cell dynamics in response to therapy

Posted on Fri 09 October 2015 in blog • Tagged with R, research, systems

Example R code for processing highly multiplexed cancer drug responses, formatted so that you should be able to run it on a Mac.



Optimizing k in k-means clustering

Posted on Tue 06 October 2015 in blog • Tagged with R, data science, clustering

I want to get my hands dirty with clustering after seeing a great lecture at Stanford. Here I'm looking at k-means clustering, an algorithm to identify groups in multidimensional data.

I'm using a built-in dataset in R, "ruspini". Also, I found this site to be a helpful template to start from.
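
As a rough illustration of what "optimizing k" can look like, here is a minimal sketch of the common elbow heuristic. It assumes the `ruspini` data are loaded from the `cluster` package, and the range of k, the seed, and `nstart` are arbitrary choices of mine, not necessarily what the full post uses.

```r
# Sketch: choose k for k-means on the ruspini data with an "elbow" plot.
library(cluster)   # provides the ruspini dataset
data(ruspini)

set.seed(42)       # k-means starts from random centers, so fix the seed
ks <- 1:10         # candidate numbers of clusters (arbitrary range)

# Total within-cluster sum of squares for each candidate k;
# nstart = 25 reruns k-means from 25 random starts and keeps the best.
wss <- sapply(ks, function(k) {
  kmeans(ruspini, centers = k, nstart = 25)$tot.withinss
})

plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The idea is to look for the k where the curve stops dropping sharply; adding clusters beyond that point buys little extra tightness.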

Plotting and error analysis illustrating R notebooks (jupyter)

Posted on Wed 30 September 2015 in blog • Tagged with R, jupyter, data science

The goal of this exercise is to try R in jupyter and compare simple model fits. I spent way too much time trying to get plots embedded using RStudio. For now, I just want a happy, functional black box. Luckily, I've been using ipython notebook (now jupyter), and it now has R functionality. Time to try it out!



Variable selection for multiple regression models

Posted on Tue 08 September 2015 in blog • Tagged with R, jupyter, regression, stats

Here, I want to look at using R to perform variable selection for a linear model. Let's consider forward and backward (reverse) selection, stepwise techniques that aim to keep only the variables that contribute most to the variance explained. The dataset I'm using is the Boston housing price dataset from the MASS package.
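
For a sense of what this looks like in practice, here is a minimal sketch using `stepAIC()` from MASS. Note the assumption: `stepAIC()` adds or drops variables by AIC rather than by variance explained directly, so this is one plausible stand-in and not necessarily the procedure used in the full post. The model names are my own.

```r
# Sketch: forward and backward stepwise selection on the Boston data.
library(MASS)   # provides the Boston dataset and stepAIC()

# Intercept-only and all-predictor models for median home value (medv)
null_model <- lm(medv ~ 1, data = Boston)
full_model <- lm(medv ~ ., data = Boston)

# Forward: start from the null model and add one variable at a time
forward <- stepAIC(null_model,
                   scope = list(lower = ~ 1, upper = formula(full_model)),
                   direction = "forward", trace = FALSE)

# Backward: start from the full model and drop one variable at a time
backward <- stepAIC(full_model, direction = "backward", trace = FALSE)

# Inspect which predictors each procedure kept
formula(forward)
formula(backward)
```

Comparing the two formulas shows whether the forward and backward searches end up with the same set of predictors.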

Note that there are some drawbacks and limitations to consider when using variable selection: http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/