Gradient descent by matrix multiplication
Posted on Thu 23 February 2017 in blog • Tagged with python, data science, machine learning, math
Deep learning is getting so popular that even Mark Cuban is urging folks to learn it to avoid becoming a "dinosaur". Okay Mark, message heard, I'm addressing this guilt trip now. I originally tried starting in TensorFlow (tensors are multidimensional arrays), but I quickly realized that I don't think in terms of tensors/matrices. For example, I drew a blank when thinking about how to take a partial derivative using matrix multiplication. So, as an exercise to understand the notation and the matrix computations, my goal is to implement gradient descent on a multiple regression model.
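To make the idea concrete, here is a minimal sketch (not the code from the full post) of gradient descent for multiple regression written with explicit matrix products. It assumes a mean-squared-error cost, for which the gradient is (1/n) Xᵀ(Xθ − y). The helper names (`transpose`, `matmul`, `gradient_descent`), the learning rate, the step count, and the toy data are all illustrative assumptions.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def gradient_descent(X, y, lr=0.1, steps=2000):
    """Fit theta for y ~ X @ theta by batch gradient descent.

    X is n x p (first column of 1s for the intercept), y is n x 1.
    Assumed cost: (1 / 2n) * ||X theta - y||^2, so the gradient is
    (1 / n) * X^T (X theta - y).
    """
    n, p = len(X), len(X[0])
    theta = [[0.0] for _ in range(p)]
    Xt = transpose(X)
    for _ in range(steps):
        preds = matmul(X, theta)                                    # n x 1
        residuals = [[ph[0] - yi[0]] for ph, yi in zip(preds, y)]   # X theta - y
        grad = matmul(Xt, residuals)                                # p x 1
        theta = [[t[0] - lr * g[0] / n] for t, g in zip(theta, grad)]
    return theta


# Made-up toy data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [[1.0], [3.0], [4.0], [6.0]]
theta = gradient_descent(X, y)
print([round(t[0], 4) for t in theta])
```

All partial derivatives are computed at once by the single product Xᵀ(Xθ − y), which is the point of writing the update in matrix form.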
Continue reading
Topic modeling and visualization of tweets
Posted on Sun 31 January 2016 in blog • Tagged with python, data science, topic modeling, machine learning, twitter
As more people tweet at companies, it is imperative for those companies to parse the many incoming tweets, figure out what people want, and quickly deal with upset customers. Machine learning can help with this. In this notebook, I'll examine a dataset of ~14,000 tweets directed at various airlines. The algorithm I'm choosing to use is Latent Dirichlet Allocation.
Continue reading
What color is your paycheck?
Posted on Fri 04 December 2015 in blog • Tagged with python, data science, visualization, statistics, pca, api, bokeh, stats
One of the data science skills I want to play around with is deriving insights from data that is publicly available. Here, let's use some data on SF employee compensation and see what we can learn from it.
First, per usual, load the dependencies.
What's in a name? That which we call a data scientist... (part 2)
Posted on Mon 19 October 2015 in blog • Tagged with python, data science, scraping
What is a data scientist? To answer this, we will scrape "data scientist" job posts from Stack Overflow. In the last post, we looked at how to scrape a single job posting. Here, we will iterate that same script over hundreds of posts. Let's get started.
What's in a name? That which we call a data scientist... (part 1)
Posted on Sat 17 October 2015 in blog • Tagged with python, data science, scraping, text
What is a data scientist? Seems like it means different things to different people. Well, what if we let the companies who need a data scientist tell us?
To do this, let's look at jobs on Stack Overflow. The advantage here is that every posting follows the same HTML format, as opposed to other sites like Indeed.com or Monster.com, where job descriptions vary by company. We'll need a cursory knowledge of HTML and a Python package that helps us with the heavy lifting.
Continue reading
Principal Component Analysis for a five year old
Posted on Thu 15 October 2015 in blog • Tagged with R, data science, stats, PCA
I went to a talk a couple of weeks ago at Stanford on using machine learning to understand complex biological data. At one point in the talk the speaker made an offhand comment about data so simple "that a five year old could cluster it". Wow, were you that smart at five?
Continue reading
Optimizing k in k-means clustering
Posted on Tue 06 October 2015 in blog • Tagged with R, data science, clustering
I want to get my hands dirty with clustering after seeing a great lecture at Stanford. Here I'm looking at k-means clustering, an algorithm to identify groups in multidimensional data.
I'm using a built-in dataset in R, "ruspini". Also, I found this site to be a helpful template to start from.
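One common way to pick k is to watch how the within-cluster sum of squares (WSS) falls as k grows and look for the "elbow". This may or may not be the approach the full post takes. The sketch below is in Python rather than R and uses a small made-up set of points instead of the ruspini data. The function names, the restart count, and the seeds are all illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans_wss(points, k, iters=100, seed=0):
    """Run plain Lloyd-style k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Made-up toy data: three loose groups in two dimensions.
points = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.2),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.2),
    (1.0, 9.0), (1.5, 8.0), (2.0, 9.5),
]

# Best WSS over a few random restarts for each candidate k.
wss_by_k = {
    k: min(kmeans_wss(points, k, seed=s) for s in range(10))
    for k in range(1, 6)
}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

Plotting WSS against k and looking for the point where the curve flattens gives a rough visual guide to a reasonable number of clusters.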
Using python to get an intuition for multiple regression
Posted on Fri 02 October 2015 in blog • Tagged with python, data science, regression, stats
I want to get some intuition about regression models using multiple independent variables. More precisely, I am unsure if the relevant predictors would be better uncovered by multiple regression, or by pairwise analysis of all predictors against the response variable. So I'd like to use a dataset where I know the precise contribution of each predictor to the response variable.
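A tiny illustration of why the two approaches can disagree, using made-up data rather than the dataset from the full post: `x2` is correlated with `x1` but contributes nothing to `y`. Regressing `y` on `x2` alone still gives a sizeable slope, while the multiple regression assigns `x2` a coefficient of about zero. The helper names (`solve`, `ols`) and the data are assumptions for this sketch, which fits by solving the normal equations directly.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(columns, y):
    """Least squares with an intercept via (X^T X) beta = X^T y.

    Returns [intercept, coefficient for columns[0], ...].
    """
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


# Made-up data: y depends only on x1; x2 merely tracks x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 1.0, 2.0, 2.0, 3.0, 4.0]
y = [3.0 + 2.0 * a for a in x1]

pairwise_x2 = ols([x2], y)      # y regressed on x2 alone
multiple = ols([x1, x2], y)     # y regressed on x1 and x2 together

print("pairwise slope for x2:", round(pairwise_x2[1], 3))
print("multiple regression coefficients:", [round(b, 3) for b in multiple])
```

Because the true contribution of each predictor is known by construction, it is easy to see which analysis recovers it.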
Continue reading
MyFIRST MySQL
Posted on Thu 01 October 2015 in blog • Tagged with MySQL, data science
Intro to SQL. I've looked at a couple of tutorials on SQL, but the best way to learn is to play around, right? Let's get started.
Continue reading
Plotting and error analysis illustrating R notebooks (jupyter)
Posted on Wed 30 September 2015 in blog • Tagged with R, jupyter, data science
The goal of this exercise is to try R in Jupyter and compare simple model fits. I spent way too much time trying to get plots embedded using RStudio. For now, I just want a happy, functional black box. Luckily, I've been using IPython notebook (now Jupyter), and it now has R functionality. Time to try it out!
Continue reading