Exploring neural networks for text classification

Posted on Fri 17 November 2017 in blog • Tagged with python, machine learning, keras, nlp, deep learning, classification

I've been working on text classification recently. I've found Keras to be a good high-level library and great for learning different neural network architectures. In this notebook I will examine tweet classification using CNN and LSTM model architectures. While CNNs are widely used in computer vision, I saw a paper
Continue reading

Gradient descent by matrix multiplication

Posted on Thu 23 February 2017 in blog • Tagged with python, data science, machine learning, math

Deep learning is getting so popular that even Mark Cuban is urging folks to learn it to avoid becoming a "dinosaur". Okay Mark, message heard; I'm addressing this guilt trip now. I originally tried starting in TensorFlow (tensors are multidimensional arrays), but I quickly realized that I don't think in terms of tensors and matrices. For example, I drew a blank when thinking about how to take a partial derivative using matrix multiplication. So, as an exercise to understand concepts such as notation and matrix computations, my goal is to implement gradient descent on a multiple regression model.
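To give a flavor of the idea, here is a minimal sketch of gradient descent on a multiple regression model written with matrix products. It is not the code from the full post: the synthetic data, the learning rate, the iteration count, and the names `gradient_descent`, `X_b`, and `true_theta` are all assumptions made for illustration. The point is that a single product, `X_b.T @ residuals`, yields every partial derivative of the squared-error cost at once.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 3 features.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
true_theta = np.array([4.0, 2.0, -1.0, 0.5])  # intercept first, then slopes

# Prepend a column of ones so the intercept is part of the matrix product.
X_b = np.hstack([np.ones((n_samples, 1)), X])
y = X_b @ true_theta + rng.normal(scale=0.1, size=n_samples)


def gradient_descent(X_b, y, learning_rate=0.1, n_iters=1000):
    """Minimize J(theta) = (1 / 2m) * ||X_b @ theta - y||^2."""
    m = X_b.shape[0]
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        residuals = X_b @ theta - y        # prediction errors, shape (m,)
        gradient = X_b.T @ residuals / m   # all partial derivatives at once
        theta -= learning_rate * gradient
    return theta


theta_hat = gradient_descent(X_b, y)
```

With these settings `theta_hat` should land close to `true_theta`; a learning rate that is too large for the data would make the updates diverge instead.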

Continue reading

Topic modeling and visualization of tweets

Posted on Sun 31 January 2016 in blog • Tagged with python, data science, topic modeling, machine learning, twitter

As more people tweet to companies, it is imperative for companies to parse through the many tweets that are coming in, to figure out what people want and to quickly deal with upset customers. Machine learning can help to facilitate this. In this notebook, I'll examine a dataset of ~14,000 tweets directed at various airlines. The algorithm I'm choosing to use is Latent Dirichlet Allocation
Continue reading

What color is your paycheck?

Posted on Fri 04 December 2015 in blog • Tagged with python, data science, visualization, statistics, pca, api, bokeh, stats

One of the data science skills I want to play around with is deriving insights from data that is publicly available. Here, let's use some data on SF employee compensation and see what we can learn from it.

First, per usual, load the dependencies.

What's in a name? That which we call a data scientist... (part 2)

Posted on Mon 19 October 2015 in blog • Tagged with python, data science, scraping

What is a data scientist? To answer this, we will scrape "data scientist" job posts from Stack Overflow. In the last post, we looked at how to scrape a single job posting. Here, we will iterate that same script over hundreds of posts. Let's get started.

What's in a name? That which we call a data scientist... (part 1)

Posted on Sat 17 October 2015 in blog • Tagged with python, data science, scraping, text

What is a data scientist? Seems like it means different things to different people. Well, what if we let the companies who need a data scientist tell us?

To do this, let's look at jobs on Stack Overflow. The advantage here is that each posting follows the same HTML format, as opposed to other sites like Indeed.com or Monster.com, where job descriptions vary by company. For that, we need a cursory knowledge of HTML and a Python package that helps us with the heavy lifting.
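As a rough sketch of why a consistent HTML format matters, the snippet below pulls skill tags out of a made-up posting. It uses Python's built-in `html.parser` module rather than whichever package the full post relies on, and the markup, the class names, and the `TagCollector` helper are hypothetical, invented only for this example.

```python
from html.parser import HTMLParser

# Hypothetical markup: the tag and class names are invented for this sketch.
SAMPLE_POSTING = """
<div class="job">
  <h1 class="title">Data Scientist</h1>
  <span class="tag">python</span>
  <span class="tag">machine-learning</span>
  <span class="tag">sql</span>
</div>
"""


class TagCollector(HTMLParser):
    """Collect the text of every <span class="tag"> element."""

    def __init__(self):
        super().__init__()
        self.in_tag_span = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "tag") in attrs:
            self.in_tag_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_tag_span = False

    def handle_data(self, data):
        if self.in_tag_span and data.strip():
            self.tags.append(data.strip())


collector = TagCollector()
collector.feed(SAMPLE_POSTING)
skills = collector.tags
```

Because every posting shares one layout, the same collector could in principle be fed page after page, and `skills` would hold the tag strings in document order each time.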

Continue reading

Principal Component Analysis for a five year old

Posted on Thu 15 October 2015 in blog • Tagged with R, data science, stats, PCA

I went to a talk a couple of weeks ago at Stanford on using machine learning to understand complex biological data. At one point in the talk the speaker made an offhand comment about data so simple "that a five year old could cluster it". Wow, were you that smart at five?

Continue reading

Statistical learning on NBA shot data

Posted on Sun 11 October 2015 in blog • Tagged with python, NBA, api, machine learning, regression, logistic regression, regularization

In the last post, I pulled some NBA shot data for Andrew Wiggins and put that into a dataframe. Here, we will apply some supervised learning techniques from sklearn to build predictive models and then use visualizations to better understand the data.

Some topics we'll explore are prediction error, regularization, and the tradeoff between prediction accuracy and model interpretability.
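To make the regularization idea concrete, here is a small sketch of L2-regularized logistic regression, written by hand in NumPy rather than with sklearn. The data is synthetic (not the shot log), and the function `fit_logistic`, the penalty values, and the other settings are assumptions chosen for illustration. It shows the basic effect: a stronger penalty shrinks the fitted weights toward zero.

```python
import numpy as np

# Synthetic binary outcomes from two hypothetical standardized features.
rng = np.random.default_rng(1)
n_samples = 500
X = rng.normal(size=(n_samples, 2))
logits = 0.5 - 1.5 * X[:, 0] + 0.8 * X[:, 1]
y = (rng.random(n_samples) < 1 / (1 + np.exp(-logits))).astype(float)


def fit_logistic(X, y, l2_penalty, learning_rate=0.1, n_iters=2000):
    """Gradient descent on mean log loss + (l2_penalty / 2) * ||w||^2."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        p = 1 / (1 + np.exp(-(X @ w + b)))  # predicted probabilities
        # The penalty adds l2_penalty * w to the weight gradient only;
        # the intercept b is left unpenalized.
        w -= learning_rate * (X.T @ (p - y) / len(y) + l2_penalty * w)
        b -= learning_rate * np.mean(p - y)
    return w, b


w_weak, _ = fit_logistic(X, y, l2_penalty=0.0)
w_strong, _ = fit_logistic(X, y, l2_penalty=1.0)
```

Comparing `np.linalg.norm(w_weak)` with `np.linalg.norm(w_strong)` shows the shrinkage. Smaller weights mean a simpler model, usually at some cost in fit to the training data.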

Continue reading

Scraping NBA shot data using python

Posted on Sat 10 October 2015 in blog • Tagged with python, NBA, api

My goal is to learn how to scrape data using python and do some quick data analysis.

This is my first time scraping from the web. I found this documentation extremely helpful. Here, I'm pulling in the shot log for Andrew Wiggins, the NBA Rookie of the Year for the 2014-2015 season.

Continue reading

Measuring cancer cell dynamics in response to therapy

Posted on Fri 09 October 2015 in blog • Tagged with R, research, systems

Example R code to process highly multiplexed cancer drug responses. Formatted so that you should be able to run it on a Mac.

Continue reading