Using tensorflow multilayer perceptron to predict housing prices

Posted on Thu 29 March 2018 in blog

I've used keras a bit for deep learning recently. In general it is an excellent tool, but I know there are limitations in using a higher-level tool. In trying to implement an initial model in tensorflow, I found that you have to think about fundamentals of data flow and building computation graphs -- as opposed to machine learning a la sklearn.

Here I compare a simple linear model with a tensorflow multilayer perceptron implementation. I'm using the canonical Boston house price prediction set (built into sklearn).

Skills used:

  • use tf.data.Dataset for both model training and inference
  • define neural netword build computation graphs
  • perform training/testing using tf.Session()
  • benchmark several MLP models against a linear model

First, load the data and do a quick linear model

In [1]:
import tensorflow as tf
from tensorflow.contrib.layers import repeat
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

import matplotlib.pylab as plt
import seaborn as sns

plt.style.use('/Users/pf494t/.matplotlib/stylelib/plf.mplstyle')

%matplotlib inline
/Users/pf494t/anaconda2/envs/py36/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

check the data shape

506 examples, 13 features

In [2]:
boston = load_boston()
print(boston.data.shape)
(506, 13)

format the data

In [3]:
mean_ctr = boston['data'] - np.mean(boston['data'],axis=0)
normed_features = mean_ctr/np.std(mean_ctr,axis=0)
normed_labels = boston['target'].reshape(-1,1)

split to training/test set

In [4]:
train_features, test_features, train_labels, test_labels = train_test_split(
                                                                            normed_features,
                                                                            normed_labels,
                                                                            test_size=0.2,
                                                                            random_state=42
                                                                           )

train_data = (train_features,train_labels)
test_data = (test_features, test_labels)

linear model baseline

In [5]:
lm = LinearRegression()
lm.fit(normed_features,normed_labels)
lm_preds = lm.predict(test_features)
In [6]:
lm_mse = mean_squared_error(test_labels,lm_preds)

print('linear model MSE: {0:.2f}'.format(lm_mse))
linear model MSE: 21.82

model data using a multilayer perceptron

first, set training parameters

In [7]:
num_epochs = 500
batch_size = 32
test_n = test_data[0].shape[0]

Define data flow using dataset and iterator

  • use tf.data.Dataset to iterate over the data
  • use a custom iterator for flexibility across datasets, e.g., training– and test test
    • use .repeat() to continually iterate over training examples
    • use .batch() to specift the batch size
    • make an initializer for both training and test set
In [8]:
tf.reset_default_graph() # avoid errors if re-running the cell

# format numpy data for network
train_data = (train_features,train_labels) 
test_data = (test_features, test_labels)

# consume the numpy data in batches
train_dataset = tf.data.Dataset.from_tensor_slices(train_data).repeat().batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices(test_data).repeat().batch(test_n)

# instantiate iterator with the correct shape and type
iterator = tf.data.Iterator.from_structure(train_dataset.output_types,
                                           train_dataset.output_shapes)

# feed these objects into network
features, labels = iterator.get_next()

# initialize to desired datasets
train_init_op = iterator.make_initializer(train_dataset,name='training_iterator')
test_init_op = iterator.make_initializer(test_dataset,name='testing_iterator')

Build graph

architecture
  • 1 fully connected hidden layer, with a ReLU activation
  • output layer, which directly predicts output
  • use mean squared error, optimized with Adam (works well off the shelf)
In [9]:
hidden_1 = tf.layers.dense(features,48,
                           activation = tf.nn.relu,name='hidden1')

prediction = tf.layers.dense(hidden_1,1,name='prediction')
loss = tf.losses.mean_squared_error(labels = labels,
                                    predictions = prediction)

optimizer = tf.train.AdamOptimizer().minimize(loss)

saver = tf.train.Saver()

Run and save the model

In [10]:
num_epochs = 500
batch_size = 32
test_n = test_data[0].shape[0]

in an open session, initialize everything, learn on the training data, and test on the holdout set

In [11]:
losses = []

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer()) # initialize everything
    sess.run(train_init_op) # use training dataset
    
    for _ in range(num_epochs):
        l, _ = sess.run([loss,optimizer])
        losses.append(l)
    sess.run(test_init_op) # switch to test dataset
    test_loss = sess.run([loss])[0]
    print('test loss: {0:.2f}'.format(test_loss))
    saver.save(sess,'./hello_boston/initializable.ckpt')    
test loss: 32.38

Test several types of graphs

  • write a function to try different numbers of hidden layers

With more time I would break this into several functions...

In [12]:
def train_model(num_epochs = 500, batch_size = 32, num_nodes = 96, num_hidden_layers = 2,
                train_data = (train_features,train_labels),
                test_data = (test_features, test_labels),   
               ):
    
    tf.reset_default_graph() # avoid errors if re-running the cell

    test_n = test_data[0].shape[0]
    train_dataset = tf.data.Dataset.from_tensor_slices(train_data).repeat().batch(batch_size)
    test_dataset = tf.data.Dataset.from_tensor_slices(test_data).repeat().batch(test_n)

    # create a iterator of the correct shape and type
    iterator = tf.data.Iterator.from_structure(train_dataset.output_types,
                                               train_dataset.output_shapes)

    features, labels = iterator.get_next()

    # initialization operations
    train_init_op = iterator.make_initializer(train_dataset,name='training_iterator')
    test_init_op = iterator.make_initializer(test_dataset,name='testing_iterator')

    features, labels = iterator.get_next()
    
    if num_hidden_layers == 0:
        prediction = tf.layers.dense(features,1,name='prediction')
    elif num_hidden_layers == 1:
        hidden_1 = tf.layers.dense(features,num_nodes,activation = tf.nn.relu,name='hidden1')
        prediction = tf.layers.dense(hidden_1,1,name='prediction')
    elif num_hidden_layers == 2:
        hidden_1 = tf.layers.dense(features,num_nodes,activation = tf.nn.relu,name='hidden1')
        hidden_2 = tf.layers.dense(hidden_1,num_nodes,activation = tf.nn.relu,name='hidden2')
        prediction = tf.layers.dense(hidden_2,1,name='prediction')
    else:
        hidden_1 = tf.layers.dense(features,num_nodes,activation = tf.nn.relu,name='hidden1')
        hidden_2 = tf.layers.dense(hidden_1,num_nodes,activation = tf.nn.relu,name='hidden2')
        hidden_3 = tf.layers.dense(hidden_2,num_nodes,activation = tf.nn.relu,name='hidden3')
        prediction = tf.layers.dense(hidden_3,1,name='prediction')
    
    loss = tf.losses.mean_squared_error(labels = labels, predictions = prediction)
    optimizer = tf.train.AdamOptimizer(beta1 = 0.6).minimize(loss)

    losses = []

    # create the initialisation operations
    train_init_op = iterator.make_initializer(train_dataset,name='training_iterator')
    test_init_op = iterator.make_initializer(test_dataset,name='testing_iterator')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer()) # initialize everything
        sess.run(train_init_op) # use training dataset

        #writer = tf.train.SummaryWriter('./hello_boston', graph=tf.get_default_graph())
        for _ in range(num_epochs):
            l, _ = sess.run([loss,optimizer])
            losses.append(l)
        sess.run(test_init_op) # switch to test dataset
        mse = sess.run([loss])
    return losses, mse[0]
In [13]:
losses_1L_48N, mse_1L_48N = train_model(num_hidden_layers=1,num_nodes=48,num_epochs=500)
losses_2L_48N, mse_2L_48N = train_model(num_hidden_layers=2,num_nodes=48,num_epochs=500)
losses_3L_48N, mse_3L_48N = train_model(num_hidden_layers=3,num_nodes=48,num_epochs=500)
losses_3L_96N, mse_3L_96N = train_model(num_hidden_layers=3,num_nodes=96,num_epochs=500)
In [14]:
fig, ax = plt.subplots()
ax.plot(losses_1L_48N,alpha=0.5,label='1L_48N')
ax.plot(losses_2L_48N,alpha=0.5,label='2L_48N')
ax.plot(losses_3L_48N,alpha=0.5,label='3L_48N')
ax.plot(losses_3L_96N,alpha=0.5,label='3L_96N')
ax.set_ylim([0,900])
ax.set_ylabel('log(MSE)')
ax.set_xlabel('epochs')

plt.legend(loc=0,ncol=1,fontsize=12)

plt.show()
In [15]:
fig, ax = plt.subplots()

ax.bar(range(4),[mse_1L_48N,mse_2L_48N,mse_3L_48N,mse_3L_96N])
ax.hlines(lm_mse,-0.5,3.5,linestyles='dashed')
ax.set_xticks([0,1,2,3])
ax.set_xticklabels(['1L_48N','2L_48N','3L_48N','3L_96N'])
ax.text(2.5,25,'linear model',fontsize=12)
ax.set_ylabel('MSE')
plt.show()

Conclusions

  • Tensorflow can effectively learn from numpy arrays, when data flow is specified using tf.data.Dataset
  • (Unsurprisingly) deeper networks learn faster and achieve better results
  • Out-of-the-box MLP performs better at predicting house prices than linear models

Appendix

using one-shot iterator instead

simpler, but less flexible. Trains on one dataset, can't point to test set, for example

In [16]:
batch_size = 64
num_epochs = 500

dataset = tf.data.Dataset\
            .from_tensor_slices((normed_features, normed_labels))\
            .repeat()\
            .batch(batch_size)
            
iterator = dataset.make_one_shot_iterator()
data_slice = iterator.get_next()

x, y = data_slice
hidden_1 = tf.layers.dense(x,10,activation = tf.nn.relu)
hidden_2 = tf.layers.dense(hidden_1,10,activation = tf.nn.relu)
prediction = tf.layers.dense(hidden_2,1)
loss = tf.losses.mean_squared_error(labels = y, predictions = prediction)
optimizer = tf.train.AdamOptimizer().minimize(loss)

saver = tf.train.Saver()

losses = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(num_epochs):
        l, _ = sess.run([loss,optimizer])
        losses.append(l)
    saver.save(sess,'./hello_boston/first_try.ckpt')
    print('done!')
done!