House price prediction 1/4: Using Keras/Tensorflow and Python

This is a series about creating a model with Python and Tensorflow, then importing the model and making predictions with Javascript in a Vue.js application. The video is above, and below you will find some useful notes.

In this post I create a model in Python: I pre-process a dataset I've already created, train the model, post-process its output, make predictions, and finally export a few files that share information about the data for later use.

In part 2, House price prediction 2/4: Using Tensorflow.js, Vue.js and Javascript, I will take the model and the data needed for pre- and post-processing, and make predictions from a Vue.js application.

In part 3 I will show how one-hot encoding works.

And finally, part 4 covers normalizing the inputs and why it matters.

If you want to see a simpler model and how it integrates with a Javascript application using Tensorflow.js and Vue.js, you can check my previous post, How to import a Keras model into a Vue.js application using Tensorflow.Js, where I also show how to publish the website to Github Pages.

Pre-reqs
- Have Python 3.x installed
- Have Tensorflow installed
- Have Anaconda installed (optional)

The Dataset

The data is organized in a CSV file, with the price as the output and size, rooms, baths, parking and neighborhood as the inputs:

price,size,rooms,baths,parking,neighborhood
270000000,180.0,5,2.0,0,medellin aranjuez
280000000,168.0,5,3.0,0,medellin centro
350000000,95.0,3,2.0,0,medellin belen rosales
350000000,103.0,3,3.0,1,medellin la castellana
310000000,95.0,3,2.0,0,medellin la castellana
...

I defined a couple of variables to hold the column names for the inputs, outputs and also for the categorical column

  
X_colum_names = ['size', 'rooms', 'baths', 'parking', 'neighborhood']
Y_colum_names = ['price']
categorical_column = 'neighborhood'

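Loaded the dataset using pandas

  
import pandas as pd

CSV_PATH = "./dataset/dataset.csv"

# index_col=False keeps pandas from using the first column (price) as the index
df = pd.read_csv(CSV_PATH, index_col=False)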

print(df.head())
print(df.columns)

Split the features/columns between inputs (X) and outputs (Y)

  
# The column names defined above are read here through the project's common module
X = df[common.X_colum_names]
Y = df[common.Y_colum_names]

print(X.head(), Y.head())

Created the one-hot encoder and the scalers, and split the dataset into training and validation groups

  
from sklearn.model_selection import train_test_split

#%% Configure categorical columns
label_encoder, onehot_encoder = common_categorical.create_categorical_feature_encoder(X[common.categorical_column])

#%% Scale data
# Create scalers so that all features are in a comparable range
# (columns 0 to 3 are the numeric inputs; neighborhood is handled by the encoders)
x_scaler, y_scaler = common_scaler.create_scaler(X.values[:,0:4], Y.values)

#%% Split the dataset: 70% for training, 30% held out for validation
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
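To make the categorical step more concrete, here is a minimal pure-Python sketch of what label encoding followed by one-hot encoding computes. The helpers `fit_classes` and `one_hot` are hypothetical names used only for illustration; they are not part of the `common_categorical` module, and the sketch assumes the label encoder keeps its categories in sorted order, as scikit-learn's `LabelEncoder` does.

```python
def fit_classes(values):
    """Return the sorted unique categories (the role played by `classes_`)."""
    return sorted(set(values))


def one_hot(value, classes):
    """Encode one category as a list with a single 1.0 at its class index."""
    index = classes.index(value)  # raises ValueError for an unseen category
    return [1.0 if i == index else 0.0 for i in range(len(classes))]


# Neighborhoods taken from the sample rows of the dataset shown earlier
neighborhoods = [
    "medellin aranjuez",
    "medellin centro",
    "medellin belen rosales",
    "medellin la castellana",
    "medellin la castellana",
]

classes = fit_classes(neighborhoods)
encoded = one_hot("medellin centro", classes)
print(classes)
print(encoded)
```

Each neighborhood becomes a vector as long as the number of distinct categories, so the model input width grows with the number of neighborhoods in the dataset.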

Then I use the pre- and post-processing utilities to transform the inputs and outputs of both groups

  
arr_x_train, arr_y_train = common_pre_post_processing.transform_inputs(X_train,
                                                   label_encoder,
                                                   onehot_encoder,
                                                   x_scaler,
                                                   y_train,
                                                   y_scaler)

arr_x_valid, arr_y_valid = common_pre_post_processing.transform_inputs(X_test,
                                                   label_encoder,
                                                   onehot_encoder,
                                                   x_scaler,
                                                   y_test,
                                                   y_scaler)

Training the model

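First I define a function that creates the model

  
#%% Create the model
# Imports assume standalone Keras; with the Keras bundled in Tensorflow,
# import the same names from tensorflow.keras instead
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import Adam
from keras import metrics

def build_model(x_size, y_size):
    # One hidden layer of 100 ReLU units, with dropout to reduce overfitting
    model = Sequential()
    model.add(Dense(100, input_shape=(x_size,)))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))

    # Linear output layer: one unit per output column (here, the price)
    model.add(Dense(y_size))

    # Regression setup: minimise MSE and report MAE during training
    model.compile(loss='mean_squared_error',
        optimizer=Adam(),
        metrics=[metrics.mae])

    return model

# Number of input features and number of outputs
print(arr_x_train.shape[1], arr_y_train.shape[1])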
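The model is compiled with mean squared error as the loss and mean absolute error as the reported metric. As a rough illustration of how the two differ, here is a small pure-Python sketch; the numbers are made up for the example and are not output from the model.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large errors are penalised more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative scaled targets and predictions (made up, not model output)
y_true = [0.5, -1.0, 2.0]
y_pred = [0.0, -1.0, 1.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, mae)
```

Because the outputs are scaled before training, both numbers are in scaled units; they only become prices again after the post-processing step.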

Then I build the model

  
model = build_model(arr_x_train.shape[1], arr_y_train.shape[1])
model.summary()

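We fit the model (batch_size, epochs and keras_callbacks are set elsewhere in the training script and are not shown here)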

  
history = model.fit(arr_x_train, arr_y_train,
    batch_size=batch_size,
    epochs=epochs,
    shuffle=True,
    verbose=2,
    validation_data=(arr_x_valid, arr_y_valid),
    callbacks=keras_callbacks)

And finally we save the model

  
model.save(common.model_file_name)

Predicting using the model

I load the model, along with the one-hot encoder and the scalers

  
# load_model comes from keras.models (or tensorflow.keras.models)
from keras.models import load_model

# Load the previous state of the model
model = load_model(common.model_file_name)

# Load the previous state of the encoders and scalers
label_encoder, onehot_encoder = common_categorical.load_categorical_feature_encoder()
x_scaler, y_scaler = common_scaler.load_scaler()

I define some simple values for testing purposes

  
# Some inputs to predict, in this column order:
# 'size', 'rooms', 'baths', 'parking', 'neighborhood'
values = [
    [180, 5, 2, 0, 'envigado'],
    [180, 5, 2, 0, 'medellin belen'],
    [180, 5, 2, 0, 'sabaneta zaratoga'],
    # For comparison, in dataset column order (price first): 310000000,97,3,2,2,sabaneta centro
    [ 97, 3, 2, 2, 'sabaneta centro'],
    # For comparison: 258000000,105,3,2,0,medellin belen
    [105, 3, 2, 0, 'medellin belen'],
    # For comparison: 335000000,160,3,3,2,medellin la mota
    [160, 3, 3, 2, 'medellin la mota'],
]

Pre-process the inputs

  
# Transform inputs to the format that the model expects
model_inputs, _ = common_pre_post_processing.transform_inputs(values, label_encoder, onehot_encoder, x_scaler)

Predict using the model

  
# Use the model to predict the price for a house
y_predicted = model.predict(model_inputs)

Post-process the output

  
# Transform the results into a user friendly representation
y_predicted_unscaled = common_pre_post_processing.transform_outputs(y_predicted, y_scaler)

And finally print the results

  
print('Results when:')
print('Scale Input Features = ', common.scale_features_input)
print('Scale Output Features = ', common.scale_features_output)
print('Use Categorical Feature Encoder = ', common.use_categorical_feature_encoder)

for i in range(0, len(values)):
    print(values[i][4], y_predicted[i][0], int(y_predicted_unscaled[i]))

Sharing data

Given that we need to pass some data to the future Tensorflow.js application, we need to make sure the important values are available in a format that is simple to access.

Export the categories

  
# Load the previous state of the encoders
label_encoder, onehot_encoder = common_categorical.load_categorical_feature_encoder()

enconder_classes = list(label_encoder.classes_)

common_file.generate_json_file(enconder_classes, common.root_share_folder, 'neighborhoods')

print(enconder_classes)

  ["envigado", "envigado abadia", "envigado aburra sur", "envigado alcal", "envigado alcala", "envigado alquerias de san isidro", "envigado alto de las flores", "envigado alto de misael", "envigado altos de misael", "envigado andalucia", "envigado antillas", "envigado av poblado", "envigado b margaritas", "envigado barrio mesa", "envigado barrio obrero"
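The body of `common_file.generate_json_file` is not shown in this post. A minimal sketch of a helper with the same call shape could look like the following; the file layout (`<folder>/<name>.json`) and the return value are assumptions made for illustration, not necessarily what the project does.

```python
import json
import os
import tempfile


def generate_json_file(data, folder, name):
    """Write `data` as JSON to <folder>/<name>.json and return the path.

    Hypothetical stand-in for common_file.generate_json_file.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name + ".json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)
    return path


# Usage: write a short list of categories to a temporary folder and read it back
folder = tempfile.mkdtemp()
path = generate_json_file(["envigado", "envigado abadia"], folder, "neighborhoods")
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded)
```

Plain JSON arrays like this are easy to load from the browser, which is presumably why the encoder classes are converted with `list(...)` before being exported.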

Export the information for the scaler for the inputs

  
x_scaler, y_scaler = common_scaler.load_scaler()

mean_x = x_scaler.mean_
var_x = x_scaler.var_

common_file.generate_json_file(list(mean_x), common.root_share_folder, 'scaler-mean-x')
common_file.generate_json_file(list(var_x), common.root_share_folder, 'scaler-var-x')

  scaler-mean-x
  [114.6902816399287, 3.4014260249554367, 2.3436720142602496, 0.5928698752228164]

  scaler-var-x
  [572.865809011588, 0.37646855468812057, 0.2255615608745523, 0.4053680561513214]

And also export the scaler information for the outputs

  
mean_y = y_scaler.mean_
var_y = y_scaler.var_

common_file.generate_json_file(list(mean_y), common.root_share_folder, 'scaler-mean-y')
common_file.generate_json_file(list(var_y), common.root_share_folder, 'scaler-var-y')

  scaler-mean-y
  [281340671.3761141]

  scaler-var-y
  [3091863947148531.5]
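The exported means and variances are enough to reproduce the scaling outside Python. Assuming the scalers behave like scikit-learn's `StandardScaler` (which exposes `mean_` and `var_` and scales with `(x - mean) / sqrt(var)`), the round trip can be sketched as follows. The function names `scale_inputs` and `unscale_price` are hypothetical and used only for this example.

```python
import math

# Values copied from the exported files above
MEAN_X = [114.6902816399287, 3.4014260249554367, 2.3436720142602496, 0.5928698752228164]
VAR_X = [572.865809011588, 0.37646855468812057, 0.2255615608745523, 0.4053680561513214]
MEAN_Y = 281340671.3761141
VAR_Y = 3091863947148531.5


def scale_inputs(row):
    """Standardise size, rooms, baths, parking: (x - mean) / sqrt(var)."""
    return [(x - m) / math.sqrt(v) for x, m, v in zip(row, MEAN_X, VAR_X)]


def unscale_price(y_scaled):
    """Invert the output scaling: y * sqrt(var) + mean."""
    return y_scaled * math.sqrt(VAR_Y) + MEAN_Y


scaled = scale_inputs([180, 5, 2, 0])
print(scaled)
print(unscale_price(0.0))  # a scaled prediction of 0 maps back to the mean price
```

The same two formulas are all the Javascript side needs in order to prepare the numeric inputs and to turn a scaled prediction back into a price.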

Finally, use the Tensorflow.js converter to transform the model so that it can be imported into Tensorflow.js

  
tensorflowjs_converter --input_format keras ./model/-inputsscaled-outputsscaled-categorical/model.h5 ./shared/model

And that's it! phew! Let me know if you have any questions!

Resources

Github link for the code

Tensorflow.js

TensorFlow Docker Images

Tensorflow.js Converter

Keras Loss Functions

Keras Metrics Functions

Display Deep Learning Model Training History in Keras

Keras Model Checkpoints

Keras Early Stopping
