House price prediction 3/4: What is One Hot Encoding

This is a series about creating a model using Python and Tensorflow, then importing the model and making predictions using Javascript in a Vue.js application. Above is the vid and below you will find some useful notes.
  • Here, in part 3 of this series, I will show what one hot encoding is and how it works.
    In the first post, called House price prediction 1/4: Using Keras/Tensorflow and python, I talked about how to create a model in python, pre-process a dataset I had already created, train the model, post-process, predict, and finally create different files that share some information about the data for use in the second part.
    Then in part 2, called House price prediction 2/4: Using Tensorflow.js, Vue.js and Javascript, I took the model and the data for pre and post processing and, after loading everything, we were finally able to predict house prices using Vue.js.
    And finally, in part 4, I will show what normalizing the inputs is and why it is important.
  2.

    What is One Hot Encoding

    • One hot encoding is a process that takes a set of named categories with no implicit order between the values and transforms them into a numeric representation, so that a machine learning algorithm can perform operations on those values. Examples of such categories are colors (red, green and blue) or fruits (apples, lemons and blueberries).
    • A named category in which the order between the values is implicit does not need to be one hot encoded. One such category could be medals, with values like Gold, Silver or Bronze, which are awarded according to a position in an ordered ranking (who came first, second or third). In such cases we can say that gold is better than silver, which is in turn better than bronze.
    • So for a list of neighborhoods, at a high level, the encoding or transformation process would typically start from values like these:
      medellin aranjuez
      medellin centro
      medellin belen rosales
      medellin la castellana
      medellin la castellana
      First we transform the named categories into numbers, for example by assigning an index to each distinct category (a repeated value keeps the same index):
      0,medellin aranjuez
      1,medellin centro
      2,medellin belen rosales
      3,medellin la castellana
      3,medellin la castellana
      Then we create a vector representation with as many columns as there are distinct categories. For each row we take the previous index and put a “One” in the column that corresponds to the index of the category and “Zeros” in the other columns:
      Column 0,Column 1,Column 2,Column 3,Neighborhood
      1,0,0,0,medellin aranjuez
      0,1,0,0,medellin centro
      0,0,1,0,medellin belen rosales
      0,0,0,1,medellin la castellana
      0,0,0,1,medellin la castellana
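The two steps described in this section can be sketched in plain Python. This is only a minimal illustration, not the code used in the series: the function name `one_hot_encode` is made up for this example, and indexes are assigned in order of first appearance, which is just one possible choice. Note that a repeated neighborhood maps to the same index and therefore to the same vector.

```python
def one_hot_encode(values):
    """Return (categories, vectors) for a list of category names."""
    # Step 1: assign an index to each distinct category,
    # in order of first appearance (an assumption of this sketch).
    index_of = {}
    for value in values:
        if value not in index_of:
            index_of[value] = len(index_of)

    # Step 2: build one vector per input value, with a 1 in the
    # column of its category and 0 in every other column.
    vectors = []
    for value in values:
        row = [0] * len(index_of)
        row[index_of[value]] = 1
        vectors.append(row)
    return list(index_of), vectors


neighborhoods = [
    "medellin aranjuez",
    "medellin centro",
    "medellin belen rosales",
    "medellin la castellana",
    "medellin la castellana",
]
categories, vectors = one_hot_encode(neighborhoods)
for vector, name in zip(vectors, neighborhoods):
    print(vector, name)
```

The number of columns equals the number of distinct categories, so five input rows with four distinct neighborhoods produce vectors of length four.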
  3.

    Using a Scikit Learn Preprocessor

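    • To use a preprocessor like the label encoder, you first need to instantiate it, fit it with the complete dataset, and then call transform whenever you need to encode a particular set of values.
      from sklearn.preprocessing import OneHotEncoder, LabelEncoder
      # Example values reused from the neighborhood list in the previous section
      neighborhoods = ['medellin aranjuez', 'medellin centro', 'medellin belen rosales', 'medellin la castellana']
      labels_to_test = ['medellin centro', 'medellin la castellana']
      label_encoder = LabelEncoder()
      # Fit on the complete dataset so that every category gets an index
      label_encoder.fit(neighborhoods)
      print('Label Encoded String', label_encoder.transform(labels_to_test))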
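To see why the encoder has to be fitted on the complete dataset first, here is a rough pure-Python stand-in for the fit/transform flow. `SimpleLabelEncoder` is a hypothetical class written only for this note, not part of scikit-learn. It follows the same idea as scikit-learn's `LabelEncoder`, which stores the sorted distinct labels in `classes_` and uses each label's position as its integer code.

```python
class SimpleLabelEncoder:
    """Hypothetical stand-in that mimics the fit/transform flow of a label encoder."""

    def fit(self, values):
        # Keep the sorted distinct values; each position becomes the integer label.
        self.classes_ = sorted(set(values))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # A label that was not present during fit has no index, so it cannot be encoded.
        unseen = [v for v in values if v not in self._index]
        if unseen:
            raise ValueError(f"unseen labels: {unseen}")
        return [self._index[v] for v in values]


encoder = SimpleLabelEncoder().fit([
    "medellin aranjuez",
    "medellin centro",
    "medellin belen rosales",
    "medellin la castellana",
])
print(encoder.classes_)
print(encoder.transform(["medellin centro", "medellin la castellana"]))
```

Because the labels are sorted before indexes are assigned, the codes depend on alphabetical order and not on the order in which the neighborhoods appear in the data.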
  4.

    Using the Label Encoder and the One Hot Encoder

    • First you should configure both encoders by calling fit on both of them and also transform on the Label encoder
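      label_encoder = LabelEncoder()
      # sparse_output=False returns a dense array; scikit-learn versions before 1.2 call this parameter sparse
      onehot_encoder = OneHotEncoder(sparse_output=False)
      label_encoder.fit(complete_set_of_data)
      categorical_column = label_encoder.transform(complete_set_of_data)
      # OneHotEncoder expects a 2D array, so reshape the labels into a single column
      integer_encoded = categorical_column.reshape(len(categorical_column), 1)
      onehot_encoder.fit(integer_encoded)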
    • Then you would call transform on both when transforming a set of values
      # transform returns a NumPy array, so it can be reshaped directly (it has no .values attribute)
      values_to_transform = label_encoder.transform(values_to_transform)
      integer_encoded = values_to_transform.reshape(len(values_to_transform), 1)
      onehot_encoded = onehot_encoder.transform(integer_encoded)
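The intermediate shapes in this two-stage flow can be easier to follow without the library. The sketch below is an illustration in plain Python lists, not the scikit-learn implementation: the helper names `to_column` and `one_hot_from_column` are invented for this example, and the sample neighborhoods are taken from the list earlier in this post.

```python
def to_column(codes):
    # Same idea as reshape(len(codes), 1): one row per value, a single column.
    return [[code] for code in codes]


def one_hot_from_column(column, n_categories):
    # Turn each single-element row into a vector with a 1.0 at the code's position.
    rows = []
    for (code,) in column:
        row = [0.0] * n_categories
        row[code] = 1.0
        rows.append(row)
    return rows


# "Fit" step: learn the categories once, from the complete dataset.
complete_set_of_data = [
    "medellin aranjuez",
    "medellin centro",
    "medellin belen rosales",
    "medellin la castellana",
    "medellin la castellana",
]
classes = sorted(set(complete_set_of_data))
code_of = {label: i for i, label in enumerate(classes)}

# "Transform" step: reuse what was learned for any later set of values.
values_to_transform = ["medellin la castellana", "medellin aranjuez"]
integer_encoded = to_column([code_of[v] for v in values_to_transform])
onehot_encoded = one_hot_from_column(integer_encoded, len(classes))
print(integer_encoded)
print(onehot_encoded)
```

The width of each vector is fixed by the fit step, which is why the values transformed later always line up with the columns the model was trained on.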
  5.

    Onehot encode in Javascript

    • Before using the model created in python inside Javascript, we have to one hot encode the data we want to predict. In this case I used a javascript library for that:
      import * as onehot from 'one-hot-enum';
      // Encode every entry except the first one
      let reducedlist = this.completeSetOfData.slice(1);
      let enumaration = onehot.enumaration(reducedlist);
      let encoded = onehot.encode(reducedlist);
      // Build an all-zeros vector with the same length as the encoded vectors
      let zeros = new Array(encoded[0].length).fill(0);
      this.dictionary = {};
      // Map each category name to its one hot encoded vector
      for (let i in enumaration) {
        this.dictionary[enumaration[i]] = encoded[i];
      }
      // The first neighborhood is represented by the all-zeros vector
      this.dictionary[this.neighborhoods[0]] = zeros;
    • Then, to transform a string into its one hot encoded vector representation, you just need to look the string up in the dictionary.
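The dictionary idea can also be sketched in Python, to keep it comparable with the earlier sections. This is a hedged illustration that does not use the one-hot-enum library: `build_dictionary` is a made-up helper, and it assumes that columns are assigned in list order, which may differ from what the library does. As in the Javascript snippet, the first category is left out of the encoding and represented by a vector of zeros.

```python
def build_dictionary(categories):
    """Map each category name to a vector; the first category maps to all zeros."""
    baseline, rest = categories[0], categories[1:]
    # The first category gets no column of its own (assumption mirrored from the notes above).
    dictionary = {baseline: [0] * len(rest)}
    for i, name in enumerate(rest):
        vector = [0] * len(rest)
        vector[i] = 1
        dictionary[name] = vector
    return dictionary


neighborhoods = [
    "medellin aranjuez",
    "medellin centro",
    "medellin belen rosales",
    "medellin la castellana",
]
dictionary = build_dictionary(neighborhoods)

# Encoding a string is now a plain dictionary lookup.
print(dictionary["medellin centro"])
print(dictionary["medellin aranjuez"])
```

With four neighborhoods the vectors have three columns, one fewer than a full one hot encoding, because the all-zeros vector already identifies the first neighborhood.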