House price prediction 3/4: What is One Hot Encoding

This series is about creating a model with Python and Tensorflow, then importing that model and making predictions with Javascript in a Vue.js application. The video is above, and below you will find some useful notes.
  • Here, in part 3 of this series, I will show what one hot encoding is and how it works.
    In the first post, called House price prediction 1/4: Using Keras/Tensorflow and python, I talked about how to create a model in Python, pre-process a dataset I had already created, train the model, post-process and predict, and finally how to create different files that share some information about the data for use in the second part.
    Then in part 2, called House price prediction 2/4: Using Tensorflow.js, Vue.js and Javascript, I took the model and the data for pre- and post-processing, and after loading everything we were finally able to predict house prices using Vue.js.
    And finally, in part 4, I will show what normalizing the inputs is and why it is important.
  2. What is One Hot Encoding

    • One hot encoding is a process in which we take a set of named categories with no implicit order between their values (colors such as red, green and blue, or fruits such as apples, lemons and blueberries) and transform them into a numeric representation, so that a machine learning algorithm can perform operations on those values.
    • A named category whose values do have an implicit order does not need to be one hot encoded. One example is medals, with values like Gold, Silver and Bronze, which are awarded according to a position in an ordered ranking (first, second or third). In such cases we can say that gold is better than silver, which in turn is better than bronze, so the values can be mapped directly to ordered numbers.
    • So, for a list of neighborhoods such as the following, the encoding or transformation process would, at a high level, typically entail:
      medellin aranjuez
      medellin centro
      medellin belen rosales
      medellin la castellana
      medellin la castellana
      Transforming the named categories into numbers, for example by assigning an index to each distinct category (a repeated value, like medellin la castellana here, gets the same index every time):
      0,medellin aranjuez
      1,medellin centro
      2,medellin belen rosales
      3,medellin la castellana
      3,medellin la castellana
      Then we create a vector representation with as many columns as there are distinct categories: for each row we put a "one" in the column that matches the index of its category and "zeros" in all the other columns.
      Column Indexes,Neighborhood:
      1,0,0,0,medellin aranjuez
      0,1,0,0,medellin centro
      0,0,1,0,medellin belen rosales
      0,0,0,1,medellin la castellana
      0,0,0,1,medellin la castellana
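    • The two steps described in this section (assign an index, then build the vectors) can be sketched in plain Python. This is only an illustration and not the code used in the series: the function name one_hot_encode is made up for the example, and indexes are assigned in order of first appearance. A repeated value maps to the same index, so the five rows below produce four columns.

```python
def one_hot_encode(values):
    """Return (index_by_category, vectors) for a list of category names."""
    # Step 1: assign an index to each distinct category, in order of first appearance.
    index_by_category = {}
    for value in values:
        if value not in index_by_category:
            index_by_category[value] = len(index_by_category)

    # Step 2: one column per distinct category, 1 in the category's column, 0 elsewhere.
    width = len(index_by_category)
    vectors = []
    for value in values:
        row = [0] * width
        row[index_by_category[value]] = 1
        vectors.append(row)
    return index_by_category, vectors


neighborhoods = [
    "medellin aranjuez",
    "medellin centro",
    "medellin belen rosales",
    "medellin la castellana",
    "medellin la castellana",
]
indexes, encoded = one_hot_encode(neighborhoods)
for row, name in zip(encoded, neighborhoods):
    print(row, name)
```

      Each printed row contains exactly one 1, and both medellin la castellana rows share the same vector.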
  3. Using a Scikit Learn Preprocessor

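    • To use a preprocessor like the label encoder, you first instantiate it, fit it with the complete dataset, and then call transform whenever you need to encode a particular set of values.
      from sklearn.preprocessing import OneHotEncoder, LabelEncoder
      # Example values, reusing the neighborhoods listed in the previous section
      neighborhoods = ['medellin aranjuez', 'medellin centro', 'medellin belen rosales', 'medellin la castellana']
      labels_to_test = ['medellin centro', 'medellin la castellana']
      label_encoder = LabelEncoder()
      # Fit on the complete set of values so that every category gets an index
      label_encoder.fit(neighborhoods)
      print('Label Encoded String', label_encoder.transform(labels_to_test))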
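    • To see why the encoder is fitted on the complete dataset first, here is a small plain-Python stand-in for the fit/transform flow. SimpleLabelEncoder is a hypothetical class written for this note, not scikit-learn's implementation; it assumes classes are stored as the sorted distinct values and that transform rejects any label it did not see during fit.

```python
class SimpleLabelEncoder:
    """Hypothetical stand-in that mimics a fit/transform flow (not scikit-learn code)."""

    def fit(self, values):
        # Keep the sorted distinct values and map each one to its position.
        self.classes_ = sorted(set(values))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # A label that was not present during fit has no index, so reject it.
        unseen = [v for v in values if v not in self._index]
        if unseen:
            raise ValueError(f"labels not seen during fit: {unseen}")
        return [self._index[v] for v in values]


encoder = SimpleLabelEncoder().fit([
    "medellin aranjuez",
    "medellin centro",
    "medellin belen rosales",
    "medellin la castellana",
])
known = encoder.transform(["medellin centro", "medellin aranjuez"])
print(known)

try:
    # This value is deliberately not part of the fitted data.
    encoder.transform(["not in the dataset"])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print(unseen_rejected)
```

      If the encoder were fitted on only part of the data, any neighborhood missing from that part could not be encoded later.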
  4. Using the Label Encoder and the One Hot Encoder

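    • First, configure both encoders by calling fit on each of them. The label encoder's transform is also needed at this point, because the one hot encoder is fitted on its integer output.
      label_encoder = LabelEncoder()
      # sparse_output was named sparse before scikit-learn 1.2
      onehot_encoder = OneHotEncoder(sparse_output=False)
      label_encoder.fit(complete_set_of_data)
      categorical_column = label_encoder.transform(complete_set_of_data)
      # The one hot encoder expects a 2D column, one row per sample
      integer_encoded = categorical_column.reshape(len(categorical_column), 1)
      onehot_encoder.fit(integer_encoded)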
    • Then call transform on both encoders whenever you need to transform a set of values
      # transform returns a NumPy array, so it can be reshaped directly
      label_encoded = label_encoder.transform(values_to_transform)
      integer_encoded = label_encoded.reshape(len(label_encoded), 1)
      onehot_encoded = onehot_encoder.transform(integer_encoded)
  5. Onehot encode in Javascript

    • Before using the model created in Python from Javascript, we have to one hot encode the data we want to predict. In this case I used a Javascript library for that:
      import * as onehot from 'one-hot-enum';
      // Skip the first element and encode the rest
      let reducedlist = this.completeSetOfData.slice(1);
      let enumaration = onehot.enumaration(reducedlist);
      let encoded = onehot.encode(reducedlist);
      // All-zero vector with the same length as the encoded vectors
      let zeros = Array.apply(null, Array(encoded[0].length)).map(Number.prototype.valueOf, 0);
      // Map each enumerated name to its encoded vector
      this.dictionary = {};
      for (let i in enumaration) {
        this.dictionary[enumaration[i]] = encoded[i];
      }
      // this.neighborhoods[0] gets the all-zero vector
      this.dictionary[this.neighborhoods[0]] = zeros;
    • Then, to transform a string into its one hot encoded vector representation, you just need to look the string up in the dictionary
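    • The dictionary idea from this section can also be sketched in Python, which may help when comparing both sides of the series. This does not reproduce the one-hot-enum library: build_dictionary is a made-up helper that assumes a list of distinct categories, keeps the first one as an all-zero vector and gives each remaining category its own column.

```python
def build_dictionary(categories):
    """Map each category to a vector; the first category becomes the all-zero vector."""
    first, rest = categories[0], categories[1:]
    width = len(rest)

    # The first category is represented by zeros in every column.
    dictionary = {first: [0] * width}

    # Every other category gets a 1 in its own column.
    for position, name in enumerate(rest):
        vector = [0] * width
        vector[position] = 1
        dictionary[name] = vector
    return dictionary


neighborhoods = [
    "medellin aranjuez",
    "medellin centro",
    "medellin belen rosales",
    "medellin la castellana",
]
dictionary = build_dictionary(neighborhoods)

# Looking a string up in the dictionary returns its vector.
print(dictionary["medellin centro"])
print(dictionary["medellin aranjuez"])
```

      With this layout the vectors have one column fewer than the number of categories, because the first category needs no column of its own.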