Chapter 4 Missing values

First, we compute the percentage of missing values per feature and store the result in a variable called missing_values. Missing values appear in two forms: the string ‘na’ or a blank entry.
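As a rough illustration of this step, the sketch below computes per-feature missing percentages in plain Python. The original code is not shown, so the helper names (`is_missing`, `missing_percentages`), the toy rows, and the choice to represent the data as a list of dictionaries are all assumptions made for the example; only the variable name missing_values and the ‘na’/blank encoding come from the text.

```python
def is_missing(value):
    """Treat None, blank strings, and the literal 'na' as missing."""
    if value is None:
        return True
    text = str(value).strip()
    return text == "" or text == "na"


def missing_percentages(rows):
    """Return {column: percentage of rows in which that column is missing}."""
    if not rows:
        return {}
    total = len(rows)
    return {
        col: 100.0 * sum(is_missing(row.get(col)) for row in rows) / total
        for col in rows[0]
    }


# Hypothetical toy data, not taken from the actual dataset.
rows = [
    {"age": "34", "income": "na", "city": ""},
    {"age": "na", "income": "na", "city": "Oslo"},
    {"age": "51", "income": "na", "city": "Lima"},
    {"age": "29", "income": "72000", "city": "Rome"},
]

missing_values = missing_percentages(rows)
```

With a data-frame library the same computation is usually a one-liner, but the logic is the same: flag each cell as missing or not, then average the flags per column.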

4.1 Graph I.

The first graph shows the percentage of missing values per feature:

We use this percentage plot later as a reference when dropping features with too many missing values.

4.2 Graph II.

The second graph shows the missing values by variable:

This graph makes row and column missing patterns easier to spot, because it shows how missing values in different features are linked.
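The idea behind such a pattern view can be sketched numerically: for each row, record which features are missing, then count how often each combination occurs. This is only an illustrative sketch under assumptions; the function name `missing_patterns`, the helper `is_missing`, and the toy rows are hypothetical and are not the code used to produce the graph.

```python
from collections import Counter


def is_missing(value):
    """Treat None, blank strings, and the literal 'na' as missing."""
    if value is None:
        return True
    text = str(value).strip()
    return text == "" or text == "na"


def missing_patterns(rows):
    """Count how often each combination of columns is missing in the same row."""
    patterns = Counter()
    for row in rows:
        missing_cols = tuple(sorted(c for c, v in row.items() if is_missing(v)))
        patterns[missing_cols] += 1
    return patterns


# Hypothetical toy data, not taken from the actual dataset.
rows = [
    {"age": "34", "income": "na", "city": ""},
    {"age": "na", "income": "na", "city": "Oslo"},
    {"age": "51", "income": "na", "city": "Lima"},
    {"age": "29", "income": "72000", "city": "Rome"},
]

patterns = missing_patterns(rows)
for cols, count in patterns.most_common():
    print(count, cols if cols else "(complete row)")
```

Combinations that occur often suggest features that tend to be missing together, which is the kind of link the graph makes visible.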

4.3 Feature Engineering

We drop every feature with more than 75% missing values to avoid misleading results. This removes 77 columns, reducing the dataset from 163 columns to 86.
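A minimal sketch of this filtering rule, assuming the percentages are available as a dictionary keyed by column name: the function name `drop_sparse_columns`, the toy percentages, and the toy rows are hypothetical. Note that "more than 75%" is read strictly here, so a column at exactly 75% is kept.

```python
def drop_sparse_columns(rows, missing_values, threshold=75.0):
    """Keep only columns whose missing percentage does not exceed the threshold."""
    keep = [col for col, pct in missing_values.items() if pct <= threshold]
    filtered = [{col: row[col] for col in keep} for row in rows]
    return filtered, keep


# Hypothetical percentages and data, not taken from the actual dataset.
missing_values = {"age": 25.0, "income": 80.0, "city": 75.0}
rows = [
    {"age": "34", "income": "na", "city": ""},
    {"age": "na", "income": "na", "city": "Oslo"},
]

filtered_rows, kept_columns = drop_sparse_columns(rows, missing_values)
dropped = len(missing_values) - len(kept_columns)
print(f"dropped {dropped} column(s); {len(kept_columns)} remain")
```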
