3. Data Preprocessing

3.1. Normalisation

If a dataset contains features with vastly different scales, it is advisable to normalise the features first. There are a few options:

Normalise the features to zero mean and unit variance.
Normalise the features to unit variance.
Normalise the features to the unit interval [0, 1].
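
The three options above can be sketched with plain NumPy. This is only an illustration under stated assumptions: the function names (standardise, scale_to_unit_variance, scale_to_unit_interval) are hypothetical and are not part of this package, and each feature is assumed to be a column of a 2-D array.

```python
import numpy as np


def standardise(X):
    """Rescale each column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid dividing by zero
    return (X - X.mean(axis=0)) / std


def scale_to_unit_variance(X):
    """Rescale each column to unit variance without centring it."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return X / std


def scale_to_unit_interval(X):
    """Rescale each column linearly so that it spans [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0
    return (X - lo) / span


# Two features on very different scales (made-up values).
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

X_std = standardise(X)
X_var = scale_to_unit_variance(X)
X_unit = scale_to_unit_interval(X)
```

Whichever option is chosen, the statistics (mean, standard deviation, minimum, maximum) would normally be computed on the training set only and then reused to transform the test set.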

3.2. Balanced Train-Test Split

Often, the class distribution in a dataset is not balanced. For example, in the SDSS dataset, we have three times as many galaxies as quasars. To correct for this bias, we might want to select a balanced training and test set. This is achieved by balanced_train_test_split().
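
To make the idea concrete, here is a minimal sketch of a balanced split written with NumPy. It is not the implementation or the signature of balanced_train_test_split(); the helper name balanced_split_sketch, its parameters, and the toy labels are all assumptions made for illustration. It assumes "balanced" means drawing the same number of examples from every class.

```python
import numpy as np


def balanced_split_sketch(X, y, train_per_class, test_per_class, seed=0):
    """Hypothetical helper: sample equal numbers of examples from each class."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        # Shuffle the indices belonging to this class.
        idx = rng.permutation(np.flatnonzero(y == label))
        if len(idx) < train_per_class + test_per_class:
            raise ValueError(f"not enough examples of class {label!r}")
        train_idx.extend(idx[:train_per_class])
        test_idx.extend(idx[train_per_class:train_per_class + test_per_class])
    train_idx = np.array(train_idx, dtype=int)
    test_idx = np.array(test_idx, dtype=int)
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy data with three times as many galaxies as quasars (made-up values).
y = np.array(["galaxy"] * 6 + ["quasar"] * 2)
X = np.arange(16).reshape(8, 2)

X_train, X_test, y_train, y_test = balanced_split_sketch(
    X, y, train_per_class=1, test_per_class=1
)
```

Because each class contributes the same number of rows, the rarest class limits how large the balanced sets can be.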