maplearn.datahandler package

Data handlers

Intermediate classes between file(s) and datasets

packdata: creates a dataset with samples and data
labels: labels associated with features (in samples)
loader: loads data from a file or known datasets
writer: writes data into a file
signature: graphs a dataset
plotter: generic class to make charts

Submodules

maplearn.datahandler.packdata module

Machine Learning dataset

A machine learning dataset is classically a table where:

columns are all variables that can be used by machine learning algorithms
rows correspond to individuals

Variables

The variables fall into two categories:

the variables whose values are known: the predictors (or features)
the variable to predict, also called the label

Individuals

The individuals whose label is known are called samples.
The others are simply called data.
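The split between samples and data can be illustrated with a small, hypothetical table in which the last column holds the label (or `None` when it is unknown). This is a minimal sketch using plain Python lists, not maplearn code; the table values and the `split_samples` helper are made up for illustration.

```python
# Hypothetical table: each row is an individual, the last column is its label
# (None when the label is unknown).
rows = [
    [5.1, 3.5, 1.4, "setosa"],
    [6.2, 2.9, 4.3, "versicolor"],
    [5.9, 3.0, 5.1, None],        # label unknown -> belongs to "data"
    [6.7, 3.1, 4.4, "versicolor"],
    [4.8, 3.4, 1.6, None],        # label unknown -> belongs to "data"
]


def split_samples(table):
    """Separate labeled individuals (samples) from unlabeled ones (data)."""
    X, Y, data = [], [], []
    for *features, label in table:
        if label is None:
            data.append(features)   # features only, to be predicted later
        else:
            X.append(features)      # predictors of a sample
            Y.append(label)         # label to predict
    return X, Y, data


X, Y, data = split_samples(rows)
```

Here `X` and `Y` would be used to fit an algorithm, and `data` holds the individuals whose labels are to be predicted.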

class maplearn.datahandler.packdata.PackData(X=None, Y=None, data=None, **kwargs)

Bases: object

PackData: a container for datasets

A PackData contains:

samples (Y and X) to fit algorithm(s)
- Y: a vector with samples’ labels
- X: a matrix with samples’ features
data: 2d matrix with features to use for prediction

PackData checks that samples are compatible with data (same features…) and that the dataset is compatible with the Machine Learning algorithm(s).
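The kind of consistency rules described above can be sketched as follows. This is a hypothetical helper, not PackData's actual implementation: it assumes that "compatible" means one label per sample and the same number of features in X and data, and it works on plain lists of rows.

```python
def check_compatibility(X, Y, data):
    """Hypothetical check: one label per sample, same features in X and data.

    X and data are lists of rows (one row per individual); Y is a list of labels.
    Returns the shared number of features.
    """
    if len(Y) != len(X):
        raise ValueError("Y must contain exactly one label per sample in X")
    # Collect the row lengths found in samples and in data
    n_features = {len(row) for row in X} | {len(row) for row in data}
    if len(n_features) != 1:
        raise ValueError("samples and data must share the same features")
    return n_features.pop()
```

For instance, `check_compatibility([[1, 2], [3, 4]], [0, 1], [[5, 6]])` returns `2`, while passing data rows with three values would raise a `ValueError`.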

Example:

>>> import numpy as np
>>> from maplearn.datahandler.packdata import PackData
>>> data = np.random.random((10, 5))
>>> x = np.random.random((10, 5))
>>> y = np.random.randint(1, 10, size=10)
>>> ds = PackData(x, y, data)
>>> print(ds)

Args:

X (array): 2d matrix with features of samples
Y (array): vector with labels of samples
data (array): 2d matrix with features
**kwargs: other dataset parameters (features, na…)

Attributes:

not_nas: vector with non-NA indexes

X (array): 2d matrix with features of samples

Y (array): vector with labels of samples

balance(seuil=None)

Balances samples by removing some individuals from the largest classes.

Args:

seuil (int): maximum number of samples per class
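Balancing by capping each class at `seuil` samples can be sketched as random undersampling of the largest classes. This is a minimal illustration under assumptions: it uses plain lists, draws the kept individuals at random with a fixed seed, and requires `seuil` to be given (how maplearn chooses a value when `seuil=None` is not covered here). The `balance` function below is hypothetical, not the method itself.

```python
import random
from collections import defaultdict


def balance(X, Y, seuil, seed=0):
    """Keep at most `seuil` randomly chosen samples per class (hypothetical sketch)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(Y):
        by_class[label].append(i)       # indexes of individuals in each class
    kept = []
    for indexes in by_class.values():
        if len(indexes) > seuil:
            # Undersample classes that exceed the threshold
            indexes = rng.sample(indexes, seuil)
        kept.extend(indexes)
    kept.sort()                         # preserve the original order of individuals
    return [X[i] for i in kept], [Y[i] for i in kept]
```

With six samples of class 1 and two of class 2, `balance(X, Y, seuil=3)` keeps three samples of class 1 and both samples of class 2.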

classes (dict): label classes and their number of individuals

data (array): 2d matrix with features

features (list): features of the dataset

load(X=None, Y=None, data=None, features=None)

Loads data into the PackData

Args:

X (array): 2d matrix with features of samples
Y (array): vector with labels of samples
data (array): 2d matrix with features
features (list): list of features

plot(prefix='sig')

Plots the dataset (signature):
- one chart for all samples
- one chart per sample class

Args:

prefix (str): prefix of output files to save charts in

reduit(meth='lda', ncomp=None)

Reduces the number of dimensions of data and X

Args:

meth (str): reduction method to apply
ncomp (int): number of dimensions expected
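The idea of reducing X and data to `ncomp` dimensions with a single projection can be sketched with NumPy. Note that this sketch swaps in principal component analysis (PCA, computed through an SVD) because it needs no labels; the default method here is `'lda'`, which is a different, supervised technique. The `reduce_pca` function is hypothetical and only shows that X and data must be projected with the same axes.

```python
import numpy as np


def reduce_pca(X, data, ncomp):
    """Project X and data onto the first `ncomp` principal axes of X (PCA sketch)."""
    X = np.asarray(X, dtype=float)
    data = np.asarray(data, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centred samples,
    # ordered by decreasing explained variance
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = vt[:ncomp].T                       # shape (n_features, ncomp)
    # The same centring and axes are applied to samples and to data
    return (X - mean) @ axes, (data - mean) @ axes
```

Fitting the axes on the samples and reusing them for data keeps both matrices in the same reduced feature space, which is what a later prediction step needs.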

scale(): Normalizes data and X matrices
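One common way to normalize both matrices consistently is column-wise standardization (zero mean, unit variance) with shared statistics. The sketch below assumes z-score standardization computed over X and data stacked together; the exact normalization and reference statistics used by `scale()` are not stated here, so treat this `scale` function as a hypothetical illustration.

```python
import numpy as np


def scale(X, data):
    """Standardize X and data column-wise with a shared mean and std (assumption)."""
    X = np.asarray(X, dtype=float)
    data = np.asarray(data, dtype=float)
    both = np.vstack([X, data])
    mean = both.mean(axis=0)
    std = both.std(axis=0)
    std[std == 0] = 1.0        # leave constant columns at 0 instead of dividing by zero
    return (X - mean) / std, (data - mean) / std
```

Using the same mean and standard deviation for both matrices keeps a given raw value mapped to the same scaled value in samples and in data.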

separability(metric='euclidean')

Performs separability analysis between samples

Args:

metric (str): name of the distance used
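A simple way to quantify separability with a chosen distance is to compare distances between samples of different classes with distances between samples of the same class. The index below (mean between-class distance divided by mean within-class distance) is a hypothetical illustration, not necessarily the analysis this method performs; `METRICS` and `separability` are made-up names. It needs at least two classes and one class with two or more samples.

```python
import math
from itertools import combinations

# Hypothetical mapping from metric name to distance function
METRICS = {
    "euclidean": math.dist,
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}


def separability(X, Y, metric="euclidean"):
    """Ratio of mean between-class to mean within-class pairwise distance.

    Values well above 1 suggest that classes are well separated.
    """
    dist = METRICS[metric]
    within, between = [], []
    for (xa, ya), (xb, yb) in combinations(zip(X, Y), 2):
        # Same label -> within-class pair, otherwise between-class pair
        (within if ya == yb else between).append(dist(xa, xb))
    return (sum(between) / len(between)) / (sum(within) / len(within))
```

For two tight clusters around x = 0 and x = 10, the Manhattan version returns 10.5: within-class pairs are 1 apart on average and between-class pairs 10.5 apart.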

maplearn.datahandler.labels module

Labels

This class handles labels associated with features in samples:

counts how many samples belong to each class

class maplearn.datahandler.labels.Labels(Y, codes=None, output=None)

Bases: object

Samples labels used in PackData class

Args:

Y (array): vector with samples’ labels
codes (dict): dictionary with label codes and their descriptions

Attributes:

summary ()
dct_codes (dict): dictionary with label codes and their descriptions

Property:

Y (array): vector containing labels of samples (codes)

Y: sample labels (as a vector)

convert(): Converts between codes

count(): Summarizes labels per class (number of samples in each class)

libelle2code(): Converts label names into the corresponding codes
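Counting samples per class and converting label names to codes both reduce to simple dictionary operations. This sketch assumes a `codes` dictionary mapping each code to its description, as described for the `codes` argument; the standalone `count` and `libelle2code` functions and the class names are hypothetical and only mirror what the methods are documented to do.

```python
from collections import Counter

# Hypothetical codes dictionary: label code -> description
codes = {1: "water", 2: "forest", 3: "urban"}


def count(Y):
    """Number of samples in each class."""
    return dict(Counter(Y))


def libelle2code(names, codes):
    """Convert label names into their codes by inverting the codes dictionary."""
    name_to_code = {name: code for code, name in codes.items()}
    return [name_to_code[name] for name in names]
```

For example, `count([1, 2, 2, 3, 2])` gives `{1: 1, 2: 3, 3: 1}` and `libelle2code(["forest", "water"], codes)` gives `[2, 1]`.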

maplearn.datahandler.loader module

Loads data from a file

This class feeds a PackData. It gathers data from one or more files, or uses known datasets stored in a library.

class maplearn.datahandler.loader.Loader(source, **kwargs)

Bases: object

Loads data from a file or a known dataset

Args:

source (str): path to the file to load, or name of a dataset (“iris” for example)
**kwargs: other attributes to drive loading (handles NA, labels…)

Attributes:

src (dict): information about the source (type, path…)
X: samples’ features
Y: samples’ labels
aData: data to predict
matrix: data as a matrix (needed when loading from a raster file)
features: features of the dataset
nomenclature: legends of labels

Examples:

Loading data from a known dataset:

>>> from maplearn.datahandler.loader import Loader
>>> ldr = Loader('iris')
>>> print(ldr)
>>> print(ldr.X, ldr.Y)
>>> print(ldr.aData)

Loading data from a file (here an Excel file):

>>> import os
>>> from maplearn.datahandler.loader import Loader
>>> ldr = Loader(os.path.join('maplearn_path', 'datasets',
...                           'ex1.xlsx'))
>>> print(ldr)
>>> print(ldr.X, ldr.Y)

X: matrix of values corresponding to samples

Y: vector of labels describing samples. These are the values to be predicted by the machine learning algorithm

aData: data to predict

df: loaded DataFrame

features: list of features contained in the dataset

matrix: data served as a matrix. Needed when loading data from an image

nomenclature: legends of labels. Dictionary mapping label codes to their corresponding names

run(**kwargs)

Gets samples (X with features and Y containing labels)

Args:

**kwargs:
- features (list): features to load
- label (str): column with class labels (description)
- label_id (str): column with labels codes
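How the `features`, `label` and `label_id` keywords could drive the extraction of samples is sketched below with the standard `csv` module on an in-memory table. The column names, values and the `load_samples` function are hypothetical, and reading a CSV is only one of the formats the Loader may handle; the sketch also shows how a nomenclature (code to name) can be built from the two label columns.

```python
import csv
import io

# Hypothetical CSV content; a real file would be opened with open(path, newline="")
CSV_TEXT = """id,b1,b2,class_name,class_id
1,0.12,0.40,water,1
2,0.55,0.31,forest,2
3,0.14,0.38,water,1
"""


def load_samples(handle, features, label, label_id):
    """Collect X (selected features), Y (label codes) and a nomenclature dict."""
    X, Y, nomenclature = [], [], {}
    for row in csv.DictReader(handle):
        X.append([float(row[f]) for f in features])   # only the requested features
        code = int(row[label_id])
        Y.append(code)
        nomenclature[code] = row[label]               # label code -> label name
    return X, Y, nomenclature


X, Y, nomenclature = load_samples(io.StringIO(CSV_TEXT), ["b1", "b2"],
                                  "class_name", "class_id")
```

Here `Y` is `[1, 2, 1]` and `nomenclature` is `{1: 'water', 2: 'forest'}`.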

maplearn.datahandler.writer module

Writes data into a file

This class is meant to be used with PackData. It writes data into a single file (several formats are available).

class maplearn.datahandler.writer.Writer(path=None, **kwargs)

Bases: object

Writes data in a file (different formats available)

Args:

path (str): path to the file to write data into
**kwargs:
- origin (str): path to the original file used as a model

path

run(data, path=None, na=None, dtype=None)

Writes data into a file

Args:

data (pandas DataFrame): dataset to write
path (str): path to the file to write data into
na: value used as the code for “NoData”
dtype (np.dtype): desired data type
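The roles of `na` and `dtype` can be illustrated with a small CSV writer: missing values are replaced by the "NoData" code and other values are cast to the requested type. To stay dependency-free, this sketch takes plain lists of rows instead of a pandas DataFrame and a Python type (such as `int`) instead of an `np.dtype`; `write_csv` is a hypothetical helper, not Writer's implementation.

```python
import csv
import io
import math


def write_csv(rows, header, handle, na=None, dtype=None):
    """Write rows as CSV, replacing missing values (None or NaN) with the `na` code."""
    writer = csv.writer(handle)
    writer.writerow(header)
    for row in rows:
        out = []
        for value in row:
            missing = value is None or (isinstance(value, float) and math.isnan(value))
            if missing:
                value = na              # "NoData" code (written as empty if na is None)
            elif dtype is not None:
                value = dtype(value)    # cast to the desired data type
            out.append(value)
        writer.writerow(out)
```

For example, writing `[[1.7, float("nan")], [2.2, None]]` with `na=-9999` and `dtype=int` produces the rows `1,-9999` and `2,-9999` under the header.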

maplearn.datahandler.signature module

Signature

This class makes charts about a dataset:

spectral signature
temporal signature

Example:

>>> from maplearn.datahandler.loader import Loader
>>> from maplearn.datahandler.signature import Signature
>>> ldr = Loader('iris')
>>> sig = Signature(ldr.X, features=ldr.features)
>>> sig.plot(title='test')

class maplearn.datahandler.signature.Signature(data, features=None, model='boxplot', output=None)

Bases: object

Makes charts about a dataset:

one global graph
one graph per class in samples (if samples are available)

Args:

data (array or DataFrame): data to plot
features (list): names of columns
model (str): how to plot signature (plot or boxplot)
output (str): path to the output directory where plots will be saved

plot(title='Signature du jeu de donnees', file=None)

Plots the (spectral) signature of data as boxplots or points, depending on the number of features

Args:

title (str): title to add to the plot
file (str): name of the output file
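A boxplot signature summarizes each feature by the values a boxplot draws. This sketch computes a five-number summary (minimum, first quartile, median, third quartile, maximum) per feature with `statistics.quantiles`; it uses min and max as whisker ends for simplicity, whereas plotting libraries often cap whiskers at 1.5 × IQR. The `boxplot_stats` function and the column name are hypothetical, and the rule deciding between boxplots and points is not reproduced here.

```python
import statistics


def boxplot_stats(columns):
    """Five-number summary per feature: the values a boxplot signature draws."""
    summary = {}
    for name, values in columns.items():
        # Quartiles with linear interpolation between data points
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[name] = (min(values), q1, median, q3, max(values))
    return summary
```

For instance, `boxplot_stats({"b1": [1, 2, 3, 4, 5]})["b1"]` is `(1, 2.0, 3.0, 4.0, 5)`.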

plot_class(data_class, label='', file=None)

Plots the signature of one class over the whole dataset

Args:

data_class (DataFrame): data of one class
label (str): label of the class to plot
file (str): path to the file to save the chart in
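Plotting one class against the whole dataset requires grouping samples by label and comparing each group with the global signature. This sketch computes mean signatures (the mean of each feature) per class and for all samples with the standard library; it is a hypothetical data-preparation step, uses means where the charts may show boxplots instead, and draws nothing.

```python
from collections import defaultdict
from statistics import fmean


def mean_signature(rows):
    """Mean value of each feature across rows."""
    return [fmean(col) for col in zip(*rows)]


def class_signatures(X, Y):
    """Signature of the whole dataset plus one mean signature per class."""
    groups = defaultdict(list)
    for row, label in zip(X, Y):
        groups[label].append(row)        # rows belonging to each class
    overall = mean_signature(X)
    return overall, {label: mean_signature(rows) for label, rows in groups.items()}
```

Each per-class signature can then be drawn over the overall one, producing one chart per class.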
