maplearn.datahandler package
Data handlers
Intermediate classes between file(s) and dataset:
- packdata: creates a dataset with samples and data
- labels: labels associated with features (in samples)
- loader: loads data from a file or a known dataset
- writer: writes data into a file
- signature: plots charts about a dataset
- plotter: generic class to make charts
Submodules
maplearn.datahandler.packdata module
Machine Learning dataset¶
A machine learning dataset is typically a table where:
- columns are all the variables that can be used by machine learning algorithms
- rows correspond to individuals
Variables
The variables fall into two categories:
- the variables for which you have information: these are the predictors (or features)
- the variable to predict, also called the label
Individuals
- The individuals for whom you know the label are called samples.
- The others are just called data
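The distinction between samples and data can be illustrated with a minimal plain-Python sketch. It is not maplearn's implementation. It assumes that each individual is a (features, label) pair in which an unknown label is None. The helper name `split_individuals` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical row format: (feature values, label or None if unknown)
Row = Tuple[Sequence[float], Optional[int]]


def split_individuals(rows: List[Row]):
    """Separate labelled individuals (samples) from unlabelled ones (data)."""
    samples_x, samples_y, data = [], [], []
    for features, label in rows:
        if label is None:
            # Unknown label: the individual is only used for prediction
            data.append(list(features))
        else:
            # Known label: the individual is a sample usable for fitting
            samples_x.append(list(features))
            samples_y.append(label)
    return samples_x, samples_y, data


rows = [([0.1, 0.2], 1), ([0.3, 0.4], None), ([0.5, 0.6], 2)]
x, y, data = split_individuals(rows)
print(x, y, data)  # [[0.1, 0.2], [0.5, 0.6]] [1, 2] [[0.3, 0.4]]
```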
class maplearn.datahandler.packdata.PackData(X=None, Y=None, data=None, **kwargs)
Bases: object
PackData: a container for datasets
A PackData contains:
- samples (Y and X) to fit algorithm(s)
- Y: a vector with samples’ labels
- X: a matrix with samples’ features
- data: 2d matrix with features to use for prediction
PackData checks that samples are compatible with data (same features…) and that the dataset is compatible with the machine learning algorithm(s).
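The following sketch shows one way such consistency checks could work. The two rules it applies are assumptions made for illustration: there is one label per sample, and samples and data have the same number of features. `check_compatibility` is a hypothetical name, not part of the PackData API.

```python
from typing import List, Sequence


def check_compatibility(x: List[Sequence[float]], y: Sequence[int],
                        data: List[Sequence[float]]) -> None:
    """Raise ValueError if samples and data cannot be used together."""
    if len(x) != len(y):
        # Each sample needs exactly one label
        raise ValueError(f"{len(x)} samples but {len(y)} labels")
    n_features = {len(row) for row in x} | {len(row) for row in data}
    if len(n_features) > 1:
        # Samples and data must share the same number of features
        raise ValueError(f"inconsistent number of features: {sorted(n_features)}")


check_compatibility([[1.0, 2.0]], [3], [[4.0, 5.0], [6.0, 7.0]])  # passes silently
```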
- Example:
>>> import numpy as np
>>> from maplearn.datahandler.packdata import PackData
>>> data = np.random.random((10, 5))
>>> x = np.random.random((10, 5))
>>> y = np.random.randint(1, 10, size=10)
>>> ds = PackData(x, y, data)
>>> print(ds)
- Args:
- X (array): 2d matrix with features of samples
- Y (array): vector with labels of samples
- data (array): 2d matrix with features
- **kwargs: other parameters describing the dataset (features, na…)
- Attributes:
- not_nas: vector with non-NA indexes
X (array): 2d matrix with features of samples
Y (array): vector with labels of samples
balance(seuil=None)
Balances samples by removing some individuals from the largest classes.
- Args:
- seuil (int): maximum number of samples per class
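Below is a minimal sketch of balancing by random undersampling, assuming that is what `balance` does. Classes larger than `seuil` are randomly reduced to `seuil` individuals, and smaller classes are kept whole. The fallback used when `seuil` is None (the size of the smallest class) and the `seed` parameter are assumptions, not documented behavior.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple


def balance(x: List[Sequence[float]], y: List[int], seuil: Optional[int] = None,
            seed: int = 0) -> Tuple[List[Sequence[float]], List[int]]:
    """Randomly drop samples so that no class holds more than `seuil` samples."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    if seuil is None:
        # Assumption: default cap is the size of the smallest class
        seuil = min(len(idx) for idx in by_class.values())
    rng = random.Random(seed)
    kept = []
    for idx in by_class.values():
        # Only classes larger than the cap lose individuals
        kept.extend(rng.sample(idx, seuil) if len(idx) > seuil else idx)
    kept.sort()  # keep retained samples in their original order
    return [x[i] for i in kept], [y[i] for i in kept]


x = [[float(i)] for i in range(6)]
y = [1, 1, 1, 1, 2, 2]
xb, yb = balance(x, y, seuil=2)
print(sorted(yb))  # [1, 1, 2, 2]
```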
classes (dict): label classes and the number of individuals in each
data (array): 2d matrix with features
features (list): list of features of the dataset
load(X=None, Y=None, data=None, features=None)
Loads data into the PackData
- Args:
- X (array): 2d matrix with features of samples
- Y (array): vector with labels of samples
- data (array): 2d matrix with features
- features (list): list of features
plot(prefix='sig')
Plots the dataset (signature):
* one chart for all samples
* one chart per sample class
- Args:
- prefix (str): prefix of output files to save charts in
reduit(meth='lda', ncomp=None)
Reduces the number of dimensions of data and X
- Args:
- meth (str): reduction method to apply
- ncomp (int): number of dimensions expected
scale()
Normalizes data and X matrices
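The documentation for `scale` does not say which normalization it applies. The sketch below uses z-score standardization as an example. As an assumption, it estimates the mean and standard deviation on `data` and reuses them for X, so that both matrices stay on the same scale. The function is a hypothetical stand-alone version, not the method itself.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def scale(x: List[List[float]], data: List[List[float]]
          ) -> Tuple[List[List[float]], List[List[float]]]:
    """Standardize each feature to zero mean and unit variance.

    Assumption: mean/std are estimated on `data` and reused for `x`.
    """
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    # Constant features get std 1.0 to avoid dividing by zero
    stds = [pstdev(col) or 1.0 for col in columns]

    def transform(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    return transform(x), transform(data)


xs, ds = scale([[2.0, 10.0]], [[1.0, 10.0], [3.0, 10.0]])
print(xs, ds)  # [[0.0, 0.0]] [[-1.0, 0.0], [1.0, 0.0]]
```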
separability(metric='euclidean')
Performs a separability analysis between samples
- Args:
- metric (str): name of the distance to use
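The sketch below gives a simplified illustration of separability with the euclidean metric. It measures the distance between the centroids (mean feature vectors) of each pair of classes. This is only one possible separability measure, chosen for illustration, and the analysis performed by `separability` may differ. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def centroid_distances(x: List[Sequence[float]], y: Sequence[int]
                       ) -> Dict[Tuple[int, int], float]:
    """Euclidean distance between the centroids of each pair of classes."""
    groups = defaultdict(list)
    for row, label in zip(x, y):
        groups[label].append(row)
    # Mean feature vector of each class
    centroids = {label: [sum(col) / len(col) for col in zip(*rows)]
                 for label, rows in groups.items()}
    return {(a, b): math.dist(centroids[a], centroids[b])
            for a, b in combinations(sorted(centroids), 2)}


x = [[0.0, 0.0], [0.0, 2.0], [3.0, 1.0], [5.0, 1.0]]
y = [1, 1, 2, 2]
print(centroid_distances(x, y))  # {(1, 2): 4.0}
```

A small distance between two centroids suggests that those classes may be hard to tell apart. However, this measure ignores the spread within each class.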
maplearn.datahandler.labels module
Labels
This class handles labels associated with features in samples:
- counts how many samples belong to each class
class maplearn.datahandler.labels.Labels(Y, codes=None, output=None)
Bases: object
Samples labels used in PackData class
- Args:
- Y (array): vector with samples’ labels
- codes (dict): dictionary with label codes and associated descriptions
- Attributes:
- summary
- dct_codes (dict): dictionary with label codes and associated descriptions
- Property:
- Y (array): vector containing labels of samples (codes)
Y (array): samples’ labels, as a vector
convert()
Converts between codes
count()
Counts how many samples belong to each class
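Counting samples per class amounts to building a frequency table of the labels. With the standard library this can be sketched using `collections.Counter`. The snippet shows the idea only; it is not the code behind `count`.

```python
from collections import Counter

y = [3, 1, 3, 2, 3, 1]  # example vector of label codes
counts = Counter(y)     # number of samples for each class
print(dict(sorted(counts.items())))  # {1: 2, 2: 1, 3: 3}
```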
libelle2code()
Converts label names into their corresponding codes
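`libelle` is French for “label”. Assume that the `codes` dictionary maps each code to its name. Converting names into codes then means inverting that dictionary, as in this sketch. The function below is a hypothetical stand-alone version, not the method itself.

```python
from typing import Dict, List


def libelle2code(names: List[str], codes: Dict[int, str]) -> List[int]:
    """Convert label names into codes using a {code: name} dictionary."""
    name_to_code = {name: code for code, name in codes.items()}
    # A KeyError is raised for names missing from the nomenclature
    return [name_to_code[name] for name in names]


codes = {1: "forest", 2: "water", 3: "urban"}
print(libelle2code(["water", "forest", "water"], codes))  # [2, 1, 2]
```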
maplearn.datahandler.loader module
Loads data from a file
This class aims to feed a PackData. It gathers data from one or more files, or uses known datasets stored in a library.
class maplearn.datahandler.loader.Loader(source, **kwargs)
Bases: object
Loads data from a file or a known dataset
- Args:
- source (str): path to the file to load, or name of a known dataset (“iris” for example)
- **kwargs: other parameters controlling loading (NA handling, labels…)
- Attributes:
- src (dict): information about the source (type, path…)
- X: samples’ features
- Y: samples’ labels
- aData: data to predict
- matrix: data as a matrix (needed when loading from a raster file)
- features: list of features
- nomenclature: legend of labels (codes and names)
- Examples:
Loading data from a known dataset:
>>> from maplearn.datahandler.loader import Loader
>>> ldr = Loader('iris')
>>> print(ldr)
>>> print(ldr.X, ldr.Y)
>>> print(ldr.aData)
Loading data from a file (here an Excel file):
>>> import os
>>> ldr = Loader(os.path.join('maplearn_path', 'datasets', 'ex1.xlsx'))
>>> print(ldr)
>>> print(ldr.X, ldr.Y)
X: matrix of values corresponding to samples
Y: vector of labels describing samples, i.e. the values to be predicted by the machine learning algorithm
aData: data to predict
df: loaded DataFrame
features: list of features contained in the dataset
matrix: data served as a matrix (needed when loading data from an image)
nomenclature: legend of labels, i.e. a dictionary mapping label codes to their corresponding names
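The `matrix` attribute matters for images. A machine learning table needs one row per pixel and one column per band, and predictions must afterwards be placed back on the image grid. The sketch below assumes a band-first layout (bands x rows x columns) stored as nested lists. The function names are hypothetical.

```python
from typing import List

Image = List[List[List[float]]]  # assumed layout: bands x rows x cols


def image_to_table(image: Image) -> List[List[float]]:
    """Flatten an image into one row per pixel and one column per band."""
    n_rows, n_cols = len(image[0]), len(image[0][0])
    return [[band[r][c] for band in image]
            for r in range(n_rows) for c in range(n_cols)]


def table_to_image(values: List[float], n_rows: int, n_cols: int) -> List[List[float]]:
    """Reshape one predicted value per pixel back into a rows x cols grid."""
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


image = [[[1, 2], [3, 4]],      # band 1
         [[10, 20], [30, 40]]]  # band 2
table = image_to_table(image)
print(table)  # [[1, 10], [2, 20], [3, 30], [4, 40]]
print(table_to_image([0, 1, 1, 0], 2, 2))  # [[0, 1], [1, 0]]
```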
maplearn.datahandler.writer module
Writes data into a file
This class is meant to be used with PackData. It writes data into a single file (several formats are supported).
class maplearn.datahandler.writer.Writer(path=None, **kwargs)
Bases: object
Writes data in a file (different formats available)
- Args:
- path (str): path to the file to write data into
- **kwargs:
- origin (str): path to the original file used as a template
path (str): path to the file to write data into
run(data, path=None, na=None, dtype=None)
Writes data into a file
- Args:
- data (pandas DataFrame): dataset to write
- path (str): path to the file to write data into
- na: value used as the code for “NoData”
- dtype (np.dtype): desired data type
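The role of the `na` argument can be sketched with the standard `csv` module: missing values are replaced by a numeric code before writing. `write_csv`, the `-9999` default and the use of None for missing values are assumptions made for illustration. Writer itself works on a pandas DataFrame and supports several formats.

```python
import csv
import io
from typing import List, Optional, Sequence, TextIO


def write_csv(rows: List[Sequence[Optional[float]]], columns: Sequence[str],
              stream: TextIO, na: float = -9999) -> None:
    """Write rows as CSV, replacing missing values (None) with the `na` code."""
    writer = csv.writer(stream)
    writer.writerow(columns)
    for row in rows:
        writer.writerow([na if v is None else v for v in row])


buf = io.StringIO()
write_csv([[1.5, None], [None, 2.0]], ["b1", "b2"], buf)
print(buf.getvalue().splitlines())  # ['b1,b2', '1.5,-9999', '-9999,2.0']
```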
maplearn.datahandler.signature module
Signature
This class makes charts about a dataset:
- spectral signature
- temporal signature
- Example:
>>> from maplearn.datahandler.loader import Loader
>>> from maplearn.datahandler.signature import Signature
>>> ldr = Loader('iris')
>>> sig = Signature(ldr.X)
>>> sig.plot(title='test')
class maplearn.datahandler.signature.Signature(data, features=None, model='boxplot', output=None)
Bases: object
Makes charts about a dataset:
- one global graph
- one graph per class in samples (if samples are available)
- Args:
- data (array or DataFrame): data to plot
- features (list): names of columns
- model (str): how to plot signature (plot or boxplot)
- output (str): path to the output directory where plots will be saved
plot(title='Signature du jeu de donnees', file=None)
Plots the (spectral) signature of data as boxplots or points, depending on the number of features
- Args:
- title (str): title to add to the plot
- file (str): name of the output file
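When signatures are drawn as boxplots, each feature is summarized by its distribution. The sketch below computes a simplified five-number summary per feature (minimum, quartiles, maximum) with `statistics.quantiles` (Python 3.8+). It does not reproduce the plotting itself, and the function name is hypothetical. Note also that plotting libraries often draw whiskers at 1.5 x IQR rather than at the extreme values.

```python
from statistics import quantiles
from typing import Dict, List, Sequence


def signature_stats(data: List[Sequence[float]], features: Sequence[str]
                    ) -> Dict[str, Dict[str, float]]:
    """Per-feature min, quartiles and max (a simplified boxplot summary)."""
    stats = {}
    for name, column in zip(features, zip(*data)):
        # Three cut points splitting the column into four equal parts
        q1, median, q3 = quantiles(column, n=4, method="inclusive")
        stats[name] = {"min": min(column), "q1": q1, "median": median,
                       "q3": q3, "max": max(column)}
    return stats


data = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
print(signature_stats(data, ["b1", "b2"])["b1"])
# {'min': 1, 'q1': 2.0, 'median': 3.0, 'q3': 4.0, 'max': 5}
```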
plot_class(data_class, label='', file=None)
Plots the signature of one class over that of the whole dataset
- Args:
- data_class (dataframe): data of one class
- label (str): label of the class to plot
- file (str): path to the file to save the chart in