maplearn.app package¶

Application modules

Modules required by Mapping Learning when it is used as an application:

config: configuration
main: the main class that uses other classes to process your data
reporting: a module to format results in an html output

Submodules¶

maplearn.app.config module¶

Mapping Learning Configuration

The configuration contains three mandatory parts:

Inputs/outputs [io]: which file(s) to use, how to work with them, and where to save results
Preprocessing [preprocess]: what to do before training the algorithm(s)?
Processing [process]: which kind of processing? Regression, supervised or unsupervised classification (clustering)? Which algorithm(s)?

An optional part, [metadata], lets you include some information about your work in the output report.

Input/Output [io]¶

Mapping Learning lets you work with many formats (csv, excel, tiff, shp…), and in many ways. You can choose:

whether to use samples, a dataset without known labels (data), or both
the variable(s) (features) to use
whether to use the values of the variable to be predicted directly (label) or codes corresponding to these values (label_id)

NB: don't forget to check where your results will be saved (output).

[io]
# [txt] path to the samples used to train algorithm(s)
samples=
# [optional:txt] name of the column with class ID (as numbers)
label_id=
# [optional:txt] name of column with class description (described as
#                 strings)
label=
# [optional:txt] list of features to use (separated with ',')
features=
# [optional:txt] path to the dataset to predict with
data=
# [txt] path to the output folder (which will contain the results)
output=
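
As an illustration, a filled-in [io] section can be read with Python's standard configparser module. This is only a sketch: the file names, the column names and the use of configparser are assumptions made for the example, not a description of how Mapping Learning parses its configuration.

```python
import configparser

# Hypothetical [io] section: paths and column names are made up for illustration.
EXAMPLE = """
[io]
samples=samples.csv
label_id=class_id
features=ndvi, slope, elevation
data=
output=results
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE)
io = dict(parser["io"])

# Features are given as a comma-separated list; an empty value means "not set".
features = [f.strip() for f in io["features"].split(",") if f.strip()]
data = io["data"] or None

print(features)
print(data)
```

Leaving data empty, as here, corresponds to working with samples only.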

Preprocessing [preprocess]¶

Maplearn is not intended to perform every manipulation needed to make your dataset usable for machine learning. Nevertheless, some preprocessing tools are available. They modify the values of the data (scale), the features (reduce and ncomp) or the samples (balance). Finally, separability estimates the chances of getting good results with your samples.

NB: check maplearn.datahandler.packdata to see how a dataset should be structured for machine learning use.

[preprocess]
# [optional:boolean] center/reduce? [true/false]
scale=
# [optional:boolean] make the number of individuals roughly equal across
#                    classes? [true/false]
balance=
# [optional:txt] name of the method to reduce dimensions of the dataset
#                [one of pca, lda, kbest, rfe, kernel_pca]
reduce=
# [optional:number] number of expected dimensions after reduction
ncomp=
# [optional:boolean] check separability between classes? [true/false]
separability=
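
To make the scale option concrete: centering and reducing a feature means subtracting its mean and dividing by its standard deviation. The sketch below does this for a single column using only the standard library. The name center_reduce is hypothetical, and the choice of the population standard deviation is an assumption, since this page does not say which estimator maplearn uses.

```python
from statistics import fmean, pstdev

def center_reduce(values):
    """Return values shifted to mean 0 and scaled to unit standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        # A constant feature cannot be rescaled; centering it gives all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

scaled = center_reduce([2.0, 4.0, 6.0, 8.0])
print(scaled)
```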

Processing [process]¶

Note

Here we are finally at the most interesting part: what do you want to predict? Continuous numbers (temperature, …) or discrete values (social class, land use…)? In any case, maplearn will allow you to use lots of algorithms, and will help you obtain the most accurate predictions.

The [process] section lets you define:

the type of prediction (type)
the algorithm(s) to apply (algorithm)
whether to try to improve the accuracy (optimize)
how to use your samples (kfold)
whether to predict results (predict)

Note

This last choice may seem absurd, but it is prudent not to predict results immediately. If your dataset is large and you do not know exactly which algorithm(s) are relevant, you can focus first on the statistical results.

[process]
# [txt] which kind of process? [classification, clustering or regression]
type=classification
# [optional:txt] how to measure distance?
distance=euclidean
# [optional:txt] algorithm(s) to use (if several, separated with ',')
algorithm=
# [optional:number] how many folds to use in cross-validation?
kfold=
# [optional:boolean] look for best hyperparameters? [true/false]
optimize=
# [optional:boolean] should predict results (exports)? [true/false]
predict=
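
The kfold option refers to k-fold cross-validation: the samples are split into k groups, and each group is used once for testing while the others are used for training. A minimal, library-free sketch of that splitting follows. kfold_indices is a hypothetical helper, not part of maplearn, and real implementations usually shuffle or stratify the samples first.

```python
def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k consecutive folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in kfold_indices(7, 3):
    print(test, train)
```

Every sample appears in exactly one test fold, so each one contributes once to the accuracy estimate.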

Metadata [metadata]¶

[metadata]
# [optional:txt] give a title to your work
title = 
# [optional:txt] describe your work (please avoid special characters)
description = 
# [optional:txt] name of the author(s)
author = 

class maplearn.app.config.Config(file_config)¶

Bases: object

This class is the intermediary between a configuration file and the application. It can load and check a configuration file (Yaml) and write a new configuration file (that can be reused by Mapping Learning later).

Config checks that the application will be able to run properly with a given configuration:

do the input files exist?
are the parameter values of the expected type?
…

Args: file_config (str): path to a configuration file

The class attributes described below reflect the sections of the configuration file.

Properties:

io (dict): input/output: paths to samples, dataset files and output; list of features to use…
codes (dict): label codes and corresponding names
preprocess (dict): which preprocessing step(s) to apply
process (dict): which processes to apply (list of algorithms…)

check()¶

Check that parameters stored in attributes are correct

Returns: int: number of issues detected
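
The kind of verification described for Config (does an input file exist, is a parameter of the expected type) and the convention of returning a count of issues can be sketched as follows. This is not the Config.check() implementation: count_issues and the dictionary keys are hypothetical, chosen to mirror the [io] and [process] sections.

```python
import os

def count_issues(io, process):
    """Return the number of problems found in two configuration dicts."""
    issues = 0
    # An input file that is named must exist on disk.
    for key in ("samples", "data"):
        path = io.get(key)
        if path and not os.path.isfile(path):
            print(f"[io] {key}: file not found: {path}")
            issues += 1
    # kfold, when given, must be a whole number of at least 2.
    kfold = process.get("kfold")
    if kfold is not None and (isinstance(kfold, bool)
                              or not isinstance(kfold, int) or kfold < 2):
        print(f"[process] kfold: expected an integer >= 2, got {kfold!r}")
        issues += 1
    return issues

n = count_issues({"samples": "no_such_dir/no_such_file.csv", "data": None},
                 {"kfold": "3"})
print(n)
```

Returning a count rather than raising on the first problem lets the caller report every issue in one pass.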

codes¶: Dictionary describing label codes and the names of classes

io¶: Input/Output property

preprocess¶: Dictionary of preprocess parameters

process¶: Dictionary of process parameters

read()¶

Load parameters from the configuration file and store them in the corresponding class attributes

Returns: int: number of issues encountered while reading the file

write(fichier=None)¶

Write a new configuration file populated from the content of the class attributes.

Args: fichier (str): path to the configuration file to write

maplearn.app.config.splitter(text)¶

Splits a character string on several separators and removes surrounding whitespace.

Args: text (str): character string to split
Returns: list: list of stripped character strings, None otherwise
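
A function with this behaviour could look like the sketch below. The set of separators (comma and semicolon) is an assumption, since this page only says "several separators", and split_text is a hypothetical name used to avoid suggesting that this is the actual maplearn code.

```python
import re

def split_text(text, separators=",;"):
    """Split text on any of the separators and strip each piece.

    Returns a list of non-empty strings, or None if nothing is left.
    """
    if not isinstance(text, str):
        return None
    pattern = "[" + re.escape(separators) + "]"
    parts = [part.strip() for part in re.split(pattern, text)]
    parts = [part for part in parts if part]
    return parts or None

print(split_text("knn, lda;"))
print(split_text("  "))
```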

maplearn.app.main module¶

Main class (one class to rule the others)

This class is the engine powering Mapping Learning. It uses all the other classes to load data, apply preprocessing and finally process the dataset with one or several algorithm(s). The results are summarized and compared.

The class can apply classification, clustering and regression processes.

Examples:

>>> from maplearn.app.main import Main

Apply 2 different classifications on a known dataset

>>> ben = Main('.', type='classification', algorithm=['knn', 'lda'])
>>> ben.load('iris')
>>> ben.preprocess()
>>> ben.process(True)

Apply every available clustering algorithm on the same dataset

>>> ben = Main('.', type='clustering')
>>> ben.load('iris')
>>> ben.preprocess()
>>> ben.process(False) # do not predict results

Apply regression on another known dataset

>>> ben = Main('.', type='regression', algorithm='lm')
>>> ben.load('boston')
>>> ben.preprocess()
>>> ben.process(False) # do not predict results

class maplearn.app.main.Main(dossier, **kwargs)¶

Bases: object

Carries out every step, from loading the dataset to processing it

Args:

dossier (str): output path where all results will be stored
**kwargs: parameters describing the data and the processing to apply to it

Attributes:

dataset (PackData): dataset to play with

load(source, **kwargs)¶

Loads samples (labels with associated features) used for training algorithm(s)

Args:

source (str): file to load or name of an available dataset
**kwargs: parameters to specify how to use datasets (which features to use…)

load_data(source, label_id=None, label=None, features=None)¶

Load the dataset to predict with the previously trained algorithm(s)

Args:

source (str): path to load or name of an available dataset
label_id (optional[str]): column used to identify labels
label (optional[str]): column with labels’ names
features (list): columns to use as features. All available columns are used if None

preprocess(**kwargs)¶

Apply the preprocessing tasks requested by the user and pass the dataset to the machine learning processor

Args: **kwargs: available preprocessing tasks (scaling the dataset, reducing the number of features…)

process(predict=False, optimize=False, proba=True)¶

Apply algorithm(s) to dataset

Args:

predict (bool): should the algorithm(s) only be fitted on samples, or also predict results?
optimize (bool): should maplearn look for the best hyperparameters for the algorithm(s)?
proba (bool): should maplearn try to get the probabilities associated with predictions?
