maplearn.app package

Application modules

Modules required by Mapping Learning when it is used as an application:

  • config: configuration
  • main: the main class that uses other classes to process your data
  • reporting: a module to format results in an html output

Submodules

maplearn.app.config module

Mapping Learning Configuration

The configuration contains three mandatory parts:

  • Inputs/outputs [io]: which file(s) to use, how to work with them, where to save results…
  • Preprocessing [preprocess]: what to do before training the algorithm(s)?
  • Processing [process]: which kind of processing? Regression, supervised or unsupervised classification (clustering)? Which algorithm(s)?

An optional part, [metadata], lets you include some information about your work in the output report.

Input/Output [io]

Mapping Learning lets you work with many formats (csv, excel, tiff, shp…) and in many ways. You can choose:

  • to use samples, a dataset without knowledge (data), or both
  • the variable(s) (features) to use
  • to use directly the values of the variable to be predicted (label) or some codes corresponding to these values (label_id)

NB: don’t forget to check where your results will be saved (output).

[io]
# [txt] path to the samples used to train algorithm(s)
samples=
# [optional:txt] name of the column with class ID (as numbers)
label_id=
# [optional:txt] name of column with class description (described as
#                 strings)
label=
# [optional:txt] list of features to use (separated with ',')
features=
# [optional:txt] path to the dataset to predict with
data=
# [txt] path to the output folder (which will contain the results)
output=
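
As an illustration of how the [io] options fit together, here is a minimal sketch that parses such a section with Python's standard configparser module. It assumes the file is INI-style, as the fragment above suggests. The read_io helper, the paths and the column names are hypothetical and are not part of maplearn.

```python
import configparser

# Hypothetical configuration text: paths and column names are made up.
EXAMPLE = """
[io]
samples=./data/samples.csv
label_id=class_id
label=class_name
features=ndvi, slope, elevation
data=
output=./results
"""


def read_io(text):
    """Parse the [io] section and normalise its values (illustrative only)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["io"]
    # Options left empty (e.g. "data=") are turned into None
    io = {key: (value.strip() or None) for key, value in section.items()}
    # The comma-separated list of features becomes a list of column names
    if io.get("features"):
        io["features"] = [f.strip() for f in io["features"].split(",") if f.strip()]
    return io


io = read_io(EXAMPLE)
print(io)
```

In this sketch an empty option is read as "not set", which matches the optional entries of the template above.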

Preprocessing [preprocess]

Maplearn is not intended to perform every manipulation needed to make your dataset usable for machine learning. Nevertheless, some preprocessing tools are available. They modify the values of the data (scale), the features (reduce and ncomp) or the samples (balance). Finally, separability estimates your chances of getting good results with your samples.

NB: check maplearn.datahandler.packdata to see how a dataset should be structured for machine learning use.

[preprocess]
# [optional:boolean] center/reduce? [true/false]
scale=
# [optional:boolean] make the number of individuals roughly equal across
#                    classes? [true/false]
balance=
# [optional:txt] name of the method to reduce dimensions of the dataset
#                [one of pca, lda, kbest, rfe, kernel_pca]
reduce=
# [optional:number] number of expected dimensions after reduction
ncomp=
# [optional:boolean] check separability between classes? [true/false]
separability=
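
To make the scale option more concrete, the sketch below shows what centering and reducing a feature means: subtract its mean, then divide by its standard deviation. It is a plain-Python illustration with a hypothetical center_reduce helper. It does not show how maplearn implements scaling internally, and the handling of constant features is an assumption.

```python
from statistics import fmean, pstdev


def center_reduce(column):
    """Standardise one feature: subtract the mean, divide by the std deviation."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation
    if std == 0:
        # Assumption: a constant feature is mapped to zeros
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


feature = [2.0, 4.0, 6.0, 8.0]
scaled = center_reduce(feature)
print(scaled)
```

After this transformation the feature has a mean of 0 and a standard deviation of 1, so features measured in different units become comparable.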

Processing [process]

Note

Here we are finally at the most interesting part: what do you want to predict? Continuous numbers (temperature, …) or discrete values (social class, land use…)? In any case, maplearn will allow you to use lots of algorithms, and will help you obtain the most accurate predictions.

This process part will allow you to define:

  • type of prediction (type)
  • algorithm(s) to apply (algorithm)
  • if you want to try to improve the accuracy (optimize)
  • how to use your samples (kfold)
  • should maplearn predict results (predict)?

Note

This question may seem absurd but it is prudent not to predict results immediately. If your dataset is large and you do not know exactly which algorithm(s) are relevant, then you can focus first on the statistical results.

[process]
# [txt] which kind of process? [classification, clustering or regression]
type=classification
# [optional:txt] how to measure distance?
distance=euclidean
# [optional:txt] algorithm(s) to use (if several, separated with ',')
algorithm=
# [optional:number] how many folds to use in cross-validation?
kfold=
# [optional:boolean] look for best hyperparameters? [true/false]
optimize=
# [optional:boolean] should predict results (exports)? [true/false]
predict=
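
The kfold option sets the number of folds used in cross-validation. The sketch below illustrates the general idea: the samples are divided into k folds and each fold is used once for testing while the others are used for training. The kfold_indices function is hypothetical, and it performs no shuffling or stratification. How maplearn actually builds its folds is not described here.

```python
def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(kfold_indices(10, 3))
for train, test in folds:
    print(len(train), "training samples /", len(test), "test samples")
```

Every sample is used exactly once for testing, so the statistics computed over the folds reflect the whole set of samples.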

Metadata [metadata]

[metadata]
# [optional:txt] give a title to your work
title = 
# [optional:txt] describe your work (please avoid special characters)
description = 
# [optional:txt] name of the author(s)
author = 
class maplearn.app.config.Config(file_config)

Bases: object

This class is the intermediary between a configuration file and the application. It can load and check a configuration file (Yaml) and write a new configuration file (which can be reused by Mapping Learning later).

Config checks that application will be able to run properly using a given configuration:

  • do the input files exist?
  • do the parameter values have the expected type?
Args:
file_config (str) : path to a configuration file

The class attributes described below reflect the sections of the configuration file.

Properties:
  • io (dict): input/output: paths to the samples, the dataset file and the output folder, list of features to use…
  • codes (dict): label codes and corresponding names
  • preprocess (dict) : which preprocessing step(s) to apply
  • process (dict) : which processes to apply (list of algorithms…)
check()

Check that parameters stored in attributes are correct

Returns:
int : number of issues detected
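
The kind of verification described above (input files exist, values have the expected type, number of issues returned) can be sketched as follows. This is not the code of Config.check(): the count_issues function, the dictionary keys it inspects and the rules it applies are assumptions chosen for illustration.

```python
import os
import tempfile


def count_issues(io, process):
    """Return the number of problems found in two configuration dicts."""
    issues = 0
    # Input files must exist (samples is mandatory, data is optional)
    samples = io.get("samples")
    if not samples or not os.path.isfile(samples):
        issues += 1
    data = io.get("data")
    if data and not os.path.isfile(data):
        issues += 1
    # Parameter values must have the expected type
    kfold = process.get("kfold")
    if kfold is not None and not isinstance(kfold, int):
        issues += 1
    if not isinstance(process.get("predict", False), bool):
        issues += 1
    return issues


# A temporary file stands in for a real samples file
with tempfile.NamedTemporaryFile(suffix=".csv") as handle:
    good = count_issues({"samples": handle.name},
                        {"kfold": 5, "predict": True})
    bad = count_issues({"samples": handle.name, "data": handle.name + ".missing"},
                       {"kfold": "five", "predict": "yes"})

print(good, bad)
```

Returning a count rather than raising on the first problem lets the caller report every issue at once.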
codes

Dictionary describing label codes and the names of classes

io

Input/Output property

preprocess

Dictionary of preprocess parameters

process

Dictionary of process parameters

read()

Load parameters from the configuration file and store them in the corresponding class attributes

Returns:
int : number of issues encountered while reading the file
write(fichier=None)

Write a new configuration file populated from the content of the class attributes.

Args:
fichier (str) : path to configuration file to write
maplearn.app.config.splitter(text)

Splits a character string on several separators and removes superfluous whitespace.

Args:
text (str) : character string to split
Returns:
list: list of stripped character strings, None otherwise
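
A function with this behaviour could look like the sketch below. It is not the source of maplearn.app.config.splitter: the split_text name, the set of separators (comma and semicolon) and the cases that return None are assumptions.

```python
import re


def split_text(text, separators=",;"):
    """Split on any of the given separators and strip each piece."""
    if not isinstance(text, str):
        return None
    # Build a character class matching any one of the separators
    pattern = "[" + re.escape(separators) + "]"
    parts = [part.strip() for part in re.split(pattern, text)]
    # Drop pieces that are empty once stripped
    parts = [part for part in parts if part]
    return parts or None


print(split_text(" knn ; lda,, rf "))
```

Such a helper is convenient for options like features or algorithm, which accept several values separated with ','.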

maplearn.app.main module

Main class (one class to rule the others)

This class is the engine powering Mapping Learning. It uses all the other classes to load data, apply preprocessing and finally process the dataset, using one or several algorithm(s). The results are summarized and compared.

The class can apply classification, clustering and regression processes.

Examples:
>>> from maplearn.app.main import Main
  • Apply 2 different classifications on a known dataset
>>> ben = Main('.', type='classification', algorithm=['knn', 'lda'])
>>> ben.load('iris')
>>> ben.preprocess()
>>> ben.process(True)
  • Apply every available clustering algorithm(s) on the same dataset
>>> ben = Main('.', type='clustering')
>>> ben.load('iris')
>>> ben.preprocess()
>>> ben.process(False) # do not predict results
  • Apply regression on another known dataset
>>> ben = Main('.', type='regression', algorithm='lm')
>>> ben.load('boston')
>>> ben.preprocess()
>>> ben.process(False) # do not predict results
class maplearn.app.main.Main(dossier, **kwargs)

Bases: object

Carries out every step from loading the dataset to processing

Args:
  • dossier (str): output path where all results will be stored
  • **kwargs: parameters describing the data and the processing to apply to it
Attributes:
  • dataset (PackData): dataset to play with
load(source, **kwargs)

Loads samples (labels with associated features) used for training algorithm(s)

Args:
  • source (str): file to load or name of an available dataset
  • **kwargs: parameters to specify how to use datasets (which features to use…)
load_data(source, label_id=None, label=None, features=None)

Load the dataset to predict with the previously trained algorithm(s)

Args:
  • source (str): path to load or name of an available dataset
  • label_id (optional[str]): column used to identify labels
  • label (optional[str]): column with labels’ names
  • features (list): columns to use as features. All available columns are used if None
preprocess(**kwargs)

Apply the preprocessing tasks requested by the user and pass the dataset to the machine learning processor

Args:
**kwargs: available preprocessing tasks (scaling dataset, reducing number of features…)
process(predict=False, optimize=False, proba=True)

Apply algorithm(s) to dataset

Args:
  • predict (bool): should the algorithm(s) only be fitted on samples, or also predict results?
  • optimize (bool): should maplearn look for the best hyperparameters for the algorithm(s)?
  • proba (bool): should maplearn try to get the probabilities associated with predictions?

maplearn.app.reporting module