maplearn.filehandler package

File handlers

Read/write data from different kinds of files

  • Csv: tabular data as a text file
  • Excel: tabular data as a Microsoft Excel file
  • Shapefile: geographical vector file
  • ImageGeo: geographical raster file
  • FileHandler: abstract class to handle files

Submodules

maplearn.filehandler.csv module

CSV file reader and writer

With this class, you can read a text file or write a new one with your own dataset (Pandas Dataframe).

Examples:

  • Read an existing CSV file
>>> import os
>>> from maplearn.filehandler.csv import Csv
>>> # 'ex1.csv' is a placeholder name: point this at any CSV file you have
>>> exch = Csv(os.path.join('maplearn path', 'datasets', 'ex1.csv'))
>>> exch.read()
>>> print(exch.data)
  • Write a new CSV file from scratch
>>> import numpy as np
>>> import pandas as pd
>>> exc = Csv(None)
>>> out_file = os.path.join('maplearn path', 'tmp', 'scratch.csv')
>>> df = pd.DataFrame({'A' : 1,
...                    'B' : pd.Timestamp('20130102'),
...                    'C' : pd.Series(2,index=list(range(4))),
...                    'D' : np.array([3] * 4,dtype='int64')})
>>> exc.write(path=out_file, data=df)
class maplearn.filehandler.csv.Csv(path)

Bases: maplearn.filehandler.filehandler.FileHandler

Handler to read and write attributes in a text file. It inherits from the abstract class FileHandler.

Attributes:

  • FileHandler’s attributes

Args:

  • path (str): path to the Csv file to open
open_()

Opens the CSV file specified in dsn[‘path’]

read()

Reads the content of the CSV file

write(path=None, data=None, overwrite=True, **kwargs)

Writes the specified attributes to a text file

Args:
  • path (str): path to the CSV file to create and write
  • data (pandas DataFrame): dataset to write in the CSV file
  • overwrite (bool): should the output CSV file be overwritten?

maplearn.filehandler.excel module

Excel file reader and writer

With this class, you can read an Excel file or write a new one with your own dataset (Pandas Dataframe).

Examples:

  • Read an existing Excel file
>>> import os
>>> from maplearn.filehandler.excel import Excel
>>> exch = Excel(os.path.join('maplearn path', 'datasets', 'ex1.xlsx'))
>>> exch.read()
>>> print(exch.data)
  • Write a new Excel file from scratch
>>> import numpy as np
>>> import pandas as pd
>>> exc = Excel(None)
>>> out_file = os.path.join('maplearn path', 'tmp', 'scratch.xlsx')
>>> df = pd.DataFrame({'A' : 1,
...                    'B' : pd.Timestamp('20130102'),
...                    'C' : pd.Series(2,index=list(range(4))),
...                    'D' : np.array([3] * 4,dtype='int64')})
>>> exc.write(path=out_file, data=df)
class maplearn.filehandler.excel.Excel(path, sheet=None)

Bases: maplearn.filehandler.filehandler.FileHandler

Handler to read and write attributes in an Excel file. It inherits from the abstract class FileHandler.

Attributes:

  • FileHandler’s attributes

Args:

  • path (str): path to the Excel file to open
  • sheet (str): name of the sheet to open
open_()

Opens the Excel file specified in dsn[‘path’]

read()

Reads the content of the opened Excel file

write(path=None, data=None, overwrite=True, **kwargs)

Writes the specified attributes to an Excel file

Args:
  • path (str): path to the Excel file to create and write
  • data (pandas DataFrame): dataset to write in the Excel file
  • overwrite (bool): should the output Excel file be overwritten?

maplearn.filehandler.imagegeo module

Geographic Images (raster)

This class handles raster data with a geographic dimension (projection system, bounding box expressed in coordinates).

A raster dataset relies on:
  • a matrix of pixels (data)
  • geographic data (where to put this matrix on earth)
Example:
>>> import os
>>> from maplearn.filehandler.imagegeo import ImageGeo
>>> img = ImageGeo(os.path.join('maplearn_path', 'datasets',
...                             'landsat_rennes.tif'))
>>> img.read()
>>> print(img.data)
class maplearn.filehandler.imagegeo.ImageGeo(path=None, fmt='GTiff')

Bases: maplearn.filehandler.filehandler.FileHandler

Handler of geographical rasters

Args:
  • path (str): path to the raster file to read
  • fmt (str): format of the raster file (‘GTiff’… see GDAL
    documentation)
Attributes:
  • Several attributes are inherited from FileHandler class
data

The dataset read from a file or to write in a file

data_2_img(data, overwrite=False, na=None)

Transforms a dataset (dataframe) into a matrix in order to export it as an image (inverse operation of the img_2_data() method).

Args:
  • data (dataframe): the dataset to transform
  • overwrite (bool): should the result replace the data property?
Returns:
matrix: transformed dataset
img_2_data()

Transforms the data set in order to make it easier to handle in following steps.

Converts the dataset (matrix) into a 2-dimensional dataframe (where 1 row = 1 individual and 1 column = 1 feature)

Returns:
dataframe: transformed dataset (2 dimensions)
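
The matrix-to-table conversion described above, and its inverse (data_2_img), can be sketched with NumPy. This is only an illustration and not maplearn's implementation: the function names are hypothetical, and the (rows, columns, bands) axis order is an assumption that may differ from the layout ImageGeo actually uses.

```python
import numpy as np


def image_to_table(img):
    """Flatten an image into a table: 1 row per pixel, 1 column per band.

    Assumes (hypothetically) that `img` is laid out as (rows, cols, bands).
    """
    img = np.asarray(img)
    if img.ndim == 2:
        # Single-band image: add a trailing band axis of length 1
        img = img[:, :, np.newaxis]
    n_rows, n_cols, n_bands = img.shape
    return img.reshape(n_rows * n_cols, n_bands)


def table_to_image(table, n_rows, n_cols):
    """Inverse operation: rebuild the (rows, cols, bands) matrix."""
    table = np.asarray(table)
    return table.reshape(n_rows, n_cols, -1)


# A small 2 x 3 image with 4 bands
img = np.arange(24).reshape(2, 3, 4)
table = image_to_table(img)        # shape (6, 4): 6 pixels, 4 features
restored = table_to_image(table, 2, 3)
```

The resulting 2D array can be wrapped in a pandas DataFrame if column names are needed; the image size has to be kept aside, since the table alone does not record it.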
init_data(dims, dtype=None)

Creates an empty matrix with specified dimension

Args:
  • dims (list): dimensions of the image to create
  • dtype (str): numerical type of pixels
open_()

Opens the Geographical Image to get information about projection system…

pixel2xy(j, i)

Computes the geographic coordinates (X, Y) corresponding to the specified position in an image (column, row)

It does the inverse calculation of xy2pixel, and uses a GDAL geotransform

Source: http://geospatialpython.com/2011/02/clip-raster-using-shapefile.html

Args:
  • j (int): column position
  • i (int): row position
Returns:
list: geographical coordinates of the pixel (lon and lat)
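
As an illustration of the calculation, here is a minimal sketch of the standard GDAL affine geotransform applied to a (column, row) position. The function name `pixel_to_xy` and the sample values are hypothetical; maplearn's own method may differ in details such as rounding.

```python
def pixel_to_xy(transf, col, row):
    """Return [x, y] for the top-left corner of pixel (col, row).

    `transf` follows the 6-item GDAL geotransform layout:
    [x origin, pixel width, rotation, y origin, rotation, pixel height].
    """
    x = transf[0] + col * transf[1] + row * transf[2]
    y = transf[3] + col * transf[4] + row * transf[5]
    return [x, y]


# Hypothetical north-up raster with 30 m pixels
transf = [350000.0, 30.0, 0.0, 6790000.0, 0.0, -30.0]
origin = pixel_to_xy(transf, 0, 0)
corner = pixel_to_xy(transf, 10, 20)
```

Note that the pixel height is negative for a north-up image, so Y decreases as the row index grows.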
read(dtype=None)

Reads the raster file and puts the matrix in data property

Args:
  • dtype (str): type of the values stored in pixels (int, float…)
set_geo(transf=None, prj=None)

Sets the geographical dimension of a raster:
  • the projection system
  • the bounding box, whose coordinates are compatible with the given
    projection system

Args:
  • prj (str): projection system
  • transf (list): affine function to translate an image

Definition of ‘transf’ (to translate an image to the right place):
  • [0] = top left x (x Origin)
  • [1] = w-e pixel resolution (pixel Width)
  • [2] = rotation, 0 if image is “north up”
  • [3] = top left y (y Origin)
  • [4] = rotation, 0 if image is “north up”
  • [5] = n-s pixel resolution (pixel Height)

TODO :
  • Check compatibility between bounding box and image size
  • Add the EPSG code corresponding to prj in __geo
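
To make the six ‘transf’ items concrete, the sketch below derives them from a bounding box and an image size for a north-up raster. The helper `transf_from_bbox` is hypothetical and not part of maplearn; it only shows how the bounding box, the pixel resolution and the image size relate to each other.

```python
def transf_from_bbox(xmin, ymin, xmax, ymax, n_cols, n_rows):
    """Build a north-up geotransform list from a bounding box and image size."""
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("empty or inverted bounding box")
    if n_cols <= 0 or n_rows <= 0:
        raise ValueError("image size must be positive")
    pixel_width = (xmax - xmin) / n_cols
    pixel_height = (ymax - ymin) / n_rows
    # Origin is the top-left corner; the n-s resolution is negative
    # because rows are counted from north to south.
    return [xmin, pixel_width, 0.0, ymax, 0.0, -pixel_height]


# Hypothetical 10 x 20 pixel image covering a 300 m x 600 m extent
transf = transf_from_bbox(0.0, 0.0, 300.0, 600.0, n_cols=10, n_rows=20)
```

The resulting list could then be passed as the transf argument of set_geo, together with a projection string.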
write(path=None, data=None, overwrite=True, **kwargs)

Writes data to a raster file

Args:
  • path (str): raster file to write data into
  • data (array): data to write
  • overwrite (bool): should the raster file be overwritten?
xy2pixel(lon, lat)

Computes the position in an image (column, row), given a geographic coordinate

Uses a GDAL geotransform (as returned by the GetGeoTransform() method of a GDAL dataset) to calculate the pixel location of a geospatial coordinate (http://geospatialpython.com/2011/02/clip-raster-using-shapefile.html)

Args:
  • lon (float): longitude (X)
  • lat (float): latitude (Y)
Returns:
list: position in the image (column, row)
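
A minimal sketch of this calculation is given below, assuming a north-up raster (both rotation terms equal to 0). The function name `xy_to_pixel` and the sample values are hypothetical, and the handling of rotated rasters is left out on purpose.

```python
import math


def xy_to_pixel(transf, x, y):
    """Return [col, row] of the pixel containing the coordinate (x, y).

    Only north-up rasters are handled in this sketch.
    """
    if transf[2] != 0 or transf[4] != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    col = int(math.floor((x - transf[0]) / transf[1]))
    row = int(math.floor((y - transf[3]) / transf[5]))
    return [col, row]


# Hypothetical north-up raster with 30 m pixels
transf = [350000.0, 30.0, 0.0, 6790000.0, 0.0, -30.0]
on_corner = xy_to_pixel(transf, 350300.0, 6789400.0)
inside = xy_to_pixel(transf, 350315.0, 6789385.0)
```

Both calls fall in the same pixel: a coordinate anywhere inside a pixel maps to that pixel's column and row.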

maplearn.filehandler.shapefile module

Shapefile reader and writer

With this class, you can read a shapefile or more precisely get attributes from a shapefile. You can also write a new shapefile using geometry from an original shapefile and adding the attributes you want.

Examples:

>>> import os
>>> from maplearn.filehandler.shapefile import Shapefile
>>> shp = Shapefile(os.path.join('maplearn path', 'datasets',
...                              'echantillon.shp'))
>>> shp.read()
>>> print(shp.data)
TODO:
Guess character encoding in shapefile’s attributes
class maplearn.filehandler.shapefile.Shapefile(path)

Bases: maplearn.filehandler.filehandler.FileHandler

Handler to read and write attributes in a shapefile. It inherits from the abstract class FileHandler.

Attributes:

  • FileHandler’s attributes
  • str_type (str): kind of geometry (polygon, point…)
  • lst_flds (list): list of fields in dataset
open_()

Opens the shapefile and stores it in the __ds attribute, so that its attributes can then be read

read()

Reads the attributes associated with the entities in the shapefile

Returns:
Pandas Dataframe: data (attributes) available in the shapefile
write(path=None, data=None, overwrite=True, **kwargs)

Writes attributes (and only attributes) in a new shapefile, using the geometries of an original shapefile.

Args:
  • path (str): path to the shapefile to create and write
  • data (pandas DataFrame): dataset to write in the shapefile
  • overwrite (bool): should the output shapefile be overwritten?

maplearn.filehandler.filehandler module

Handling files (abstract class)

This class handles generic files. FileHandler is not supposed to be called directly. Use one of the classes that inherit from it instead (ImageGeo, Excel, Shapefile…).

class maplearn.filehandler.filehandler.FileHandler(path=None, **kwargs)

Bases: object

Reads data from a generic file or writes data into it.

Attributes:
  • _drv (object): driver to communicate with a file (necessary for some
    formats)
  • _data (numpy array or pandas dataframe): dataset got from a file or
    to write into it. See data property.
  • opened (bool): is the file opened or not?
Args:
  • path (str): path to the file to read data from
  • **kwargs: additional settings to specify how to load data from file
data

The dataset read from a file or to write in a file

dsn

Dictionary containing information about the data source. For example, path contains the path of the file to get data from. Other items can exist, which are specific to the data type (raster, vector or tabular, geographical or not…)

open_()

Opens a file before writing to it

read()

Reads the dataset from the file mentioned during initialization

write(path=None, data=None, overwrite=True, **kwargs)

Writes data in a file

Args:
  • path (str): path to the file to write into
  • data (numpy array or pandas dataframe): the data to write
  • overwrite (bool): should the file be overwritten if it exists?
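
The open_/read/write contract shared by all handlers can be illustrated with a small, self-contained sketch. Everything here is hypothetical and independent of maplearn: `BaseHandler` and `TinyCsvHandler` only mimic the interface documented above (a dsn dictionary, a data attribute, an opened flag), and raising FileExistsError when overwrite is False is an assumption about reasonable behaviour, not a description of what FileHandler does.

```python
import csv
import os
import tempfile
from abc import ABC, abstractmethod


class BaseHandler(ABC):
    """Hypothetical stand-in for the FileHandler pattern (not maplearn code)."""

    def __init__(self, path=None, **kwargs):
        # dsn gathers everything known about the data source
        self.dsn = {'path': path, **kwargs}
        self.data = None
        self.opened = False

    def open_(self):
        path = self.dsn['path']
        if path is None or not os.path.isfile(path):
            raise FileNotFoundError(str(path))
        self.opened = True

    @abstractmethod
    def read(self):
        """Load the file content into self.data."""

    @abstractmethod
    def write(self, path=None, data=None, overwrite=True):
        """Write data (or self.data) to path."""


class TinyCsvHandler(BaseHandler):
    """Concrete handler storing rows as lists of strings."""

    def read(self):
        self.open_()
        with open(self.dsn['path'], newline='') as src:
            self.data = list(csv.reader(src))
        return self.data

    def write(self, path=None, data=None, overwrite=True):
        path = path or self.dsn['path']
        rows = self.data if data is None else data
        # Assumed behaviour: refuse to replace an existing file
        if os.path.exists(path) and not overwrite:
            raise FileExistsError(path)
        with open(path, 'w', newline='') as dst:
            csv.writer(dst).writerows(rows)


# Write a file from scratch, then read it back with a second handler
out_file = os.path.join(tempfile.mkdtemp(), 'scratch.csv')
handler = TinyCsvHandler(None)
handler.write(path=out_file, data=[['A', 'B'], ['1', '2']])
rows = TinyCsvHandler(out_file).read()
```

Keeping the path in dsn rather than in a dedicated attribute leaves room for format-specific settings (a sheet name, a raster format) to travel alongside it.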