SDSIA is a Python module to generate synthetic data sets for image analysis, and a C library to manipulate these data sets.
The goal of the Python module is to provide a way to easily generate data sets of images used by image analysis softwares or algorithms, in a context of learning or development.
The generation of the images is made using the 3D rendering software POV-Ray. This software uses text scripts to defined the scene to render, so it's very easy to parameterize a script to create variations of a given scene. The Python module then render as many samples as needed in the set, each time using different parameters' value.
By creating its own POV-Ray script, the user can create any needed data set, selecting precisely which aspect of the scene is constant and which is variable. The level of difficulty of the data set, from the point of view of the software or algorithm used for analysis, can then be controlled at will. By choosing the appropriate variable in the scene, the user could also study a particular property of a given software or algorithm (e.g, is it more sensitive to variation in shape or variation in color).
Once the user has written the script to render images, the python module takes care of the tedious tasks of rendering (only needed data sets), files and folders management, creation of a description file in JSON format for later use. It also generates perfect masks (black and white image) by modifying the texture properties of the objects in the scene.
The number of images per data set, the dimensions and format of each image are also defined by the user. Then, one can create sets corresponding to its needs, in particular memory, disk storage, processing time limits.
The current version of SDSIA is designed for image segmentation (localization of pixels corresponding to an object in a scene). However it has been developped with the view to be extended to other kind of data sets.
The C library offers the following functionalities: loading a data set from its description file, spitting the data set into user defined categories (e.g. training, validation, test), shuffling the data set, looping through the samples of the data set. It provides the samples of the data set as GenBrush objects to manipulate them easily through this library's functionalities.
The Python module doesn't require any particular operation, but you will need to have the following Python module installed on your machine: os, sys, json, glob, subprocess, re, shutil
- Create a directory which will contains this repository and all the repositories it is depending on. Lets call it "Repos"
- Download the master branch of this repository into "Repos". Unzip it if necessary.
- The folder's name for the repository should be followed by "-master", rename the folder by removing "-master".
- cd to the repository's folder
- If wget is not installed:
sudo apt-get update && sudo apt-get install wget
on Ubuntu 16.04, orbrew install wget
on Mac OSx - If gawk is not installed:
sudo apt-get update && sudo apt-get install gawk
on Ubuntu 16.04,brew install gawk
on Mac OSx - If this repository is the first one you are installing in "Repos", run the command
make -k pbmake_wget
- Run the command
make
to compile the repository. - Eventually, run the command
main
to run the unit tests and check everything is ok. - Refer to the documentation to learn how to use this repository.
The dependancies to other repositories should be resolved automatically and needed repositories should be installed in the "Repos" folder. However this process is not completely functional and some repositories may need to be installed manually. In this case, you will see a message from the compiler saying it cannot find some headers. Then install the missing repository with the following command, e.g. if "pbmath.h" is missing: make pbmath_wget
. The repositories should compile fine on Ubuntu 16.04. On Mac OSx, there is currently a problem with the linker.
If you need assistance feel free to contact me with my gmail address: at bayashipascal.