Project. Overview
Our goal is to create the foundations of a "next-generation" interactive visualization software in Python.
This idea comes from the fact that, while the amount of available data is growing exponentially, the visualization tools currently available in Python do not really scale to big data. The major plotting library in Python is Matplotlib, which is more focused on generating static, publication-ready figures than on interactive visualization. These are really two different, nearly orthogonal goals. For the former, high display quality is the major objective, whereas speed and reactivity are much more important for the latter. Matplotlib can be, and widely is, used as an interactive visualization library, but I think it has not been designed primarily for this. Consequently, the frame rate tends to be low on medium-size data sets, and data sets with millions of points cannot be visualized decently in this way.
There are a myriad of smaller plotting/visualization libraries of various importance (including ours) which all have slightly different objectives. Also, I should mention Mayavi, which is probably the most famous 3D visualization library in Python, but it's not really suited to 2D plotting. So it seems that work remains to be done in this domain.
The long-term objective we have is to develop a single library which is the equivalent of Matplotlib, but for fast interactive visualization of large data sets. Our primary goal is not to make publication quality plots, but rather to get a sense of the data by visualizing it interactively. The nature of the data can be anything: real-time signals, maps, high-dimensional points, surfaces, volumes, images, textures, etc.
Having a single, one-stop piece of software for doing all this is, admittedly, a quite difficult and ambitious goal that could take years. But I think it is a reasonable objective, and it can be achieved through multiple successive steps.
The very first step is to make decisions about the tools that would be used for such a library. Python is the most obvious decision, as it is now the platform of choice for interactive computing and visualization. Maximum compatibility with both Python 2.x and 3.x should be achieved. Regarding the low-level graphical API, OpenGL is also a natural decision. It is widespread, mature, open, cross-platform, widely supported, supports custom GPU shaders, and brings hardware acceleration quite naturally. Major drawbacks are: high fragmentation between the different versions, implementations of mixed quality among the different video card drivers, and a notoriously cryptic API. In my opinion, OpenGL is still a good choice despite the drawbacks.
Shaders will play a central part in the library, as they bring hardware-accelerated generic computations at the vertex and pixel levels.
One could be worried that Python is not well suited to high-performance visualization, as it is an interpreted and "slow" language, whereas fast visualization means that every frame should be rendered in only 15 milliseconds. My experience with the development of Galry is that it is generally possible to design the implementation and the visualizations themselves so that the frame rate is limited by the GPU rather than by the Python interpreter. Using Vertex Buffer Objects, NumPy, and PyOpenGL makes it possible to handle large arrays of data quite efficiently, probably close to C performance, although this latter claim needs to be verified. Issuing as few OpenGL calls as possible per frame and working with large arrays limits the performance loss due to the Python interpreter. Any particular visualization should be designed with this goal in mind, and it is a crucial constraint for the future architecture of the library.
One other requirement is that the programming interface should be portable to different languages, especially Javascript. Web-based visualization has a high potential in my opinion, and WebGL is becoming more and more mature and widely supported, particularly on mobile devices. Ideally, it should be easy to port a visualization from Python to Javascript using two implementations of the same underlying visualization library.
A modular and layered architecture will offer a solid code base for low-level features (low-level visualization, and interaction system), and offer a well-designed object-oriented interface to write custom visualizations. Specific visualizations such as 2D plotting, 3D visualizations, etc. will be written on top of these layers.
A major objective is that this library should stay out of the developer's way: it will impose as few technical constraints as possible. All aspects of a visualization will be customizable.
The lowest layer is an object-oriented wrapper on top of OpenGL, designed to hide the oddities and verbosity of the original API behind a nice interface. This layer targets OpenGL 2.0 and higher, as well as OpenGL ES 2.0. The two specifications are close but not identical, and this layer will hide the differences, so that the same code can be written for both. This layer will follow OpenGL as closely as possible, so that anything doable in pure OpenGL will be doable using this layer. Classes will include ShaderProgram, Attribute, Texture, VertexBuffer, FrameBuffer, etc., mirroring the corresponding OpenGL objects.
The initialization and rendering passes will also be fully customizable.
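As a rough illustration of what one of these wrapper classes might look like, the sketch below hides the buffer-related calls (glGenBuffers, glBindBuffer, glBufferData, glDeleteBuffers) behind a VertexBuffer object. The method names, the gl constructor argument, and the RecordingGL stand-in are assumptions made for this example, not a settled design. The stand-in only records which functions are called, so the sketch runs without an OpenGL context.

```python
class RecordingGL(object):
    """Hypothetical stand-in for an OpenGL binding: records calls, draws nothing."""

    GL_ARRAY_BUFFER = 0x8892  # same values as the real OpenGL enums
    GL_STATIC_DRAW = 0x88E4

    def __init__(self):
        self.calls = []
        self._next_handle = 1

    def glGenBuffers(self, n):
        self.calls.append(("glGenBuffers", n))
        handle = self._next_handle
        self._next_handle += 1
        return handle

    def glBindBuffer(self, target, handle):
        self.calls.append(("glBindBuffer", target, handle))

    def glBufferData(self, target, data, usage):
        self.calls.append(("glBufferData", target, len(data), usage))

    def glDeleteBuffers(self, n, handles):
        self.calls.append(("glDeleteBuffers", n, tuple(handles)))


class VertexBuffer(object):
    """Object-oriented view of one OpenGL buffer object (illustrative only)."""

    def __init__(self, gl, data):
        self._gl = gl
        self._data = data
        self._handle = None  # created lazily, on first activation

    def activate(self):
        gl = self._gl
        if self._handle is None:
            # First use: create the buffer, bind it, and upload the data.
            self._handle = gl.glGenBuffers(1)
            gl.glBindBuffer(gl.GL_ARRAY_BUFFER, self._handle)
            gl.glBufferData(gl.GL_ARRAY_BUFFER, self._data, gl.GL_STATIC_DRAW)
        else:
            # Later uses: binding is enough.
            gl.glBindBuffer(gl.GL_ARRAY_BUFFER, self._handle)

    def delete(self):
        if self._handle is not None:
            self._gl.glDeleteBuffers(1, [self._handle])
            self._handle = None


gl = RecordingGL()
vbo = VertexBuffer(gl, b"\x00" * 24)  # e.g. three vertices of two float32 each
vbo.activate()
vbo.activate()
vbo.delete()
call_names = [call[0] for call in gl.calls]
```

Passing the GL binding in explicitly is only one possible choice; it makes it easy to swap in a desktop binding, an ES binding, or a recording object like the one above.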
Handling shaders is probably the hardest part. Indeed, shaders will be handled modularly. This is necessary notably for dynamically generating desktop and ES versions of the same shaders. Also, shader code snippets should be easy to combine. A simple and lightweight templating system will be implemented separately.
An independent module will handle interactivity in a backend-independent way. It will be written in pure Python and have absolutely no dependencies (no PyQt, wx, etc.). Rather, it will be possible to write small bindings between such GUI systems and that interaction module.
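To make the idea of combinable snippets more concrete, here is a minimal sketch based on the standard library's string.Template. The snippet format (a pair of declarations and body code), the build_fragment_shader function, and the header table are all hypothetical. The only GLSL-specific assumption is that the ES target needs a default float precision in fragment shaders, whereas desktop GLSL 1.10 predates precision qualifiers.

```python
from string import Template

# Hypothetical per-target headers; only this part differs between targets.
HEADERS = {
    "desktop": "#version 110\n",
    "es": "#version 100\nprecision mediump float;\n",
}

FRAGMENT_TEMPLATE = Template(
    "${header}"
    "${declarations}\n"
    "void main() {\n"
    "${body}\n"
    "}\n"
)


def build_fragment_shader(snippets, target):
    """Combine (declarations, body) snippets into one shader source string."""
    declarations = "\n".join(decl for decl, _ in snippets if decl)
    body_lines = []
    for _, code in snippets:
        for line in code.splitlines():
            body_lines.append("    " + line)
    return FRAGMENT_TEMPLATE.substitute(
        header=HEADERS[target],
        declarations=declarations,
        body="\n".join(body_lines),
    )


# Two independent snippets: a base color, then a uniform opacity factor.
base_color = ("varying vec4 v_color;", "gl_FragColor = v_color;")
opacity = ("uniform float u_opacity;", "gl_FragColor.a *= u_opacity;")

desktop_source = build_fragment_shader([base_color, opacity], "desktop")
es_source = build_fragment_shader([base_color, opacity], "es")
```

The snippets themselves are identical for both targets; a real system would also have to deal with name clashes between snippets and with vertex shaders, which this sketch ignores.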
This module will support mouse, keyboard, multi-touch interactivity and will be customizable for other input systems (e.g. Leap Motion).
An important design decision is to choose between a pull and a push system. Specifically, a callback function could be called as soon as an event occurs (push), or the current events can be polled during the main loop (pull). I feel that the pull system is better. I used a push system in Galry and it was actually not a good decision. The pull system is closer to what the computer is doing, and offers a cleaner interface in my opinion.
The visuals layer is built on top of the OpenGL wrapper. It will provide a way to transform high-level visual objects into actual OpenGL commands, making it simpler to design visualizations. It will be heavily object-oriented.
By definition, a visual is any graphical object that requires no more than a few OpenGL rendering calls to be drawn. It will typically consist of one or a very small number of homogeneous primitives. Examples include: a curve consisting of successive line segments, a graph consisting of nodes and edges linking pairs of nodes, an image, a 3D mesh object, etc.
A visual comes with several GL objects (vertex buffers, shaders, textures, etc.) and can implement its own render function. It has a unique name in the scene, and can easily refer to other visuals and their objects, i.e. it is possible to share objects between visuals.
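The pull approach can be sketched in a few lines. In this hypothetical example, a GUI binding only appends events to a queue with post(), and the main loop decides when to drain it with poll(). The EventQueue class, the event names, and the run_frame function are made up for illustration.

```python
from collections import deque


class EventQueue(object):
    """Hypothetical pull-style event queue.

    GUI bindings append raw events with post(); the main loop drains them
    with poll() at a moment of its own choosing.
    """

    def __init__(self):
        self._pending = deque()

    def post(self, kind, **data):
        # Called by a backend binding; no user code runs here.
        self._pending.append((kind, data))

    def poll(self):
        # Return all events received since the last poll, oldest first.
        events = list(self._pending)
        self._pending.clear()
        return events


def run_frame(queue, state):
    """One iteration of a main loop: handle pending input, then (not shown) render."""
    for kind, data in queue.poll():
        if kind == "mouse_move":
            state["cursor"] = (data["x"], data["y"])
        elif kind == "key_press" and data["key"] == "r":
            state["cursor"] = (0, 0)
    return state


queue = EventQueue()
queue.post("mouse_move", x=10, y=20)
queue.post("mouse_move", x=15, y=25)
state = run_frame(queue, {"cursor": (0, 0)})
```

With this structure, all state changes happen at one well-defined point of the frame, which is one reason the pull model can feel cleaner than callbacks firing at arbitrary times.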
A visual can also implement actions, i.e. methods that change the rendering process. These actions can then be called by the interaction system.
A VisualManager object takes all the visuals as input, and implements the actual initialization and rendering functions. There's a default visual manager which just calls all visuals' render functions in some order, but it can be customized if needed. A more sophisticated system can be implemented (like a scene graph) but this is up to the developer, and the library will not impose such a system.
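The relationship between visuals, actions, and the default manager could look roughly like the following sketch. Everything here is an assumption made for illustration: the method names, the use of insertion order as the rendering order, and the log list that stands in for actual OpenGL calls.

```python
class Visual(object):
    """Hypothetical base class: a named object that knows how to draw itself."""

    def __init__(self, name):
        self.name = name
        self.visible = True

    def render(self, log):
        # A real visual would issue a few OpenGL calls here; we just log.
        log.append(self.name)

    def toggle(self):
        """An 'action': a method that changes how the visual is rendered."""
        self.visible = not self.visible


class VisualManager(object):
    """Default manager sketch: renders visuals in the order they were added."""

    def __init__(self):
        self._visuals = []

    def add(self, visual):
        if any(v.name == visual.name for v in self._visuals):
            raise ValueError("duplicate visual name: %r" % visual.name)
        self._visuals.append(visual)

    def get(self, name):
        for visual in self._visuals:
            if visual.name == name:
                return visual
        raise KeyError(name)

    def render(self, log):
        for visual in self._visuals:
            if visual.visible:
                visual.render(log)


manager = VisualManager()
manager.add(Visual("image"))
manager.add(Visual("curve"))

frame1 = []
manager.render(frame1)

manager.get("image").toggle()  # e.g. triggered by the interaction system
frame2 = []
manager.render(frame2)
```

A scene graph or any other scheme could replace VisualManager.render without touching the visuals themselves, which is the kind of freedom described above.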
Finally, the high-level interface will be a glue layer, linking the different parts together (namely, the visual and the interaction systems). Visuals can be statically or dynamically created, and react to interaction events.
Some parts of the code might be used for an OpenGL backend for Matplotlib. Such a backend would not be redundant with this new library as the goals are really not the same. Matplotlib is currently designed to generate figures, whereas vispy is about interactive visualization.
At least, the glwrap layer could be used as is in the Matplotlib backend. The visuals layer might be used too.
Here is a proposal for the high-level code structure.
Sub-packages should be as independent as possible.
- /vispy/glwrap/: the low-level OpenGL wrapper.
  - /vispy/glwrap/shaders(.py): handles the generation of shaders from code snippets, and automatically generates the shader variable declarations.
  - /vispy/glwrap/...
- /vispy/events/: the interaction system (event manager).
- /vispy/visuals/: the object-oriented visuals layer.
- /vispy/utils/: utility functions that can be used anywhere in the project.
- /vispy/highlevel/: high-level interface.
- /vispy/backends/: different GUI backends, with a unified interface for creating the window, etc. so that it's easy to switch between glut/qt/etc.
- /vispy/apps/: external modules based on the lower layers: 2D plotting, 3D rendering, etc.
- /examples/: code examples, both as scripts using the mlab-like high-level interface, and as GUI widgets with a more object-oriented approach.
- /tutorials/: maybe to put in examples/?
- /docs/: the documentation in Markdown or ReST, probably using Sphinx.
We should force ourselves to write as many unit tests as possible and target a high code coverage. Ideally we should write unit tests while designing the code, before implementing it. It can also help us design modular code, where different modules are as independent as possible.
In practice, each sub-package will have a tests folder containing test_xxx.py files. I propose to use nose as a unit testing framework. There should not be a /vispy/tests/ folder.
It may not be obvious how to unit test some parts of the code, especially those related to OpenGL rendering and the interaction system. But we should take the time to design a testing framework for those cases.
- Unit tests for glwrap: we could have a DEBUG mode where each call to a core OpenGL function is logged somewhere, and the unit tests aim at ensuring that glwrap object-oriented code is correctly translated into the expected GL calls.
- We could also have more functional tests, where simple figures are drawn using different glwrap objects. Then, these figures are automatically compared to the expected figures. That's what is done in Galry, where the same centered, white square is rendered using different features (lines, points, textures, etc.).
- For the visuals layer, we could ensure that simple scripts written with this layer result in the correct glwrap calls and objects.
- For the interaction layer, we could have a system that emulates a user (e.g. the mouse moves linearly from X to Y) so that we can check that the event system acts as expected.
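As an example of the last idea, emulating a user can be as simple as generating a list of synthetic events and feeding them to the code under test. The sketch below interpolates mouse positions linearly between two points; the event tuple format and the PanTracker class (a toy stand-in for the interaction layer) are assumptions made for this example only.

```python
def simulate_mouse_move(start, end, steps):
    """Return mouse_move events interpolated linearly from start to end."""
    (x0, y0), (x1, y1) = start, end
    events = []
    for i in range(steps + 1):
        t = i / float(steps)  # goes from 0.0 to 1.0 inclusive
        events.append(("mouse_move", x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return events


class PanTracker(object):
    """Toy stand-in for the interaction layer: accumulates mouse displacement."""

    def __init__(self):
        self.last = None
        self.pan = [0.0, 0.0]

    def handle(self, event):
        kind, x, y = event
        if kind != "mouse_move":
            return
        if self.last is not None:
            self.pan[0] += x - self.last[0]
            self.pan[1] += y - self.last[1]
        self.last = (x, y)


tracker = PanTracker()
for event in simulate_mouse_move((0, 0), (100, 50), steps=10):
    tracker.handle(event)
```

A unit test would then check that the accumulated pan matches the total displacement of the simulated gesture, within floating-point tolerance.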
This may seem cumbersome to implement, but I cannot stress enough how important it is to have good test coverage. The code may work well without tests at a given point, but subsequent improvements and refactoring patches very often introduce new bugs.
Travis CI looks really nice for automatically testing the code at every commit. Unit tests and linters can run automatically, and we receive an e-mail a few minutes after a commit if it introduced a bug! And it's really easy to interface with a GitHub project.
We should use this service from the beginning.
We should follow strict conventions to target high-quality code. In particular, sticking to PEP 8, using a linter, writing docstrings for all classes and methods, carefully choosing variable and function names, etc.
We should make it a custom that any code included in vispy is reviewed by at least one other core member.
We should also take our time to get the (API) design of the different components figured out really well.
The code should be compatible with both Python 2.x and Python 3.x, without the use of 2to3. The six module is a good help for dealing with, for instance, bytes/unicode; we can copy six.py to vispy/utils.
The only case where a 2.x/3.x compatible codebase is problematic, in my experience (Almar), is catching exception objects. The approach that has worked for me in other projects is:

    import sys
    try:
        do_something()  # placeholder for any code that may raise
    except Exception:
        err_type, err_val, err_tb = sys.exc_info(); del err_tb
        print(err_val)  # For instance

We should target Python 2.6 and up, and Python 3.2 and up.
We should use the issues system as much as possible!
Here are some deadlines:
- Google Summer of Code: maybe some work will be done on the MPL OpenGL backend?
- EuroSciPy 2013: sprint.
So here are some suggestions:
- The glwrap layer should be well advanced by the end of June (design, implementation, unit tests, examples...), so that any work on the backend could use this module.
- The interaction system is an almost completely standalone module, that could be done independently, and which has a lower priority.
- The whole code architecture (specification of the modules, API, classes, functions) should be ready for EuroSciPy. We could make some progress on the implementation during the sprint, and we can continue the work this Fall.
- We could have a beta version by the end of this year.