Tech. Visuals and scene graph

  • This document describes the current design and state of the visual / scenegraph system.

  • See the usage examples, which must be taken into consideration when designing the visuals and scene graph layers.

  • See also these notes that contain random thoughts about these layers.

Overview of visual system

The major goal of the visuals system is to implement Python classes, each representing a specific type of drawable object. These may be as simple as a line, point, or triangle, or they may be more complex objects with specialized APIs, such as a 3D surface plot, axes with ticks and text, or polar grid lines. Visuals may be used in any OpenGL context, even in the absence of a vispy Canvas (although they do depend on vispy.gloo) or scenegraph. Visuals are highly modular and customizable:

  • All vertex data may be processed through arbitrary, user-specified transformations. These include both linear (scale, translate, rotate) and nonlinear (log, polar, ...) transformations, in any order and combination.
  • Fragment colors are determined by a chain of modular components that implement various input functions (constant, varying, texture) and filters (lighting, materials, clipping, AGG, ...)

This modularity depends on a subsystem for combining GLSL shader functions, described below. Visuals may optionally be drawn as a collection, where the data for many visuals is combined to draw all with a single GL call.

Overview of scenegraph

The scenegraph system implements a standard scenegraph, which is a hierarchy of visuals. Each visual in a scenegraph defines only the transformation between its local coordinate system and its parent's coordinate system. The scenegraph then constructs the total transformation for each visual by combining the individual transformations of its parents. The scenegraph is also responsible for:

  • Drawing all of its visuals in the correct order
  • Sending mouse and other user input events to visuals (using picking and boundary checking)
  • Automatically handling issues with the relationship between the canvas and the current glViewport

Keeping these layers generic

Very often, visualization libraries impose a particular way of thinking that is specific to a given application. For instance, 3D game engines implement abstractions like cameras, lights, maybe a scene graph with different rooms, etc. Scientific libraries implement things like axes, 2D coordinate systems (Cartesian, polar), plots... Yet, some 3D visualizations may be quite different from a 3D video game (e.g. 3D modeling software, 3D scientific visualization...). Similarly, 2D visualizations might not be easily implemented on top of a scientific library. I think it is important that Vispy stays generic enough at the level of:

  • GLOO
  • visuals
  • if possible, scene graph

One practical consequence of this is that we may want to restrict the vispy.scene package to general-use visuals, and implement a library of more complex visuals for scientific graphics in a separate package like vispy.plot.

Definitions

We need rigorous definitions in a mathematical language to ensure we are all talking about the same objects.

  • Scene graph: a weakly connected directed acyclic graph in which a node may have zero, one, or several parents. This graph represents the hierarchy of transforms.

  • Entity: a node in the scene graph. Conceptually, an Entity represents a coordinate system of integer dimensionality $d$ (typically 2 or 3). An Entity has a single associated transform that defines the relationship between its coordinate system and the coordinate systems of its parents.

  • Visual: A subclass of Entity that also has a graphical representation. This includes simple visuals such as lines, points, and triangle meshes, and also includes more complex or compound visuals.

  • Scene: A branch of a scenegraph, including all sub-branches and leaves it leads to. Typically, 'scene' is used to refer to the branch of a scenegraph that appears inside a ViewBox.

  • Root: The top-level Entity in a Scene.

  • Transform: A function $f : \R^{d} \to \R^{d'}$ that maps from the coordinate system of one entity to that of another. Most, but not all, transforms are invertible.

  • Composition of transforms: Because each entity defines the transform that maps to its parents' coordinate systems, a transform may be constructed that maps between any two entities in a scenegraph. Given two entities $E$ and $E'$, there is a path from one entity to the other in the undirected version of the graph. The composition of the transforms along the path is a new transform (a composition of the transform functions, each inverted if its edge is traversed in the opposite direction), something like $f_1 \circ f_2 \circ f_3^{-1}$. This compound transform is probably independent of the specific path taken (thanks to the properties of the graph, notably the acyclic property). A Python sketch of this path composition follows these definitions.

  • Transform relation: an equivalence relation between entities. $E \equiv E'$ iff the composed transform from $E$ to $E'$ is the identity.

  • Coordinate system: an equivalence class of the transform relation. All entities in the same equivalence class are in the same coordinate system.

  • Camera: An entity that is configured such that:

    1. Its unit box (-1, -1, -1) - (1, 1, 1) defines a visible region of the scene it lives in
    2. Its +Z-axis (if applicable) defines the direction of view, with Z=-1 closest to the observer
    3. Its +Y-axis defines the direction "up"
    4. Its +X-axis defines the direction "right"
  • ViewBox: A Visual whose purposes are to: 1) provide a rectangular region in which the scene within the viewbox is rendered; 2) provide a user-definable transformation for rendering that scene (via a camera entity that is inside the viewbox itself); 3) provide clipping when rendering. The "scene within the viewbox" is simply the list of its children; as such, the total scenegraph remains a single connected graph without interruptions (i.e. contiguous). The way a ViewBox renders its scene may depend on the situation. The easiest way is to use glViewport and glScissor. Other options are to use an FBO, or to chain the scene's transformation with the viewbox's own transformation and then use fragment clipping or a stencil buffer.
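
To make the composition rule concrete, here is a minimal Python sketch (the Entity class here is hypothetical, not the actual vispy API). It assumes a single-parent hierarchy for simplicity, whereas the definition above allows several parents, and it uses strings in place of real Transform instances.

```
class Entity:
    """A node of the scene graph (single-parent simplification)."""
    def __init__(self, parent=None, transform=None):
        self.parent = parent
        self.transform = transform  # maps local coordinates to parent coordinates

    def path_to_root(self):
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return path


def compose_steps(source, target):
    """Return the transform steps mapping source coordinates to target
    coordinates as (transform, inverted) pairs, i.e. something like
    f1 o f2 o f3^-1 in the notation above."""
    src_path = source.path_to_root()   # [source, ..., root]
    tgt_path = target.path_to_root()   # [target, ..., root]
    # Trim the shared tail so both paths end at the lowest common ancestor.
    while (len(src_path) > 1 and len(tgt_path) > 1
           and src_path[-2] is tgt_path[-2]):
        src_path.pop()
        tgt_path.pop()
    # Walk up from the source (forward transforms), then back down to the
    # target (inverted transforms).
    steps = [(e.transform, False) for e in src_path[:-1]]
    steps += [(e.transform, True) for e in reversed(tgt_path[:-1])]
    return steps


root = Entity()
a = Entity(parent=root, transform="f1")
b = Entity(parent=a, transform="f2")
c = Entity(parent=root, transform="f3")
print(compose_steps(b, c))  # [('f2', False), ('f1', False), ('f3', True)]
```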

Example.

Take this example from PyQtGraph.

There are nine plots. Each plot is an entity and a direct child of the root. Each of those nine edges is a linear transformation: the composition of a $1/3$-homothety and a translation (a worked sketch of these nine transforms follows this example).

Each plot is made of multiple entities, including abstract (non-visual) entities that contain transformations between different coordinate systems (data coordinates to normalized coordinates).
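
As a worked version of this example, the following sketch (plain NumPy; the helper name is invented) builds the nine root-to-plot transforms as 3x3 homogeneous matrices, each a $1/3$-homothety followed by a translation into the corresponding grid cell.

```
import numpy as np

def plot_transform(row, col):
    """Map plot (row, col) local coordinates, in the unit square [0, 1]^2,
    into the root's unit square: scale by 1/3, then translate to the cell."""
    s = 1.0 / 3.0
    return np.array([
        [s,   0.0, col * s],   # x' = x/3 + col/3
        [0.0, s,   row * s],   # y' = y/3 + row/3
        [0.0, 0.0, 1.0],
    ])

# The center of each plot's local system lands in the matching grid cell:
center = np.array([0.5, 0.5, 1.0])
for row in range(3):
    for col in range(3):
        print((row, col), plot_transform(row, col) @ center)
```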

Visuals

Definition and scope

  • Visuals form an abstraction layer right above GLOO. Whereas GLOO wraps OpenGL in an object-oriented interface, visuals offer an interface that is closer to what the user expects to see. The Visual API is concerned with "what" to draw, rather than the "how".

  • Broadly speaking, a visual is simply a visual object appearing on the scene.

  • Principle 1: a visual is rendered independently from the other visuals. It has no knowledge of the rest of the scene. The inter-visual relations are handled by the scene graph, which is another separate layer.

    • But how can data be shared between visuals in this case? One possibility: suppose a visual expects a NumPy array for a property. Normally, during the visual's initialization, a VBO is created and this array is uploaded to it. To share data, we pass an existing VBO (defined in another visual) instead of a NumPy array; no new VBO is created, and the existing one is reused. (The sketch at the end of this section illustrates this reuse.)
  • There are built-in visuals, and user-defined visuals.

  • The built-in visuals are:

    • DiscVisual (a filled or empty disc, with a border or not)
    • LineVisual (a line segment with a width, an optional single or double arrowhead, and a line style such as dashed, dotted...)
    • LineStripVisual (like LineVisual, but with a succession of points)
    • TextVisual (antialias, choice of font, color...)
    • PolygonVisual (filled or empty, border or not, texture or solid color, custom shader...)
    • MeshVisual
    • PointSprite
    • VolumeVisual (volume rendering)
    • GraphVisual (nodes = point sprites, edges = line visuals)
    • to complete...

    and with lower priority:

    • BezierCurveVisual
    • NurbsSurfaceVisual
    • to complete...
  • There are two ways of rendering multiple visuals of the same type:

    • By creating multiple instances of Visual objects and rendering them one by one (slow).
    • By creating a Collection that allows for highly efficient batch rendering. For example, a DiscCollection renders a large number of discs quite efficiently: one VBO for all disc properties, one rendering call (glMulti or an ES-compatible alternative).
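
Here is a minimal sketch of the collection idea using vispy.gloo: the properties of many discs live in one vertex buffer, drawn here as round point sprites with a single call. The shader and attribute names are invented for illustration; a real DiscCollection would expose a higher-level API. The last two lines also show the data-sharing idea from above: a second program binds the same buffer, so nothing is uploaded twice.

```
import numpy as np
from vispy import gloo

vertex = """
attribute vec2 a_position;
attribute vec4 a_color;
attribute float a_size;
varying vec4 v_color;
void main() {
    v_color = a_color;
    gl_Position = vec4(a_position, 0.0, 1.0);
    gl_PointSize = a_size;
}
"""

fragment = """
varying vec4 v_color;
void main() {
    // Discard fragments outside the unit circle to get a disc.
    vec2 d = gl_PointCoord - vec2(0.5);
    if (dot(d, d) > 0.25)
        discard;
    gl_FragColor = v_color;
}
"""

n = 1000
data = np.zeros(n, dtype=[('a_position', np.float32, 2),
                          ('a_color', np.float32, 4),
                          ('a_size', np.float32)])
data['a_position'] = np.random.uniform(-1, 1, (n, 2))
data['a_color'] = np.random.uniform(0, 1, (n, 4))
data['a_size'] = 10.0

vbo = gloo.VertexBuffer(data)           # one VBO for all disc properties
program = gloo.Program(vertex, fragment)
program.bind(vbo)
# program.draw('points')                # one call draws every disc (needs a GL context)

other = gloo.Program(vertex, fragment)  # another visual sharing the same VBO
other.bind(vbo)
```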

Technical notes

  • All visuals are subclasses of the Visual superclass, which is itself a subclass of Entity. Note, however, that even though the primary purpose of Entity is for constructing a scenegraph, it is intended that all visuals can be used in the absence of any scenegraph.

  • The Visual superclass provides the following features:

    • A pair of extensible skeleton shaders:

      ```
        VERTEX_SHADER = """
        // local_position function must return the current vertex position
        // in the Visual's local coordinate system.
        vec4 local_position();
      
        // mapping function that transforms from the Visual's local coordinate
        // system to normalized device coordinates.
        vec4 map_local_to_nd(vec4);
      
        // generic hook for executing code after the vertex position has been set
        void vert_post_hook();
      
        // Global variable storing the results of local_position()
        // Any component may read this variable, but it should be treated as
        // read-only.
        vec4 local_pos;
      
        void main(void) {
            local_pos = local_position();
            vec4 nd_pos = map_local_to_nd(local_pos);
            gl_Position = nd_pos;
            
            vert_post_hook();
        }
        """
      
        FRAGMENT_SHADER = """
        // Fragment shader consists of only a single hook that is usually defined 
        // by a chain of functions, each of which sets or modifies the current fragment
        // color, or discards it.
        vec4 frag_color();
      
        void main(void) {
            gl_FragColor = frag_color();
        }
        """
      ```
      

      These shaders provide hooks for configuring 1) the source of vertex data, 2) the transformation to ND coordinates, 3) a chain of vertex shader functions, and 4) a chain of fragment shader functions. The definitions for these shader hooks are provided by VisualComponent instances attached to each visual.

    • A VisualComponent friend class that is the base for all modular components. These implement position and color input functions (uniform, attribute, texture, procedural), materials (lighting, Phong shading, reflection...), clipping, and essentially anything else within the scope of visuals except for transforms, which are implemented separately.

    • Two default collections of modular components: pos_components affect the output of the vertex shader, whereas color_components affect the output of the fragment shader.

    • A transform property that defines the mapping from the local coordinate system to normalized device coordinates. Note that this is NOT the same as the Entity transform, which only maps to the parent coordinate system (the conflict between these has yet to be resolved). This property must be an instance of Transform, or any of its subclasses. Commonly this will be an instance of ChainTransform, which executes a list of transformations in sequence.

    • A default set_data() implementation for specifying vertexes, colors, normals, etc. This probably only applies well to a few basic visuals, and exists mostly to encourage Visual subclasses to adopt a similar API.

    • A set_gl_options() method that allows the user to override the GL state flags that should be set before drawing this visual.

    • A default paint() implementation that activates all components in order, and then calls program.draw() with the drawing mode defined by the primitive property. The vertex_index property, if set, is used as the index buffer.

  • The visual's properties include the data describing the visual (NumPy arrays of arbitrary data type, or lists of native Python objects like numbers, tuples, strings...), as well as options influencing how it is rendered. Example:

      disc = DiscVisual()
      disc.center = (0., 0.)
      disc.radius = 20.
      disc.color = (1., 0., 0.)
      disc.border = Border(2., style='dashed', color=(0., 0., 1.))
    
  • The properties should be "intelligent": changing them should trigger the adequate OpenGL commands to update the rendering calls and the underlying OpenGL objects. For example, setting disc.color = (1., 0., 0.) in on_paint() should change the color instantaneously, without forcing the user to call something like updateGL(); such updates are triggered transparently by the visuals layer. One possibility is to use a pure Python traits implementation (see IPython), but that might be overkill.

  • Visuals do not use initialize() because the modular component system requires that the construction of most objects be deferred until the visual is about to paint.

  • Properties that are NumPy arrays should be yet more intelligent. For example, imagine a LineStripVisual with a points property (an Npoints x 2 array). Doing mylinestrip.points[:,1] += 1 should instantaneously update the VBO and the visual object in the scene. Such properties would need to be instances of a custom class (e.g. "ArrayTrait") that overrides __setitem__, etc., as in the sketch below.
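
Here is a sketch of both ideas (the class names are illustrative, not actual vispy classes): a property whose setter triggers an update, and an "ArrayTrait"-style wrapper whose __setitem__ notifies its owner, so that in-place modifications take effect immediately.

```
import numpy as np

class ArrayTrait:
    """Wrap a NumPy array; in-place modification notifies the owner."""
    def __init__(self, data, on_change):
        self._data = np.asarray(data)
        self._on_change = on_change

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value
        self._on_change()   # e.g. re-upload to the VBO and request a repaint

    def __array__(self, dtype=None):
        return self._data if dtype is None else self._data.astype(dtype)


class LineStripVisual:
    def __init__(self, points):
        self._points = ArrayTrait(points, self._update)

    @property
    def points(self):
        return self._points

    @points.setter
    def points(self, value):
        self._points = ArrayTrait(value, self._update)
        self._update()

    def _update(self):
        # A real visual would update its VBO here and ask the canvas to
        # repaint, without the user calling anything like updateGL().
        print("points changed: VBO updated, repaint requested")


strip = LineStripVisual(np.zeros((100, 2), dtype=np.float32))
strip.points[:, 1] += 1   # fires _update() through ArrayTrait.__setitem__
```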

Modular shaders

GLSL shader components are combined using a system defined in vispy.scene.shaders. One of the most fundamental problems in combining independent pieces of GLSL code is ensuring that there are no collisions between the names defined in each component. In this system, modular components are defined as GLSL functions in order to ensure that all local variables are properly contained within an exclusive scope. Global variables (uniforms, attributes, varyings) and function names are automatically mangled to ensure uniqueness. To make this possible, shader components use $template style variable names to allow identifiers to be altered. For more information, see examples/modular_shaders/sandbox.py.
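
The following toy sketch only illustrates the mangling idea, using Python's string.Template, which understands the same $name placeholders; the real system in vispy.scene.shaders is considerably more capable.

```
from itertools import count
from string import Template

_ids = count(1)

def instantiate(snippet, **names):
    """Give every templated identifier in a GLSL snippet a unique name."""
    uid = next(_ids)
    mangled = {key: '%s_%d' % (base, uid) for key, base in names.items()}
    return Template(snippet).substitute(mangled)

SCALE_FUNC = """
uniform float $u_scale;
vec4 $func(vec4 pos) {
    return vec4(pos.xyz * $u_scale, pos.w);
}
"""

# Two instances of the same component no longer collide when linked
# into a single program:
print(instantiate(SCALE_FUNC, func='scale', u_scale='u_scale'))
print(instantiate(SCALE_FUNC, func='scale', u_scale='u_scale'))
```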

Transforms

The Transform subclasses provide a variety of coordinate system transformations such as simple scale+translation, affine matrices, quaternions, logarithmic, and polar. A ChainTransform subclass allows the arbitrary chaining of any number of Transform instances.

A Transform consists of the following features:

  • GLSL code defining the forward and inverse mapping functions. This code follows the conventions defined by the modular shader system.
  • Python map() and imap() methods providing the same functionality on the CPU. These methods accept single-vector (3,) or vector-array (N, 3) inputs. Other data types may define _transform_in() and _transform_out() methods that allow them to be passed through the map() and imap() methods of any Transform.
  • Properties indicating the general behavior of the transform: Linear, Orthogonal, Nonscaling, and Isometric. These flags may be used by visuals to make certain optimizations (for example, primitive subdivision is only needed when using nonlinear transforms).
  • An inverse() method that returns a new Transform having the inverse effect. This method should be (but is not yet) computationally inexpensive, deferring the inversion until a mapping is requested (because we run into situations where a transform is inverted twice before it is used, meaning the inverse itself is never actually needed).
  • A __mul__() method that defines the result of composing this transform with another. This allows chains of transforms to be simplified when adjacent transforms are compatible. (see the docstring for more information)
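
A minimal sketch of this interface for a 2D scale+translate transform (the method names echo the description above; the actual vispy classes differ in detail):

```
import numpy as np

class STTransform:
    """Scale coordinates, then translate them."""
    def __init__(self, scale=(1.0, 1.0), translate=(0.0, 0.0)):
        self.scale = np.asarray(scale, dtype=float)
        self.translate = np.asarray(translate, dtype=float)

    # The matching GLSL, in the modular-shader convention, might read:
    # vec4 $func(vec4 pos) { return vec4(pos.xy * $scale + $translate, pos.zw); }

    def map(self, coords):
        # Accepts a single vector or an (N, d) array via broadcasting.
        return np.asarray(coords, dtype=float) * self.scale + self.translate

    def imap(self, coords):
        return (np.asarray(coords, dtype=float) - self.translate) / self.scale

    def inverse(self):
        return STTransform(1.0 / self.scale, -self.translate / self.scale)

    def __mul__(self, other):
        # Composing two scale+translate transforms yields another one, so
        # adjacent compatible transforms in a chain can be collapsed:
        # (self * other).map(x) == self.map(other.map(x))
        return STTransform(self.scale * other.scale,
                           self.scale * other.translate + self.translate)


t = STTransform(scale=(2.0, 2.0), translate=(1.0, 0.0))
assert np.allclose(t.imap(t.map([3.0, 4.0])), [3.0, 4.0])
assert np.allclose((t * t.inverse()).map([3.0, 4.0]), [3.0, 4.0])
```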

Scene graph

Definition and scope

  • The scene graph manages the visuals, their positions, the transformations, and the rendering order.
  • Picking -- It knows which object(s) are at any pixel, either by GL picking, or by using bounding geometries (or both).
  • It distributes mouse and other user-input events to individual visuals in the scene
  • Might automatically join compatible visuals into collections for more efficient rendering.

Coordinate systems

Besides the coordinate systems defined by each entity in a scenegraph, there are several coordinate systems that must be understood clearly when dealing with the scenegraph:

  • Normalized device (ND): The coordinate system used as the output of vertex shaders. This system has values in (-1, 1) representing the area defined by calling glViewport().

  • Document coordinates: The coordinate system used for all physical unit measurements (px, mm, etc.). In a scenegraph, this coordinate system is defined by a Document entity.

  • Root: The coordinate system of the top-level entity in a scene.

  • Framebuffer: Physical pixel coordinate system representing the full area of the Canvas. The origin is at the bottom-left corner of the canvas. The arguments to glViewport() are from this coordinate system.

  • Canvas: Logical pixel coordinate system representing the full area of the Canvas. The origin is at the top-left corner of the canvas. This coordinate system is used for handling mouse events.
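
As a concrete illustration of the difference between the last two systems, here is a small sketch (the helper is hypothetical, and a uniform HiDPI pixel scale is assumed) mapping a mouse position from Canvas to Framebuffer coordinates:

```
def canvas_to_framebuffer(x, y, canvas_size, pixel_scale=1.0):
    """Map Canvas coordinates (logical pixels, origin top-left) to
    Framebuffer coordinates (physical pixels, origin bottom-left)."""
    width, height = canvas_size
    return x * pixel_scale, (height - y) * pixel_scale  # flip the y axis

print(canvas_to_framebuffer(0, 0, (800, 600)))         # (0.0, 600.0)
print(canvas_to_framebuffer(400, 300, (800, 600), 2))  # (800.0, 600.0) on HiDPI
```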

Ideas and things to take into consideration

  • See usage examples.

    • Custom aspect ratio
    • Arbitrary coordinate system transformation
    • Jitter at high zoom/pan level due to single precision floating point issues
  • Transformations can happen on the CPU and/or on the GPU. The coordinate system should be synchronized between the CPU and the GPU.

  • Layers like Photoshop (e.g. a 3D scene and a 2D overlay with text and icons) with independent coordinate systems.

  • The scene graph should implement a method to find which visual is at any pixel (ray picking).

  • Multiple cameras viewing the same scene

Use cases

  • Plot line displayed simultaneously in two viewports having linear and log scaling. The line is rendered twice, having a different transform chain for each render. Because the chain is different, the shader must be compiled once per viewport. I would argue that the only solutions to this are to a) disallow the Visual being displayed more than once, or b) allow the Visual to deal with the fact that it is being used in two different contexts.

  • Axis with ticks in two different viewports. Let's say this is the infinitely long line x=0 drawn in scene coordinates, with horizontal ticks marking equal intervals along the Y-axis. This has some nice features: a) As the user zooms out, the spacing between ticks adjusts such that they never become too dense or too sparse. b) Likewise, the minimum and maximum tick values are determined by mapping the bounding rectangle of the viewport to the coordinate system of the Visual. c) The lines are always drawn 1px wide, and the ticks are always 10 px long, regardless of how the user zooms the viewport. The problems here are numerous, and I hope it is clear that we will run into trouble because this Visual needs to know the shape of both of its viewports, as well as be able to determine how to correctly draw a 10 px long tick in either viewport.

  • Selection box with handles. In this example, we select an item, which then draws a bounding rectangle around itself to indicate that it has been selected. The rectangle has small squares drawn at its corners which can be dragged to resize the Visual. The handles are to be drawn 10 px wide, regardless of the zoom of the viewport. This example has some very similar issues to (2).

Coordinate systems and transformations

  • The scene graph maintains a hierarchy of Visuals, with each Visual defining its own 'local' coordinate system. This local coordinate system is defined by the transformation between that Visual and its parent. By joining multiple transformations along links in the hierarchy, it is possible to determine the coordinate transformation between any two Visuals.

  • Some Visuals may have nothing to draw, but simply exist as a named coordinate system in the scene graph hierarchy.

Interactivity in the scene graph

  • Some coordinate systems need to be dynamic.

  • For example, some Visuals will behave as zoom/pannable viewports. These will handle mouse interaction from the user and will determine what part of their local coordinate system is visible. Thus, all children of a viewport will be zoomed/panned together. Viewports will also support aspect-ratio constraints, automatic zooming to fit their children, and numerous other tasks.

  • The scene graph receives mouse events from the Canvas that is displaying it, and then forwards these events on to the appropriate Visuals in the scene. A Viewport Visual would then receive these mouse events and update its transform attribute accordingly.

  • ViewBox Visuals will also implement clipping.

Non-OpenGL representations

The APIs described above are specific to OpenGL, since OpenGL is the main target of Vispy. However, it may be possible, in the future, to "export" a visualization into a different backend (e.g. SVG, PDF, etc.). To see this, consider the scene graph. It contains a hierarchy of visuals, each having an abstract (i.e. not specific to OpenGL) graphical meaning (Disc, Line...), high-level attributes (color, border...), and numerical data. Likewise, Transforms (LinearTransform, etc.) have a specific graphical meaning. Thus, the scene graph contains all the information required to define a visualization in its entirety.

  • We could imagine an external backend that takes this scene graph and compiles another representation.

  • Visuals will have the option to change their behavior depending on their output medium. For example, an EllipseVisual may draw a large number of line segments when rendering to OpenGL, but output a simple <ellipse .../> tag when rendering to SVG.

Additional things to think about

  • Templating system based on jinja2: each shader snippet can contain things like {{visual.myproperty}} or {{scenegraph.myproperty}}. Those variables will be automatically set after the full shader code has been compiled from the snippets. (A toy sketch follows this list.)

  • Python interface to add shader snippets at any level (e.g., there's a PolygonVisual, and I want to add GLSL code at the end of the fragment shader to display a fractal with gl_FragColor).

  • See usage examples.
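
A toy sketch of the jinja2 idea from the first bullet (the property names are invented for illustration):

```
from types import SimpleNamespace
from jinja2 import Template

snippet = """
void main() {
    gl_FragColor = vec4({{ visual.color }}, 1.0);
    gl_PointSize = {{ scenegraph.point_scale }} * 10.0;
}
"""

visual = SimpleNamespace(color="1.0, 0.0, 0.0")
scenegraph = SimpleNamespace(point_scale=2.0)

# The placeholders are filled in after the full shader source is assembled:
print(Template(snippet).render(visual=visual, scenegraph=scenegraph))
```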
