OpenGL 4.0 Highlights (g-truc review)

  • two buffer uniform arrays
  • OpenGL 400 capabilities
  • GL_ARB_draw_indirect provides new draw call functions (glDrawArraysIndirect and glDrawElementsIndirect) and a new buffer binding point called GL_DRAW_INDIRECT_BUFFER. They behave the same way as glDrawArraysInstanced and glDrawElementsInstancedBaseVertex, except that the parameters are read from the buffer bound to GL_DRAW_INDIRECT_BUFFER. This buffer can be filled by transform feedback or by another API (OpenCL / CUDA), which avoids reading GPU memory back to the CPU and stalling the rendering pipeline.
  • glDrawElementsIndirect is similar to glDrawElementsInstancedBaseVertexBaseInstance but takes its parameters from a DrawElementsIndirectCommand stored in the buffer bound to GL_DRAW_INDIRECT_BUFFER (in OpenGL 4.0 the base instance field is reserved and must be zero).
  • you can switch between the two draw calls in order to see how they are equivalent
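
As a rough illustration of what the indirect buffer holds, the sketch below mirrors the two command layouts from the GL_ARB_draw_indirect specification using fixed-width C++ integers. The struct and field names follow the specification; the helpers makeElementsCommand and toWords are hypothetical names introduced here, and the actual upload to GL_DRAW_INDIRECT_BUFFER is only described in a comment because it needs a live OpenGL context.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Layout read by glDrawArraysIndirect: four tightly packed 32-bit values.
struct DrawArraysIndirectCommand
{
    std::uint32_t count;              // vertices per instance
    std::uint32_t primCount;          // number of instances
    std::uint32_t first;              // first vertex
    std::uint32_t reservedMustBeZero; // reserved in OpenGL 4.0
};

// Layout read by glDrawElementsIndirect: five tightly packed 32-bit values.
struct DrawElementsIndirectCommand
{
    std::uint32_t count;              // indices per instance
    std::uint32_t primCount;          // number of instances
    std::uint32_t firstIndex;         // offset into the index buffer, in indices
    std::int32_t  baseVertex;         // value added to every fetched index
    std::uint32_t reservedMustBeZero; // reserved in OpenGL 4.0
};

static_assert(sizeof(DrawArraysIndirectCommand) == 16, "unexpected padding");
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "unexpected padding");

// Hypothetical helper: fills a command equivalent to
// glDrawElementsInstancedBaseVertex(mode, indexCount, type,
//     offset of firstIndex in bytes, instanceCount, baseVertex).
inline DrawElementsIndirectCommand makeElementsCommand(
    std::uint32_t indexCount, std::uint32_t instanceCount,
    std::uint32_t firstIndex = 0, std::int32_t baseVertex = 0)
{
    return DrawElementsIndirectCommand{indexCount, instanceCount, firstIndex, baseVertex, 0u};
}

// Hypothetical helper: the five 32-bit words as they would sit in the buffer.
// With a context, the struct itself can be uploaded with glBufferData on
// GL_DRAW_INDIRECT_BUFFER and drawn with glDrawElementsIndirect(mode, type, nullptr).
inline std::array<std::uint32_t, 5> toWords(const DrawElementsIndirectCommand& c)
{
    assert(c.reservedMustBeZero == 0u);
    return {c.count, c.primCount, c.firstIndex,
            static_cast<std::uint32_t>(c.baseVertex), c.reservedMustBeZero};
}
```

Because the command is plain data, a GPU-side producer (transform feedback, OpenCL, CUDA) only has to write these words into the buffer; the CPU never needs to know the counts.
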
  • attaches a 4-layer texture to the GL_COLOR_ATTACHMENT0 of an fbo and renders into each layer using geometry shader instances. Then it binds the texture and draws each layer to the screen, selecting the i-th layer via a uniform.
  • The OpenGL 4.0 geometry shader provides streams and also a great improvement of this programmable stage: geometry instancing. Where other OpenGL instancing techniques execute the entire graphics pipeline for each instance, this functionality runs the geometry shader multiple times, each run being identified by gl_InvocationID. The number of times the geometry shader is invoked is declared inside the geometry shader using the new input layout qualifier. Geometry shader input layout qualifier:
    layout(triangles, invocations = 7) in; 

The first parameter in the input layout is the input primitive type, which can be points, lines, lines_adjacency, triangles or triangles_adjacency. The geometry shader also requires output layout qualifiers. Geometry shader output layout qualifier:

    layout(triangle_strip, max_vertices = 76, stream = 0) out; 

This layout defines the geometry shader output primitive (points, line_strip or triangle_strip) and the maximum number of vertices the shader will ever emit in a single invocation. The maximum value is gl_MaxGeometryOutputVertices. stream declares the default stream and can differ from 0 only when the output primitive is points. The stream number can be declared at global scope, for a block or for a non-block output variable. Vertices and primitives are emitted to specific streams using the GLSL functions EmitStreamVertex and EndStreamPrimitive.
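
To make the two layout qualifiers concrete, here is a minimal sketch of how an application might generate a geometry shader that runs once per layer and routes invocation i to layer i of a layered framebuffer. The C++ function name makeLayeredGeometryShader is hypothetical, and the shader is an illustration written for this note rather than the sample's actual source; it has not been tuned for any particular driver.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns GLSL 4.00 geometry shader source that is
// invoked `layers` times per input triangle. Each invocation writes its
// gl_InvocationID to gl_Layer, so one draw call fills every layer.
// Assumption: `layers` does not exceed GL_MAX_GEOMETRY_SHADER_INVOCATIONS.
inline std::string makeLayeredGeometryShader(int layers)
{
    assert(layers >= 1);

    std::string src;
    src += "#version 400 core\n";
    src += "layout(triangles, invocations = " + std::to_string(layers) + ") in;\n";
    src += "layout(triangle_strip, max_vertices = 3) out;\n";
    src += "void main()\n";
    src += "{\n";
    src += "    for (int i = 0; i < gl_in.length(); ++i)\n";
    src += "    {\n";
    src += "        gl_Layer = gl_InvocationID;\n";
    src += "        gl_Position = gl_in[i].gl_Position;\n";
    src += "        EmitVertex();\n";
    src += "    }\n";
    src += "    EndPrimitive();\n";
    src += "}\n";
    return src;
}
```

The returned string would be passed to glShaderSource for a GL_GEOMETRY_SHADER object; makeLayeredGeometryShader(4) matches the 4-layer texture case described in the list.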

  • With GL_ARB_sample_shading, the programmer can force a minimum number of samples to be computed independently. To be efficient, most implementations share some values between samples, such as texture coordinates, so that one texture fetch can be reused for every sample of a fragment. With an alpha test driven by an alpha texture, for example, this behaviour can introduce aliasing, a problem that is quite obvious in Crysis.

The function glMinSampleShading sets this minimum as a fraction of the sample count. In GLSL, the extension gives us several built-in variables: in int gl_SampleID is the number of the sample, a value between 0 and gl_NumSamples - 1; uniform int gl_NumSamples is the total number of samples in the framebuffer; in vec2 gl_SamplePosition is the position of the sample in the pixel (between 0.0 and 1.0, where 0.5 is the pixel center); out int gl_SampleMask[] is used to change the coverage of a sample, to exclude some samples from further fragment processing, but it will never enable uncovered samples.
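
The value passed to glMinSampleShading is a fraction, so it helps to see how it maps to a sample count. The sketch below follows the rule stated in the OpenGL specification, max(ceil(value * samples), 1) with the value clamped to [0, 1]; the function name minShadedSamples is a hypothetical helper, and implementations are free to shade more samples than this minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: minimum number of samples that must be shaded
// independently for a given glMinSampleShading value and sample count.
inline int minShadedSamples(float minSampleShading, int sampleCount)
{
    assert(sampleCount >= 1);
    // glMinSampleShading clamps its argument to [0, 1].
    const float clamped = std::min(std::max(minSampleShading, 0.0f), 1.0f);
    const int shaded =
        static_cast<int>(std::ceil(clamped * static_cast<float>(sampleCount)));
    // At least one sample is always shaded.
    return std::max(shaded, 1);
}
```

With a 4-sample framebuffer, a value of 1.0 asks for all 4 samples to be shaded separately, while 0.5 asks for at least 2.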

  • generates one multisample fbo with a 4-sample texture and one simple fbo with a texture of the same dimensions. Before rendering, it enables shading for each sample with glEnable(GL_MULTISAMPLE), glEnable(GL_SAMPLE_SHADING) and glMinSampleShading(1.0f). It renders the diffuse texture to the multisample fbo, blits the content to the other fbo and then renders the final result to screen.
  • glMinSampleShading(1.0f) is effectively supersampling (SSAA)
  • small tip: if you mix textures and renderbuffers with multisampling, you must use fixed sample locations for the textures.
  • because of an interpolateAtSample bug in the NVIDIA GLSL compiler I had to find a workaround, and I was told to use the sample identifier in the fragment shader instead to get the sample coordinates.
  • issue 7
  • allocates three empty textures and attaches each of them to the first three color attachments of an fbo. Then it clears each with a different color and renders them to the screen, each in a different corner
  • same but using texture array
  • sampler2DArray
  • GL_ARB_texture_gather provides an equivalent of the Direct3D 10.1 gather4 instruction: it fetches one component from each of 4 different texels in a single call, which is useful for soft shadows and some post-processing effects.
  • textureGather
  • render to a shadow map (depth texture) and use it to render the shadow in the next step
  • primitive instancing with geometry shader, layout(triangles, invocations = 6) in
  • primitive smooth shading in comparison, tessellation on left vs interpolated values on right
  • OpenGL 4.0 brings 3 new processing stages that take place between the vertex shader and geometry shader.

    Control shader (Known as Hull shader in Direct3D 11)

    Primitive generator

    Evaluation shader (Known as Domain shader in Direct3D 11)

In a way, the tessellation stages replace the vertex shader stage in the graphics pipeline. Most of the vertex shader tasks will be dispatched in the control shader and the evaluation shader. So far, the vertex shader stage is still required but the control shader and the evaluation shader are both optional.

Control shaders work on 'patches', a set of vertices. They output per-vertex data and per-patch data, which are used by the primitive generator and available as read-only in the evaluation shader.

Input per-vertex data is stored in an array called gl_in whose maximum size is gl_MaxPatchVertices. The elements of gl_in contain the variables gl_Position, gl_PointSize, gl_ClipDistance and, in the compatibility profile, gl_ClipVertex. The per-patch variables are gl_PatchVerticesIn (number of vertices in the patch), gl_PrimitiveID (index of the patch within the draw call) and gl_InvocationID (invocation number).

The control shaders have a gl_out array of per-vertex data whose members are gl_Position, gl_PointSize and gl_ClipDistance. They also output per-patch data with the variables gl_TessLevelOuter and gl_TessLevelInner to control the tessellation level.

A control shader is invoked several times, once per output vertex of a patch, and each invocation is identified by gl_InvocationID. These invocations can be synchronized with the built-in function barrier.

The primitive generator consumes a patch and produces a set of points, lines or triangles. Each generated vertex is associated with a (u, v, w) or (u, v) position, available in the evaluation shader through the variable gl_TessCoord; for triangle patches, u + v + w = 1.

The evaluation shaders provide a gl_in array like control shaders. The members of the elements of gl_in are gl_Position, gl_PointSize and gl_ClipDistance for each vertex of a patch. The evaluation shaders have the variables gl_PatchVerticesIn and gl_PrimitiveID but also the extra variables gl_TessLevelOuter and gl_TessLevelInner, which contain the tessellation levels of the patch.

The evaluation shaders output gl_Position, gl_PointSize and gl_ClipDistance.

Tessellation has a lot more details to understand to work on a real implementation in a project! Those details are available in GL_ARB_tessellation_shader and obviously in OpenGL 4.0 specification.
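
For triangle patches, the usual job of an evaluation shader is to turn gl_TessCoord into a position by weighting the three patch corners. The sketch below reproduces that arithmetic on the CPU as an illustration; Vec3 and interpolateTriangle are hypothetical names, and a real evaluation shader would do the same sum in GLSL using gl_in[0..2].gl_Position.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// Hypothetical helper: barycentric interpolation of three patch corners,
// mirroring  u * gl_in[0] + v * gl_in[1] + w * gl_in[2]  in GLSL,
// where tessCoord = (u, v, w) plays the role of gl_TessCoord.
inline Vec3 interpolateTriangle(const Vec3& tessCoord,
                                const Vec3& p0, const Vec3& p1, const Vec3& p2)
{
    // For triangle patches the three weights sum to one.
    assert(std::fabs(tessCoord[0] + tessCoord[1] + tessCoord[2] - 1.0f) < 1e-4f);

    Vec3 result{};
    for (std::size_t i = 0; i < 3; ++i)
    {
        result[i] = tessCoord[0] * p0[i] + tessCoord[1] * p1[i] + tessCoord[2] * p2[i];
    }
    return result;
}
```

A coordinate of (1, 0, 0) returns the first corner unchanged, and (0.5, 0.5, 0) returns the midpoint of the edge between the first two corners.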

  • Subroutines are defined by GL_ARB_shader_subroutine as part of the OpenGL 4.0 specification. This mechanism is a sort of C++ function pointer which allows the C++ program to select a specific algorithm to be used in a GLSL program. This feature is a great enhancement for the uber-shader type of software design, where all the algorithms are included in a single shader to handle multiple (or all) cases. Subroutines make it possible to select specific shader code paths while keeping the same program and program environment.

The following GLSL code sample defines 3 subroutine uniforms, which means 3 entry points for changing a shader's behaviour. Several functions can be defined for a subroutine type, and a single subroutine function can be used for multiple subroutine uniforms. Subroutine functions can't be overloaded. Subroutine uniforms are a sort of function pointer, but they can only point to subroutine functions.

Subroutine in GLSL 4.00:

    // Subroutine types
    subroutine vec4 greatFeature(in vec3 Var1, in vec3 Var2);
    subroutine vec4 bestFeature(in vec3 Var1, in vec3 Var2);
    subroutine mat4 otherFeature(in vec4 Var1, in float Var2, in int var3);

    // Subroutine functions, each tagged with the types it can be used for
    subroutine(greatFeature, bestFeature)
    vec4 myFeature1(in vec3 Var1, in vec3 Var2)
    { ... } // Required function body
    subroutine(greatFeature, bestFeature)
    vec4 myFeature2(in vec3 Var1, in vec3 Var2)
    { ... } // Required function body
    subroutine(bestFeature)
    vec4 myBestFeature(in vec3 Var1, in vec3 Var2)
    { ... } // Required function body
    subroutine(otherFeature)
    mat4 myOtherFeature(in vec4 Var1, in float Var2, in int var3)
    { ... } // Required function body

    // Could be set to myFeature1, myFeature2
    subroutine uniform greatFeature GreatFeature;
    // Could be set to myFeature1, myFeature2, myBestFeature
    subroutine uniform bestFeature BestFeature;
    // Could be set to myOtherFeature only...
    // probably not a recommended use of subroutines...
    subroutine uniform otherFeature OtherFeature;

    void main()
    {
        // Subroutine uniform variables are called the same way functions are
        // called, with the arguments declared by their subroutine type.
        GreatFeature(...);
        ...
        BestFeature(...);
        ...
        OtherFeature(...);
    }

The subroutine uniforms are assigned using the function glUniformSubroutinesuiv, whose parameters give the list of subroutine function indices to assign, setting all the subroutine uniforms of a shader stage at once. To get the index of a subroutine function, OpenGL provides the function glGetSubroutineIndex.
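
Because glUniformSubroutinesuiv sets every subroutine uniform of a stage in one call, the application has to build an array with one entry per subroutine uniform location, where entry i holds the index of the function chosen for location i. The sketch below builds that array without touching OpenGL; buildSubroutineTable and SubroutineChoice are hypothetical names, and the locations and indices are assumed to come from glGetSubroutineUniformLocation and glGetSubroutineIndex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One (location, index) pair per subroutine uniform:
//   location: e.g. glGetSubroutineUniformLocation(program, stage, "GreatFeature")
//   index:    e.g. glGetSubroutineIndex(program, stage, "myFeature1")
using SubroutineChoice = std::pair<std::size_t, std::uint32_t>;

// Hypothetical helper: builds the array expected by glUniformSubroutinesuiv.
// `activeLocations` is assumed to be the stage's
// GL_ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS value.
inline std::vector<std::uint32_t> buildSubroutineTable(
    std::size_t activeLocations, const std::vector<SubroutineChoice>& choices)
{
    // Entries left at 0 are only valid if function index 0 is compatible
    // with that uniform, so callers should normally set every location.
    std::vector<std::uint32_t> table(activeLocations, 0u);
    for (const SubroutineChoice& choice : choices)
    {
        assert(choice.first < activeLocations);
        table[choice.first] = choice.second;
    }
    // With a context and the program bound:
    //   glUniformSubroutinesuiv(stage, GLsizei(table.size()), table.data());
    return table;
}
```

The order of the pairs does not matter, since each one is written to the slot named by its location. The call belongs after glUseProgram, because subroutine uniform state is not stored in the program object.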

  • varying color with blocks
  • loads a diffuse texture twice with right (rgba) and inverted (brga) swizzle and sets the layer with a uniform uniformDiffuseIndex variable
  • GL_NV_gpu_shader5
  • similar but with a different declaration: uniform sampler2D diffuse[2] instead of uniform sampler2DArray diffuse[2]
  • and access with texture(diffuse[index], inVert.texCoord) instead of texture(diffuse[index], vec3(texCoord, layer))
  • GL_ARB_texture_query_lod. This extension returns the LOD that would have been used for a texture fetch. This makes per-fragment LOD decisions possible: for example, we could choose a more or less accurate lighting algorithm according to this LOD value... With such a feature, we can perform per-fragment adaptive texture filtering. "Anisotropic filtering 16x" is no longer a meaningful concept.
  • textureQueryLOD
  • texelFetch(sampler*, ivec3 coord, int level)
  • trinilinearLod (GL_LINEAR_MIPMAP_LINEAR) shader implementation

GL_ARB_texture_buffer_object_rgb32 trivially adds three-channel 32-bit texture buffer formats: GL_RGB32I, GL_RGB32UI and GL_RGB32F.

GL_ARB_texture_compression_bptc provides the Direct3D 11 compressed formats known as BC6H and BC7, called respectively GL_BPTC_FLOAT and GL_BPTC with OpenGL. They target high dynamic range and low dynamic range texture compression, with high quality compression of sharp edges. The compression ratios for GL_BPTC_FLOAT and GL_BPTC are 6:1 and 3:1.
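
The quoted ratios can be checked with a little arithmetic, since both BPTC formats store each 4x4 texel block in 128 bits. The sketch below computes the compressed size of one mip level; bptcImageSize is a hypothetical helper name, and the comparison formats (16-bit float RGB for the 6:1 figure, 8-bit RGB for the 3:1 figure) are an assumption about what the ratios are measured against.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: size in bytes of one BPTC-compressed mip level.
// Both BPTC formats use 16 bytes (128 bits) per 4x4 block, and partial
// blocks at the right and bottom edges still occupy a full block.
inline std::size_t bptcImageSize(std::size_t width, std::size_t height)
{
    assert(width > 0 && height > 0);
    const std::size_t blocksX = (width + 3) / 4;
    const std::size_t blocksY = (height + 3) / 4;
    return blocksX * blocksY * 16;
}
```

For a 256x256 image this gives 65536 bytes, against 393216 bytes for uncompressed 16-bit float RGB (6 bytes per texel) and 196608 bytes for 8-bit RGB (3 bytes per texel). A size computed this way is the kind of value passed as imageSize to glCompressedTexImage2D.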

  • samplerBuffer
  • loads position offsets and diffuse color in texture buffers
  • GL_ARB_gpu_shader5 provides further per-sample control over how in/out data are interpolated, using qualifiers. When centroid qualifies a variable, a single value can be assigned to that variable for all the samples in the pixel. When sample qualifies a variable, however, a separate value must be assigned to that variable for each covered sample.

New built-in interpolation functions interpolateAtCentroid, interpolateAtSample and interpolateAtOffset are available to compute the interpolated value of a fragment shader input variable. interpolateAtCentroid returns the value of a variable at the centroid location, interpolateAtSample at a sample location and interpolateAtOffset at an offset from the pixel center, where (0, 0) is the center of the pixel. If an input variable is declared with the qualifier noperspective, the interpolation is computed without perspective correction.

  • GL_ARB_transform_feedback2 defines 3 features. First, it creates a transform feedback object (sometimes called XBO) that encapsulates the transform feedback state... Well, that is to say the transform feedback buffers, which with GL_INTERLEAVED_ATTRIBS is just 1 buffer... what's the point?!

Second, this object allows transform feedback capture to be paused (glPauseTransformFeedback) and resumed (glResumeTransformFeedback). The XBO keeps track of this state. This way, multiple transform feedback objects can record vertex attributes one after the other, but never at the same time. In an OpenGL command flow, we can imagine that some draw calls belong to one transform feedback and others belong to a second transform feedback.

Finally, this extension provides the function glDrawTransformFeedback to use transform feedback buffers as the vertex source without having to query the count of primitives written. Querying this count with glGetQueryObjectuiv stalls the graphics pipeline while it waits for the OpenGL commands to complete. glDrawTransformFeedback replaces glDrawArrays in this case and doesn't need the vertex count: it automatically uses the count written by the transform feedback object. GL_ARB_transform_feedback2 is part of OpenGL 4.0 but is also supported by GeForce GT200 as an extension.

GL_ARB_transform_feedback3 defines 2 features. First, with OpenGL 3.0, when we capture varyings we are limited to 2 capture modes: GL_SEPARATE_ATTRIBS to write one varying per buffer and GL_INTERLEAVED_ATTRIBS to write all the varyings in a single buffer.

GL_ARB_transform_feedback3 proposes a much more realistic scenario: it allows interleaved varyings to be written to several buffers. Let's take an example. A transform feedback object could contain 3 buffers. The first buffer could capture 1 varying, the second could capture 3 varyings and the third could capture 2 varyings. This behaviour is defined with a very simple approach: in the name list given to glTransformFeedbackVaryings, we insert the name gl_NextBuffer as a separator between buffers.
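
The 1 / 3 / 2 example above can be sketched as a small helper that flattens per-buffer varying lists into the single name list expected by glTransformFeedbackVaryings. The function name buildVaryingList and the varying names used in the usage note are hypothetical; only the gl_NextBuffer separator comes from the extension.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins per-buffer varying lists into one list,
// inserting "gl_NextBuffer" between consecutive buffers.
inline std::vector<std::string> buildVaryingList(
    const std::vector<std::vector<std::string>>& perBuffer)
{
    assert(!perBuffer.empty());

    std::vector<std::string> names;
    for (std::size_t buffer = 0; buffer < perBuffer.size(); ++buffer)
    {
        if (buffer != 0)
        {
            names.push_back("gl_NextBuffer"); // switch to the next binding point
        }
        names.insert(names.end(), perBuffer[buffer].begin(), perBuffer[buffer].end());
    }
    // With a context, before linking the program: convert to an array of
    // const char* and call glTransformFeedbackVaryings(program, count,
    // pointers, GL_INTERLEAVED_ATTRIBS).
    return names;
}
```

For three buffers holding {"position"}, {"normal", "tangent", "bitangent"} and {"color", "texCoord"}, the result has 8 entries, with gl_NextBuffer at positions 1 and 5.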

Also, this extension has some interactions with GL_ARB_gpu_shader5, which defines multiple vertex streams in geometry shaders. Multiple vertex streams are a new concept for OpenGL 4.0. In a way, before OpenGL 4.0 we had a single vertex stream, which was used by the rasterizer. The first vertex stream is still used by the rasterizer but the others can be output to transform feedback objects. This requires an extra set of functions to query the primitives written per stream and to draw directly from a specific vertex stream. This is done with glDrawTransformFeedbackStream, glBeginQueryIndexed, glEndQueryIndexed and glGetQueryIndexediv.

  • use the transform feedback to transform a vec4 position into a vec4 position and vec4 color.
  • glEnable(GL_RASTERIZER_DISCARD)
  • glBindTransformFeedback(GL_TRANSFORM_FEEDBACK, feedbackName[0])
  • glBeginTransformFeedback(GL_TRIANGLES)
  • glDrawTransformFeedback, no more primitive number! No more stalling queries! Cool
  • GL_INTERLEAVED_ATTRIBS
  • same but using explicit stream instead
  • glDrawTransformFeedbackStream(GL_TRIANGLES, feedbackName[0], 0) is equivalent to glDrawTransformFeedback where stream 0 is implicit