OpenGL 4.0 Highlights (g-truc review)
- ? Probably magic
GL_ARB_draw_buffers_blend extends GL_EXT_draw_buffers2 with per-render-target blend functions and equations.
- two buffer uniform arrays
- OpenGL 400 capabilities
GL_ARB_draw_indirect provides new draw call functions (glDrawArraysIndirect and glDrawElementsIndirect) and a new buffer binding point called GL_DRAW_INDIRECT_BUFFER. They behave the same way as glDrawArraysInstanced and glDrawElementsInstancedBaseVertex except that the parameters are read from a buffer bound to the GL_DRAW_INDIRECT_BUFFER binding point. This buffer can be generated by transform feedback or by other APIs (OpenCL / CUDA), which avoids an undesired read back of GPU memory that would stall the rendering pipeline.
- glDrawElementsIndirect is similar to glDrawElementsInstancedBaseVertexBaseInstance but takes its parameters from a buffer bound to GL_DRAW_INDIRECT_BUFFER containing a DrawElementsIndirectCommand command.
- you can switch between the two draw calls in order to see how they are equivalent
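To make the layout of the indirect buffer concrete, here is a minimal CPU-side sketch in C of the two command structures. The field order and sizes follow the OpenGL specification; the field names, the fixed-width types standing in for GLuint/GLint, and the helper fill_elements_command are my own illustrative choices, not part of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parameters read by glDrawArraysIndirect from GL_DRAW_INDIRECT_BUFFER. */
typedef struct {
    uint32_t count;          /* vertices per instance */
    uint32_t instance_count; /* number of instances */
    uint32_t first;          /* first vertex */
    uint32_t reserved;       /* must be 0 in OpenGL 4.0 (base instance from 4.2) */
} DrawArraysIndirectCommand;

/* Parameters read by glDrawElementsIndirect from GL_DRAW_INDIRECT_BUFFER. */
typedef struct {
    uint32_t count;          /* indices per instance */
    uint32_t instance_count; /* number of instances */
    uint32_t first_index;    /* offset into the element array, in indices */
    int32_t  base_vertex;    /* value added to each fetched index */
    uint32_t reserved;       /* must be 0 in OpenGL 4.0 (base instance from 4.2) */
} DrawElementsIndirectCommand;

/* The GPU reads tightly packed 32-bit words, so no padding is allowed. */
_Static_assert(sizeof(DrawArraysIndirectCommand) == 16, "4 packed 32-bit words");
_Static_assert(sizeof(DrawElementsIndirectCommand) == 20, "5 packed 32-bit words");

/* Hypothetical helper: fills a command equivalent to an instanced,
   base-vertex indexed draw. */
static void fill_elements_command(DrawElementsIndirectCommand *cmd,
                                  uint32_t count, uint32_t instance_count,
                                  uint32_t first_index, int32_t base_vertex)
{
    assert(cmd != NULL);
    cmd->count = count;
    cmd->instance_count = instance_count;
    cmd->first_index = first_index;
    cmd->base_vertex = base_vertex;
    cmd->reserved = 0;
}
```

In a real program the filled structure would be uploaded with glBufferData on the GL_DRAW_INDIRECT_BUFFER target and consumed by glDrawElementsIndirect, whose last argument is then a byte offset into that buffer.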
- attaches a 4-layer texture to the GL_COLOR_ATTACHMENT0 of an fbo and renders into each layer using instances in the geometry shader. Then it binds the texture and splashes each layer to the screen by selecting the i-th layer via a uniform.
- The OpenGL 4.0 geometry shader provides streams and also a great improvement to this programmable stage: geometry instancing. Where other OpenGL instancing techniques execute the entire graphics pipeline for each instance, this functionality allows the geometry shader to run multiple times, each run being identified by gl_InvocationID. The number of times the geometry shader is invoked is declared inside the geometry shader using the new input layout qualifier. Geometry shader input layout qualifier:
layout(triangles, invocations = 7) in;
The first parameter in the input layout is the input primitive type, which can be points, lines, lines_adjacency, triangles or triangles_adjacency. The geometry shader also provides new required output layout qualifiers. Geometry shader output layout qualifier:
layout(triangle_strip, max_vertices = 76, stream = 0) out;
This layout defines the geometry shader output primitive (points, line_strip or triangle_strip) and the maximum number of vertices the shader will ever emit in a single invocation. The maximum value is gl_MaxGeometryOutputVertices. stream declares the default stream and can be different from 0 only when the output primitive is points. The stream number can be declared at global scope, for a block or for a non-block output variable. Vertices and primitives are emitted to specific streams using the GLSL functions EmitStreamVertex and EndStreamPrimitive.
- With GL_ARB_sample_shading, the programmer can force a minimum number of samples to be computed independently. To be efficient, most implementations share some values between samples, such as texture coordinates, so that one texture fetch can be reused for every sample of a fragment. In the case of an alpha test based on an alpha texture, for example, this behaviour can introduce aliasing, a problem quite obvious in Crysis.
The function glMinSampleShading sets this minimum as a fraction of the number of samples. In GLSL, the extension gives us several built-in variables:
in int gl_SampleID is the number of the sample, a value between 0 and gl_NumSamples - 1;
uniform int gl_NumSamples is the total number of samples in the framebuffer;
in vec2 gl_SamplePosition is the position of the sample in the pixel (between 0.0 and 1.0 where 0.5 is the pixel center);
out int gl_SampleMask[] is used to change the coverage of a sample, to exclude some samples from further fragment processing, but it will never enable uncovered samples.
- generates one multisample fbo with a 4-sample texture and one simple fbo with a texture of the same dimensions. Before rendering, it enables shading for each sample with glEnable(GL_MULTISAMPLE), glEnable(GL_SAMPLE_SHADING) and glMinSampleShading(1.0f). It renders the diffuse texture to the multisample fbo, blits the content to the other fbo and then renders the final result to screen. glMinSampleShading(1.0f) is super sampling (SSAA).
- small tip, if you mix texture and renderbuffer with multisampling, you must use fixed sample locations for textures.
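As a rough illustration of what the value passed to glMinSampleShading means, the extension specifies that the fragment shader runs at least max(ceil(value * samples), 1) times per pixel. The C sketch below only reproduces that arithmetic on the CPU; the function name min_shaded_samples is hypothetical.

```c
#include <assert.h>

/* Minimum number of independently shaded samples for a given
   glMinSampleShading value: max(ceil(value * samples), 1). */
static int min_shaded_samples(float value, int samples)
{
    assert(samples > 0);

    /* OpenGL clamps the value to [0, 1]. */
    if (value < 0.0f) value = 0.0f;
    if (value > 1.0f) value = 1.0f;

    float scaled = value * (float)samples;
    int count = (int)scaled;   /* truncation equals floor since scaled >= 0 */
    if ((float)count < scaled) {
        count += 1;            /* round up to get the ceiling */
    }
    return count < 1 ? 1 : count;
}
```

With a 4-sample framebuffer, a value of 1.0f asks for 4 shaded samples (the SSAA case), 0.5f for at least 2, and 0.0f still guarantees 1.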
- given an interpolateAtSample bug in the NVIDIA GLSL compiler I had to find a trick and I was told to use the sample qualifier instead in the fragment shader to get the sample coordinates.
- issue 7
- allocates three empty textures and attaches each of them to the first three color attachments of an fbo. Then it clears each with a different color and renders them to the screen, each in a different corner
- same but using a texture array, sampler2DArray
GL_ARB_texture_gather provides an equivalent to the Direct3D 10.1 gather4 instruction: it fetches one component from each of 4 different texels in a single call, which is useful for soft shadows and some post-processing effects.
textureGather
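To show what a gather returns, here is a small CPU emulation in C on a single-channel texture. It assumes the 2x2 footprint's lower-left texel (i0, j0) is already known and uses clamp-to-edge addressing; the names gather4 and clamp_index are hypothetical. The component order matches the one GLSL specifies for textureGather: (i0, j1), (i1, j1), (i1, j0), (i0, j0).

```c
#include <assert.h>
#include <stddef.h>

/* Clamp-to-edge addressing (an assumption of this sketch). */
static int clamp_index(int i, int size)
{
    if (i < 0) return 0;
    if (i >= size) return size - 1;
    return i;
}

/* Emulates a gather on a row-major, single-channel texture.
   out[0..3] correspond to the x, y, z, w components of the result. */
static void gather4(const float *texels, int width, int height,
                    int i0, int j0, float out[4])
{
    assert(texels != NULL && out != NULL);
    assert(width > 0 && height > 0);

    int i1 = clamp_index(i0 + 1, width);
    int j1 = clamp_index(j0 + 1, height);
    i0 = clamp_index(i0, width);
    j0 = clamp_index(j0, height);

    out[0] = texels[j1 * width + i0]; /* x: (i0, j1) */
    out[1] = texels[j1 * width + i1]; /* y: (i1, j1) */
    out[2] = texels[j0 * width + i1]; /* z: (i1, j0) */
    out[3] = texels[j0 * width + i0]; /* w: (i0, j0) */
}
```

For soft shadows, the four gathered depths would each be compared against the fragment depth and the results averaged, which replaces four separate texture fetches.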
- render to a shadow map (depth texture) and use it to render the shadow in the next step
- primitive instancing with geometry shader,
layout(triangles, invocations = 6) in
- primitive smooth shading in comparison, tessellation on left vs interpolated values on right
- OpenGL 4.0 brings 3 new processing stages that take place between the vertex shader and the geometry shader:
Control shader (known as Hull shader in Direct3D 11)
Primitive generator
Evaluation shader (known as Domain shader in Direct3D 11)
In a way, the tessellation stages replace the vertex shader stage in the graphics pipeline. Most of the vertex shader tasks will be dispatched in the control shader and the evaluation shader. So far, the vertex shader stage is still required but the control shader and the evaluation shader are both optional.
Control shaders work on 'patches', sets of vertices. They output per-vertex data and per-patch data, which are used by the primitive generator and available as read-only in the evaluation shader.
Input per-vertex data are stored in an array called gl_in whose maximum size is gl_MaxPatchVertices. The elements of gl_in contain the variables gl_Position, gl_PointSize, gl_ClipDistance and gl_ClipVertex.
The per-patch variables are gl_PatchVerticesIn (number of vertices in the patch) and gl_PrimitiveID (index of the patch within the draw call); each invocation also receives gl_InvocationID (invocation number).
The control shaders have a gl_out array of per-vertex data whose members are gl_Position, gl_PointSize and gl_ClipDistance. They also output per-patch data with the variables gl_TessLevelOuter and gl_TessLevelInner to control the tessellation level.
A control shader is invoked several times, once per vertex of the output patch, and each invocation is identified by gl_InvocationID. These invocations can be synchronized by the built-in function barrier.
The primitive generator consumes a patch and produces a set of points, lines or triangles. Each generated vertex is associated with a (u, v, w) or (u, v) position, available in the evaluation shader through the variable gl_TessCoord; for triangle patches u + v + w = 1.
The evaluation shaders provide a gl_in array like control shaders. The members of the elements of gl_in are gl_Position, gl_PointSize and gl_ClipDistance for each vertex of a patch. The evaluation shaders have the variables gl_PatchVerticesIn and gl_PrimitiveID but also some extra variables, gl_TessLevelOuter and gl_TessLevelInner, which contain the tessellation levels of the patch.
The evaluation shaders output gl_Position, gl_PointSize and gl_ClipDistance.
Tessellation has many more details to understand before working on a real implementation in a project! Those details are available in GL_ARB_tessellation_shader and obviously in the OpenGL 4.0 specification.
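As an illustration of how gl_TessCoord is typically used for triangle patches, the C sketch below does on the CPU what a simple evaluation shader would do: it weights the three patch vertices by (u, v, w). The Vec3 type and the function interpolate_triangle are hypothetical, and a real evaluation shader would read the vertices from gl_in instead.

```c
#include <assert.h>

typedef struct {
    float x, y, z;
} Vec3;

/* Barycentric interpolation of three patch vertices, as an evaluation
   shader would do with gl_TessCoord on a triangle domain. */
static Vec3 interpolate_triangle(Vec3 p0, Vec3 p1, Vec3 p2,
                                 float u, float v, float w)
{
    /* On the triangle domain the three coordinates sum to 1. */
    float sum = u + v + w;
    assert(sum > 0.999f && sum < 1.001f);

    Vec3 result = {
        u * p0.x + v * p1.x + w * p2.x,
        u * p0.y + v * p1.y + w * p2.y,
        u * p0.z + v * p1.z + w * p2.z
    };
    return result;
}
```

With (u, v, w) = (1, 0, 0) the result is exactly the first patch vertex, and the other generated vertices fall inside the triangle according to their weights.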
GL_ARB_gpu_shader_fp64 is part of OpenGL 4.0 and exposes support for FP64 uniforms and FP64 computations in GLSL.
- double matrices and uniforms: dmat4, dvec4
- Subroutines are defined by GL_ARB_shader_subroutine as part of the OpenGL 4.0 specification. This mechanism is a sort of C++ function pointer which allows selecting, from the C++ program, a specific algorithm to be used in a GLSL program. This feature is a great enhancement for the uber-shader type of software design, where all the algorithms are included in a single shader to handle multiple or all cases. Subroutines allow selecting specific shader code paths while keeping the same program and program environment.
The following GLSL code sample defines 3 subroutine uniforms, which means 3 entry points to change the shader behaviour. Several functions can be defined for a subroutine type and a single subroutine function can be used for multiple subroutine uniforms. Subroutine functions can't be overloaded. Subroutine uniforms are a sort of function pointer but can only point to subroutine functions.
Subroutine in GLSL 4.00:
subroutine vec4 greatFeature(in vec3 Var1, in vec3 Var2);
subroutine vec4 bestFeature(in vec3 Var1, in vec3 Var2);
subroutine mat4 otherFeature(in vec4 Var1, in float Var2, in int var3);
subroutine(greatFeature, bestFeature)
vec4 myFeature1(in vec3 Var1, in vec3 Var2)
{ ... } // Required function body
subroutine(greatFeature, bestFeature)
vec4 myFeature2(in vec3 Var1, in vec3 Var2)
{ ... } // Required function body
subroutine(bestFeature)
vec4 myBestFeature(in vec3 Var1, in vec3 Var2)
{ ... } // Required function body
subroutine(otherFeature)
mat4 myOtherFeature(in vec4 Var1, in float Var2, in int var3)
{ ... } // Required function body
// Could be set to myFeature1, myFeature2
subroutine uniform greatFeature GreatFeature;
// Could be set to myFeature1, myFeature2, myBestFeature
subroutine uniform bestFeature BestFeature;
// Could be set to myOtherFeature only...
// probably not a recommended use of subroutines...
subroutine uniform otherFeature OtherFeature;
void main()
{
// Subroutine uniform variables are called the same way functions are called.
GreatFeature();
...
BestFeature();
...
OtherFeature();
}
The subroutine uniforms are assigned using the function glUniformSubroutinesuiv, whose parameters give the list of subroutine functions to assign to all the subroutine uniforms of a shader stage at once. To get the subroutine function indices, OpenGL provides the function glGetSubroutineIndex.
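The array handed to glUniformSubroutinesuiv is indexed by subroutine uniform location, and each slot holds the index of the subroutine function to use. The C sketch below only models that array on the CPU; the helper select_subroutine and the constant SUBROUTINE_UNIFORM_COUNT are hypothetical, and in a real program the locations and indices would come from glGetSubroutineUniformLocation and glGetSubroutineIndex instead of being hard-coded.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match the 3 subroutine uniforms of the GLSL sample:
   GreatFeature, BestFeature and OtherFeature. */
enum { SUBROUTINE_UNIFORM_COUNT = 3 };

/* Records which subroutine function a subroutine uniform should call.
   Slot = subroutine uniform location, value = subroutine function index. */
static void select_subroutine(unsigned int indices[SUBROUTINE_UNIFORM_COUNT],
                              int uniform_location,
                              unsigned int function_index)
{
    assert(indices != NULL);
    assert(uniform_location >= 0 &&
           uniform_location < SUBROUTINE_UNIFORM_COUNT);
    indices[uniform_location] = function_index;
}
```

Once every slot is filled, the whole array is submitted in one call such as glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, 3, indices). This has to be done after glUseProgram, because the subroutine uniform state is reset each time a program is bound.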
- varying color with blocks
- loads a diffuse texture twice with right (rgba) and inverted (brga) swizzle and sets the layer with a uniform variable, uniformDiffuseIndex
GL_NV_gpu_shader5
- similar but with a different declaration, uniform sampler2D diffuse[2] instead of uniform sampler2DArray diffuse[2]
- and access with texture(diffuse[index], inVert.texCoord) instead of texture(diffuse[index], vec3(texCoord, layer))
GL_ARB_texture_query_lod. This extension returns the LOD that would be used for a texture fetch. This makes per-fragment LOD decisions possible; for example, we could choose a more or less accurate lighting algorithm according to this LOD value... With such a feature, we can perform per-fragment adaptive texture filtering. "Anisotropic filtering 16x" is no longer a meaningful concept.
textureQueryLOD
texelFetch(sampler*, ivec3 coord, int level)
- trilinearLod (GL_LINEAR_MIPMAP_LINEAR) shader implementation
- Almost every OpenGL release brings more texture formats. This is again the case with OpenGL 4.0, as 2 new extensions on that side have been released: GL_ARB_texture_buffer_object_rgb32, also part of the OpenGL 4.0 specification, and GL_ARB_texture_compression_bptc, excluded from the OpenGL 4.0 specification, probably just because of an S3 patent, just like GL_EXT_texture_compression_s3tc.
GL_ARB_texture_buffer_object_rgb32 trivially adds 3-channel 32-bit texture buffer formats: GL_RGB32I, GL_RGB32UI and GL_RGB32F.
GL_ARB_texture_compression_bptc provides the Direct3D 11 compressed formats known as BC6H and BC7, called respectively GL_BPTC_FLOAT and GL_BPTC with OpenGL. They target high dynamic range texture compression, low dynamic range texture compression and high quality compression of sharp edges. The compression ratios for GL_BPTC_FLOAT and GL_BPTC are 6:1 and 3:1.
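The two ratios can be checked with a little arithmetic. Both formats store each 4x4 texel block in 128 bits (16 bytes). The sketch below assumes the uncompressed references are RGB16F (6 bytes per texel) for GL_BPTC_FLOAT and RGB8 (3 bytes per texel) for GL_BPTC; that choice of reference formats is my reading of the figures, and against RGBA8 the second ratio would be 4:1 instead. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    BPTC_BLOCK_DIM = 4,    /* blocks cover 4x4 texels */
    BPTC_BLOCK_BYTES = 16  /* 128 bits per block for both BC6H and BC7 */
};

/* Size in bytes of one BPTC-compressed mip level. */
static size_t bptc_compressed_size(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    size_t blocks_x = (width + BPTC_BLOCK_DIM - 1) / BPTC_BLOCK_DIM;
    size_t blocks_y = (height + BPTC_BLOCK_DIM - 1) / BPTC_BLOCK_DIM;
    return blocks_x * blocks_y * BPTC_BLOCK_BYTES;
}

/* Size in bytes of the same level stored uncompressed. */
static size_t uncompressed_size(size_t width, size_t height,
                                size_t bytes_per_texel)
{
    assert(width > 0 && height > 0 && bytes_per_texel > 0);
    return width * height * bytes_per_texel;
}
```

For a 256x256 level the compressed size is 65536 bytes, against 393216 bytes for RGB16F (6:1) and 196608 bytes for RGB8 (3:1).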
samplerBuffer
- loads position offsets and diffuse color in texture buffers
GL_ARB_texture_cube_map_array extends texture arrays to cube maps
- loads a texture cube as a 1-layer GL_TEXTURE_CUBE_MAP_ARRAY
samplerCubeArray
GL_ARB_gpu_shader5 provides further per-sample control over how in/out data are interpolated, using qualifiers. When centroid is used to qualify a variable, a single value can be assigned to that variable for all the samples in the pixel. However, when sample qualifies a variable, a separate value must be assigned to that variable for each covered sample.
New built-in interpolation functions interpolateAtCentroid, interpolateAtSample and interpolateAtOffset are available to compute the interpolated value of a fragment shader input variable. interpolateAtCentroid returns the value of a variable at the centroid location, interpolateAtSample at a sample location and interpolateAtOffset at an offset from the pixel center, where (0, 0) is the center of the pixel.
If an input variable is declared with the qualifier noperspective, the interpolation is computed without perspective correction.
- interpolateAtOffset bug in the NVIDIA GLSL compiler
- issue 7
GL_ARB_transform_feedback2 defines 3 features. First, it creates a transform feedback object (sometimes called XBO) that encapsulates the transform feedback state... Well, that is to say the transform feedback buffers, which with GL_INTERLEAVED_ATTRIBS is just 1 buffer... what's the point?!
Second, this object allows pausing (glPauseTransformFeedback) and resuming (glResumeTransformFeedback) transform feedback capture. The XBO manages a behaviour state. This way, multiple transform feedback objects can record the vertex attributes, one after the other but never at the same time. In an OpenGL command flow, we can imagine that some draw calls belong to one transform feedback object and others belong to a second one.
Finally, this extension provides the function glDrawTransformFeedback to use transform feedback buffers as the vertex source without having to query the count of primitives written. Querying this count with glGetQueryObjectuiv stalls the graphics pipeline while it waits for the OpenGL commands to complete. glDrawTransformFeedback replaces glDrawArrays in this case and doesn't need the vertex count: it automatically uses the count recorded by the transform feedback object to draw. GL_ARB_transform_feedback2 is part of OpenGL 4.0 but is also supported by GeForce GT200 as an extension.
GL_ARB_transform_feedback3 defines 2 features. First, with OpenGL 3.0, when we capture varyings we are limited to 2 dispatch methods: GL_SEPARATE_ATTRIBS to write one varying per buffer and GL_INTERLEAVED_ATTRIBS to write all the varyings in a single buffer.
GL_ARB_transform_feedback3 proposes a much more realistic scenario: it allows writing interleaved varyings to several buffers. Let's take an example. A transform feedback object could contain 3 buffers. The first buffer could capture 1 varying, the second buffer 3 varyings and the third one 2 varyings. This behaviour is defined with a very simple approach: in the name list given to glTransformFeedbackVaryings, we insert the name gl_NextBuffer as a separator between buffers.
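The 3-buffer example can be written down as the name list itself. In the C sketch below the varying names are hypothetical placeholders and only gl_NextBuffer is a real reserved name; the helper count_feedback_buffers is also hypothetical and simply counts how many buffers such a list describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical varying names: 1 varying in buffer 0, 3 in buffer 1
   and 2 in buffer 2, separated by the reserved name gl_NextBuffer. */
static const char *const feedback_varyings[] = {
    "outPosition",
    "gl_NextBuffer",
    "outNormal", "outTangent", "outTexCoord",
    "gl_NextBuffer",
    "outColor", "outVelocity"
};

enum {
    FEEDBACK_VARYING_COUNT =
        sizeof(feedback_varyings) / sizeof(feedback_varyings[0])
};

/* Number of buffers described by a name list: one more than the
   number of gl_NextBuffer separators. */
static size_t count_feedback_buffers(const char *const *names, size_t count)
{
    assert(names != NULL && count > 0);
    size_t buffers = 1;
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(names[i], "gl_NextBuffer") == 0) {
            ++buffers;
        }
    }
    return buffers;
}
```

The list would be passed to glTransformFeedbackVaryings with the GL_INTERLEAVED_ATTRIBS mode before linking the program, with separators included in the count (8 entries here).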
Also, this extension has some interactions with GL_ARB_gpu_shader5, which defines multiple vertex streams in geometry shaders. Multiple vertex streams are a new concept for OpenGL 4.0. In a way, before OpenGL 4.0 we had a single vertex stream, which was used by the rasterizer. The first vertex stream is still used by the rasterizer but the others can be output to transform feedback objects. This possibility requires an extra set of functions to query the primitives written per stream and to draw directly using a specific vertex stream. This is done with glDrawTransformFeedbackStream, glBeginQueryIndexed, glEndQueryIndexed and glGetQueryIndexediv.
- uses transform feedback to transform a vec4 position into a vec4 position and a vec4 color.
glEnable(GL_RASTERIZER_DISCARD)
glBindTransformFeedback(GL_TRANSFORM_FEEDBACK, feedbackName[0])
glBeginTransformFeedback(GL_TRIANGLES)
glDrawTransformFeedback, no more primitive number! No more stalling queries! Cool
GL_INTERLEAVED_ATTRIBS
- same but using an explicit stream instead: glDrawTransformFeedbackStream(GL_TRIANGLES, feedbackName[0], 0) is equivalent to glDrawTransformFeedback, where stream 0 is implicit