Releases: ConfettiFX/The-Forge
Release 1.58 - June 17th, 2024 Behemoth | Compute-Driven Mega Particle System | Triangle Visibility Buffer 2.0
Announce trailer for Behemoth
We helped Skydance Interactive to optimize Behemoth last year. Click on the image below to see the announce trailer:
Compute-Based Mega Particle System
This unit test is based on some of our research into software rasterization and GPU-driven rendering. The particle system runs entirely in very few compute shaders, with one large buffer holding most of the data. As with all things GPU-driven, the trick is to execute one compute shader once on one buffer to reduce read / write memory bandwidth. Although this is not new wisdom, you will be surprised how many particle systems still get this wrong ... having compute shaders for each stage of the particle lifetime or, even worse, doing most of the particle work on the CPU.
This particle system was demoed last year in a few talks in September on a Samsung S22. Here are the slides:
http://www.conffx.com/WolfgangEngelParticleSystem.pptx
It is meant to be used to implement next-gen Mega Particle systems in which we always simulate hundreds of thousands or millions of particles at once, instead of the few dozen that contemporary systems simulate.
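As a CPU-side illustration of the single-pass idea (not the unit test's actual shader code), the sketch below handles expiry, spawning and simulation in one loop over one buffer, so each particle is read and written once per frame. The `Particle` layout, the spawn values and the `updateParticles` name are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle layout; the real unit test packs its data differently.
struct Particle {
    float position[3];
    float velocity[3];
    float age;       // seconds since spawn
    float lifetime;  // seconds until expiry; <= 0 marks a free slot
};

// One pass over one buffer: expire, respawn and integrate in the same loop,
// mirroring what a single compute dispatch would do per thread.
// Returns the number of particles alive after the update.
std::size_t updateParticles(std::vector<Particle>& particles, float dt,
                            std::size_t spawnBudget) {
    std::size_t alive = 0;
    for (Particle& p : particles) {
        // Expire particles that have outlived their lifetime.
        if (p.lifetime > 0.0f && p.age >= p.lifetime) p.lifetime = 0.0f;

        // Reuse a free slot while spawning is still allowed this frame.
        if (p.lifetime <= 0.0f) {
            if (spawnBudget == 0) continue;
            --spawnBudget;
            p = Particle{{0.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}, 0.0f, 2.0f};
        }

        // Integrate position (explicit Euler) and advance the age.
        for (int i = 0; i < 3; ++i) p.position[i] += p.velocity[i] * dt;
        p.age += dt;
        ++alive;
    }
    return alive;
}
```

A multi-pass design would walk the same buffer once per stage; folding the stages together is what saves the memory bandwidth.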
Android Samsung S22 1170x540 resolution
This screenshot shows 4 million firefly-like particles, with 10000 lights attached to them and a shadow for the directional light. Numbers like these were previously thought to be impossible on mobile phones.
Android Samsung S23 1170x540 resolution
Same settings as above, plus shadows from 8 point lights.
Android Samsung S24 1170x540 resolution
Same settings as above, plus shadows from 8 point lights.
PS5 running at 4K
Windows with AMD RX 6400 at 1080p
Triangle Visibility Buffer 2.0
We now have the new compute-based TVB 2.0 approach running on all platforms (on Android only on the S22). You can download slides from the I3D talk from
Release 1.57 - May 8th, 2024 Visibility Buffer 2.0 Prototype | Visibility Buffer 1.0 One Draw call
Visibility Buffer Research - I3D talk
We are giving a talk about our latest Visibility Buffer research at I3D. Here is a short primer on what it is about:
The original idea of the Triangle Visibility Buffer is based on an article by [burns2013]. [schied15] and [schied16] extended what was described in the original article. Christoph Schied implemented a modern version with an early version of OpenGL (supporting MultiDrawIndirect) into The Forge rendering framework in September 2015.
We ported this code to all platforms and simplified and extended it in the following years by adding a triangle filtering stage following [chajdas] and [wihlidal17] and a new way of shading.
Our on-going improvements simplified the approach incrementally and the architecture started to resemble what was described in the original article by [burns2013] again, leveraging the modern tools of the newer graphics APIs.
In contrast to [burns2013], the actual storage of triangles in our implementation of a Visibility Buffer happens due to the triangle removal and draw compaction step with an optimal “massaged” data set.
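To make the triangle removal and compaction step more concrete, here is a simplified CPU sketch. It is not The Forge's filtering shader: it only shows back-face and zero-area rejection in screen space followed by compaction into a new index buffer. The `Vec2` type, the function names and the counter-clockwise front-face convention are assumptions for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// Twice the signed area of a screen-space triangle.
// Positive for counter-clockwise winding (assumed to be front-facing here).
float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Copies only the surviving triangles into a compacted index buffer.
std::vector<uint32_t> filterTriangles(const std::vector<Vec2>& verts,
                                      const std::vector<uint32_t>& indices) {
    std::vector<uint32_t> out;
    out.reserve(indices.size());
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Vec2 a = verts[indices[i]];
        const Vec2 b = verts[indices[i + 1]];
        const Vec2 c = verts[indices[i + 2]];
        // Reject back-facing (negative area) and degenerate (zero area) triangles.
        if (signedArea2(a, b, c) <= 0.0f) continue;
        out.insert(out.end(), {indices[i], indices[i + 1], indices[i + 2]});
    }
    return out;
}
```

A production filter would typically add further tests such as frustum and small-primitive culling; those are left out here to keep the sketch short.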
By having removed overdraw in the Visibility Buffer and Depth Buffer, we run a shading approach that shades everything with one regular draw call. We called the shading stage Forward++ due to its resemblance to forward shading and its usage of a tiled light list for applying many lights. It was a step up from Forward+, which requires numerous draw calls.
We described all this in several talks at game industry conferences, for example at GDCE 2016 [engel16] and during XFest 2018, showing considerable performance gains due to reduced memory bandwidth compared to traditional G-buffer based rendering architectures.
A blog post that was updated over the years for what we now call Triangle Visibility Buffer 1.0 (TVB 1.0) can be found here [engel18].
Over the last years we extended this original idea with an Order-Independent Transparency approach (it is more efficient to sort triangle IDs in a per-pixel linked list compared to storing layers of a G-Buffer) and software VRS. Then, in parallel, we developed a Visibility Buffer approach that doesn't require draw calls to fill the depth and Visibility Buffer and one that requires far fewer draw calls.
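The per-pixel linked list of triangle IDs can be pictured with a small CPU sketch. This is an illustration under assumptions rather than the shipped shader code: the `Node` layout, the `TransparencyLists` type and the convention that a larger depth value means farther away are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;

// Hypothetical node: the Visibility Buffer variant stores only an ID and a
// depth per fragment instead of a full layer of G-buffer data.
struct Node {
    uint32_t triangleId;
    float depth;
    uint32_t next;  // index of the next node for this pixel, or kEndOfList
};

struct TransparencyLists {
    std::vector<uint32_t> heads;  // one list head per pixel
    std::vector<Node> nodes;      // node pool shared by all pixels

    explicit TransparencyLists(std::size_t pixelCount)
        : heads(pixelCount, kEndOfList) {}

    // Pushes a transparent fragment onto the front of the pixel's list.
    void insert(std::size_t pixel, uint32_t triangleId, float depth) {
        nodes.push_back({triangleId, depth, heads[pixel]});
        heads[pixel] = static_cast<uint32_t>(nodes.size() - 1);
    }

    // Walks the pixel's list and returns its triangle IDs sorted back to front
    // (assumes larger depth means farther from the camera).
    std::vector<uint32_t> resolve(std::size_t pixel) const {
        std::vector<Node> frags;
        for (uint32_t i = heads[pixel]; i != kEndOfList; i = nodes[i].next)
            frags.push_back(nodes[i]);
        std::sort(frags.begin(), frags.end(),
                  [](const Node& a, const Node& b) { return a.depth > b.depth; });
        std::vector<uint32_t> ids;
        for (const Node& n : frags) ids.push_back(n.triangleId);
        return ids;
    }
};
```

Shading then happens per sorted triangle ID, so the list itself stays small regardless of how many material attributes a fragment needs.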
This release offers -what we call- an updated Triangle Visibility Buffer 1.0 (TVB 1.0) and a prototype for the Triangle Visibility Buffer 2.0 (TVB 2.0).
The changes to TVB 1.0 are evolutionary. We used to map each mesh to an indirect draw element. This required the use of DrawID to map back to the per-mesh data. When working on a game engine with a very high number of draw calls, it imposed a limitation on the number of "draws" we could do, due to having only a limited number of bits available in the VB.
Additionally, instancing was implemented using a separate instanced draw for each instanced mesh. We refactored the data flow between the draws and the shade pass.
There is now no reliance on DrawID and instances are handled transparently using the same unified draw. This both simplifies the flow of data and allows us to draw more "instanced" meshes.
Apart from being able to use a very high number of draw calls, the performance didn't change.
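The DrawID limitation described above follows directly from bit packing. The sketch below assumes a hypothetical split of a 32-bit Visibility Buffer texel into 8 draw bits and 24 triangle bits; the actual layout used in TVB 1.0 may differ, and all names here are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical split of a 32-bit visibility buffer texel.
constexpr uint32_t kTriangleBits = 24;
constexpr uint32_t kDrawBits = 32 - kTriangleBits;              // 8 bits
constexpr uint32_t kMaxDraws = 1u << kDrawBits;                 // 256 draws
constexpr uint32_t kTriangleMask = (1u << kTriangleBits) - 1u;

// Packs a draw ID and a triangle ID into one texel.
uint32_t packVisibility(uint32_t drawId, uint32_t triangleId) {
    assert(drawId < kMaxDraws);           // the draw-count limit in question
    assert(triangleId <= kTriangleMask);
    return (drawId << kTriangleBits) | triangleId;
}

uint32_t unpackDrawId(uint32_t texel) { return texel >> kTriangleBits; }
uint32_t unpackTriangleId(uint32_t texel) { return texel & kTriangleMask; }
```

Every bit given to the draw ID is taken away from the triangle ID, which is why dropping the reliance on DrawID lifts the limit on the number of draws.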
The new TVB 2.0 approach is revolutionary in the sense that it doesn't use draw calls anymore to fill the depth and visibility buffer. There are two compute shader invocations that filter triangles and eventually fill the depth and visibility buffer.
Not using draw calls anymore makes the whole code base more consistent and less convoluted -compared to TVB 1.0-.
You can now find the new Visibility Buffer 2 approach in
The-Forge\Examples_3\Visibility_Buffer2
This is still in an early stage of development. We only support a limited number of platforms: Windows D3D12, PS4/5, XBOX, and macOS / iOS.
Sanitized initRenderer
We cleaned up the whole initRenderer code, merged GPUConfig into GraphicsConfig and unified naming.
Metal run-time improvements
We improved the Metal Validation Support.
Art
Everything related to Art assets is now in the Art folder.
Bug fixes
Lots of fixes everywhere.
References:
[burns2013] Christopher A. Burns, Warren A. Hunt, "The Visibility Buffer: A Cache-Friendly Approach to Deferred Shading", 2013, Journal of Computer Graphics Techniques (JCGT) 2:2, Pages 55 - 69.
[schied15] Christoph Schied, Carsten Dachsbacher, "Deferred Attribute Interpolation for Memory-Efficient Deferred Shading", 2015, KIT publication website: http://cg.ivd.kit.edu/publications/2015/dais/DAIS.pdf
[schied16] Christoph Schied, Carsten Dachsbacher, "Deferred Attribute Interpolation Shading", 2016, GPU Pro 7, Pages
[chajdas] Matthaeus Chajdas, GeometryFX, 2016, AMD Developer Website http://gpuopen.com/gaming-product/geometryfx/
[wihlidal17] Graham Wihlidal, "Optimizing the Graphics Pipeline with Compute", 2017, GPU Zen 1, Pages 277--320
[engel16] Wolfgang Engel, "4K Rendering Breakthrough: The Filtered and Culled Visibility Buffer", 2016, GDC Vault: https://www.gdcvault.com/play/1023792/4K-Rendering-Breakthrough-The-Filtered
[engel18] Wolfgang Engel, "Triangle Visibility Buffer", 2018, Wolfgang Engel's Diary of a Graphics Programmer Blog http://diaryofagraphicsprogrammer.blogspot.com/2018/03/triangle-visibility-buffer.html
Release 1.56 - April 4th, 2024 I3D | Warzone Mobile | Visibility Buffer | Aura on macOS | Ephemeris on Switch | GPU breadcrumbs | Swappy in Android | Screen-space Shadows | Metal Debug Markers improved
I3D
We are sponsoring I3D again. Come by and say hi! We also will be giving a talk on the new development around Triangle Visibility Buffer.
Warzone Mobile launched
We have been working on Warzone Mobile since August 2020. The game launched on March 21, 2024.
Visibility Buffer
We removed CPU cluster culling and simplified the animation data usage. Now triangle filtering only takes one dispatch each frame again.
Swappy frame pacer is now available in Android/Vulkan
We integrated the Swappy frame pacer into the Android / Vulkan ecosystem.
GPUCfg system improved with more IDs and fewer string compares
We did another pass on the GPUCfg system and can now generate the vendor IDs and model IDs with a Python script, which keeps the *_gpu.data list easily up to date for each platform.
We removed most of the name comparisons and replaced them with ID comparisons, which should speed up parsing and is more specific.
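As a rough sketch of why ID comparisons are cheaper and more specific than name comparisons, the example below matches a GPU against a rule using integers only. The `GpuRule` record and `matches` function are hypothetical and do not reflect the actual *_gpu.data format; the three vendor constants are the well-known PCI vendor IDs.

```cpp
#include <cstdint>

// Well-known PCI vendor IDs.
constexpr uint32_t kVendorAmd = 0x1002;
constexpr uint32_t kVendorNvidia = 0x10DE;
constexpr uint32_t kVendorIntel = 0x8086;

// Hypothetical rule record; a modelId of 0 means "any model of this vendor".
struct GpuRule {
    uint32_t vendorId;
    uint32_t modelId;
};

// Two integer compares instead of a string comparison against a marketing
// name that may vary between drivers and platforms.
bool matches(const GpuRule& rule, uint32_t vendorId, uint32_t modelId) {
    return rule.vendorId == vendorId &&
           (rule.modelId == 0 || rule.modelId == modelId);
}
```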
Screen-Space Shadows in UT9
We added screen-space shadows to the shadow approaches in that unit test. These are complementary to regular shadow mapping and add more detail. We also fixed a number of inconsistencies with the other shadow map approaches.
PS5 - Screen-Space Shadows off
GPU breadcrumbs on all platforms
Now you can have GPU crash reports on all platforms. We skipped OpenGL ES and DX11 so ...
A simple example of a crash report is this:
2024-04-04 23:44:08 [MainThread ] 09a_HybridRaytracing.cp:1685 ERR| [Breadcrumb] Simulating a GPU crash situation (RAYTRACE SHADOWS)...
2024-04-04 23:44:10 [MainThread ] 09a_HybridRaytracing.cp:2428 INFO| Last rendering step (approx): Raytrace Shadows, crashed frame: 2
We will extend the reporting a bit more over time.
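The general breadcrumb idea can be sketched on the CPU as follows. This is not The Forge's implementation: a marker is written before each rendering step, and after a crash the last marker tells us approximately which step was running. All types and function names are hypothetical, and a real version writes the marker into GPU-visible memory.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// CPU stand-in for a small GPU-visible buffer holding the last marker written.
struct BreadcrumbBuffer {
    uint32_t lastMarker = 0;  // 0 means "no step started yet"
};

struct RenderStep {
    uint32_t marker;   // unique, non-zero ID of the step
    std::string name;
    bool crashes;      // simulates a GPU crash inside this step
};

// "Executes" the steps in order, writing each marker before the step runs.
// Stops at the first crashing step; returns true if the frame completed.
bool runFrame(const std::vector<RenderStep>& steps, BreadcrumbBuffer& crumbs) {
    crumbs.lastMarker = 0;
    for (const RenderStep& step : steps) {
        crumbs.lastMarker = step.marker;
        if (step.crashes) return false;
    }
    return true;
}

// Maps the surviving marker back to a step name for the crash report.
std::string lastStepName(const std::vector<RenderStep>& steps,
                         const BreadcrumbBuffer& crumbs) {
    for (const RenderStep& step : steps)
        if (step.marker == crumbs.lastMarker) return step.name;
    return "unknown";
}
```

The result is only approximate because the GPU may have several steps in flight when it faults, which matches the "(approx)" wording in the log above.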
Ephemeris now also runs on Switch ...
Release 1.55 - March 1st, 2024 - Ephemeris | gpu.data | Many bug fixes and smaller improvements
Ephemeris 2.0 Update
We improved Ephemeris again and now support it on more platforms, updating some of the algorithms used and adding more features.
We now support PC, XBOX'es, PS4/5, Android, Steamdeck and iOS (requires iPhone 11 or higher); so far not Switch.
IGraphics.h
We changed the graphics interface for cmdBindRenderTargets
// old
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, uint32_t renderTargetCount, RenderTarget** ppRenderTargets, RenderTarget* pDepthStencil, const LoadActionsDesc* loadActions, uint32_t* pColorArraySlices, uint32_t* pColorMipSlices, uint32_t depthArraySlice, uint32_t depthMipSlice)
// new
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, const BindRenderTargetsDesc* pDesc)
Instead of a long list of parameters we now provide a struct that gives us enough flexibility to pack more functionality in there.
Variable Rate Shading
We added Variable Rate Shading to the Visibility Buffer OIT example test 15a. This way we have a better looking test scene with St. Miguel.
VRS allows rendering parts of the render target at different resolution based on the auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true
The key idea behind the software-based approach is to render everything in 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on the local image gradients.
It can be used on a much wider range of platforms and devices than the hardware-based approach, since hardware VRS support is broken or missing on many platforms. Because this software approach utilizes 2x2 tiles, we can also achieve higher image quality compared to hardware-based VRS.
Shading rate view based on the color per 2x2 pixel quad:
- White – 1 sample (top left, always shaded);
- Blue – 2 horizontal samples;
- Red – 2 vertical samples;
- Green – all 4 samples;
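A gradient-based VRS map could be generated along the lines of the sketch below, which classifies one 2x2 quad into the four rates listed above. The threshold, the use of luminance and the mapping from gradient direction to rate are assumptions for this example, not the unit test's actual heuristic.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

enum class ShadingRate { One, TwoHorizontal, TwoVertical, Four };

// Classifies one 2x2 quad from its luminance values, indexed [row][column].
// Assumption: a strong horizontal change asks for two horizontal samples,
// a strong vertical change for two vertical samples, and both for all four.
ShadingRate classifyQuad(const float luma[2][2], float threshold) {
    const float horizontal = std::max(std::fabs(luma[0][1] - luma[0][0]),
                                      std::fabs(luma[1][1] - luma[1][0]));
    const float vertical = std::max(std::fabs(luma[1][0] - luma[0][0]),
                                    std::fabs(luma[1][1] - luma[0][1]));
    const bool needHorizontal = horizontal > threshold;
    const bool needVertical = vertical > threshold;
    if (needHorizontal && needVertical) return ShadingRate::Four;
    if (needHorizontal) return ShadingRate::TwoHorizontal;
    if (needVertical) return ShadingRate::TwoVertical;
    return ShadingRate::One;
}

// Debug view colors as given in the legend above.
std::string debugColor(ShadingRate rate) {
    switch (rate) {
        case ShadingRate::One: return "White";
        case ShadingRate::TwoHorizontal: return "Blue";
        case ShadingRate::TwoVertical: return "Red";
        case ShadingRate::Four: return "Green";
    }
    return "White";
}
```

In the approach described above, the resulting rate would be written into the stencil buffer so that later passes shade only the selected samples of each quad.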
Debug Output with the original Image on PC
Debug Output with the original Image on Android
UI description:
- Toggle VRS – enable/disable VRS
- Draw Cubes – enable/disable dynamic objects in the scene
- Toggle Debug View – shows auto-generated VRS map if VRS is enabled
- Blur kernel Size – change blur kernel size of the blur applied to the background image to highlight performance benefits of the solution by making fragment shader heavy enough.
Limitations:
Relies on programmable sample locations support – not widely supported on Android devices.
Supported platforms:
PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows(Vulkan/DX12), macOS/iOS.
gpu.data
You will want to check out those files. They are now dedicated per supported platform, so it is easier for us to differentiate between the various Playstations, XBOX'es, Switches, Android, iOS etc.
Unlinked Multi GPU
The Unlinked Multi GPU example was broken on AMD 7x GPUs with Vulkan. This looks like a bug. We had to disable DCC to make that work.
Vulkan
We now track GPU memory and will extend this to other platforms.
Vulkan mobile support
We now support the VK_EXT_astc_decode_mode extension (VK_EXT_ASTC_DECODE_MODE_EXTENSION_NAME).
Remote UI
Various bug fixes to make this more stable. Still alpha ... will crash.
Retired:
- 35 Variable Rate Shading ... this went into the Visibility Buffer OIT example 15a.
- Basis Library - after not having found any practical use case, we removed Basis again.
Release 1.54 - February 2nd, 2024 - Remote UI Control | Shader Server | Visibility Buffer | Asset Pipeline | GPU Config System | macOS/iOS | Lots more ...
Our last release was in October 2022. We were so busy that we lost track of time. In March 2023 we planned to make the next release. We started testing and fixing and improving code up until today. The improvements coming back from the -most of the time- 8 - 10 projects we are working on were so many that it was hard to integrate all this, test it and then maintain it. To a certain degree our business has higher priority than making GitHub releases, but we realize that letting a lot of time pass makes it substantially harder for us to get the whole code base back in shape, even with a company size of nearly 40 graphics programmers. So we cut down functional or unit tests, so that we have fewer variables. We also restructured large parts of our code base so that it is easier to maintain. One of the constant maintenance challenges was the macOS / iOS run-time (more about that below).
We invested a lot in our testing environment. We have more consoles now for testing and we also have a much needed screenshot testing system. We outsource testing to external service providers more. We removed Linux as a stand-alone target but the native Steamdeck support should make up for this.
We tried to be conservative about increasing API versions because we know that on many platforms our target group will use older OS or API implementations. Nevertheless we were more adventurous this year than before, so we bumped versions up with a larger step than in previous years.
Our next release is planned in about four weeks' time. We still have work to do to bring up a few source code parts but now the increments are much smaller.
In the meantime some of the games we worked on, or are still working on, shipped:
Forza Motorsport has launched:
Starfield has launched:
No Man's Sky has launched on macOS:
Internal automated testing setup on our internal GitLab server
- Our automated testing setup, which tests all the platforms, now takes 38 minutes for one run. At some point it took longer. We have revamped it substantially since the last release, adding screenshot comparisons and a few extra steps for static code analysis.
Visibility Buffer
- The Visibility Buffer has gone through a lot of upgrades since October 2022. I think the most notable ones are:
- Refactored the whole code so that it is easier to re-use in all our examples, there is now a dedicated Visibility Buffer directory holding this code
- Animation of characters is now integrated
- Tangent and Bi-Tangent calculation is moved to the pixel shader and we removed the buffers
Software Variable Rate Shading
This unit test demonstrates a software-based variable rate shading (VRS) technique that allows rendering parts of the render target at different resolutions based on an auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true
The key idea behind the software-based approach is to render everything in 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on the local image gradients.
The advantage of this approach is that it runs on a wider range of platforms and devices than the hardware-based approach since the hardware VRS support is broken or not supported on many platforms. Because this software approach utilizes 2x2 tiles we can also achieve higher image quality compared to hardware-based VRS.
Shading rate view based on the color per 2x2 pixel quad:
- White – 1 sample (top left, always shaded);
- Blue – 2 horizontal samples;
- Red – 2 vertical samples;
- Green – all 4 samples;
UI description:
- Toggle VRS – enable/disable VRS
- Draw Cubes – enable/disable dynamic objects in the scene
- Toggle Debug View – shows auto-generated VRS map if VRS is enabled
- Blur kernel Size – change blur kernel size of the blur applied to the background image to highlight performance benefits of the solution by making fragment shader heavy enough.
Limitations:
Relies on programmable sample locations support – not widely supported on Android devices.
Supported platforms:
PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows(Vulkan/DX12).
Implemented on macOS/iOS, but it doesn't give the expected performance benefits due to an issue with stencil testing on that platform.
Shader Server
To enable re-compilation of shaders during run-time, we implemented a cross-platform shader server that allows recompiling shaders by pressing CTRL-S or a button in a dedicated menu.
You can find the documentation in the Wiki in the FSL section.
Remote UI Control
When working remotely, on mobile or on console, it can be cumbersome to control the development UI.
We added a remote control application in Common_3\Tools\UIRemoteControl which allows control of all UI elements on all platforms.
It works as follows:
- Build and Launch the Remote Control App located in Common_3/Tools/UIRemoteControl
- When a unit test is started on the target platform (e.g. a console), it starts listening for connections on a port (8889 by default)
- In the Remote Control App, enter the target IP address and click connect
This is alpha software so expect it to crash ...
VK_EXT_device_fault support
This extension allows developers to query for additional information on GPU faults which may have caused device loss, and to generate binary crash dumps.
Ray Queries in Ray Tracing
We switched to Ray Queries for the common Ray Tracing APIs on all the platforms we support. The current Ray Tracing APIs increase the amount of memory necessary substantially, decrease performance and can't add much visually because the whole game has to run with lower resolution, lower texture resolution and lower graphics quality (to make up for this, upscalers were introduced that add new issues to the final image).
Because Ray Tracing became a marketing term valuable to GPU manufacturers, some game developers now support Ray Tracing to help increase hardware sales. So we are going with the flow here by offering those APIs.
iPhone 11 (Model A2111) at resolution 896x414
We do not have a denoiser for the Path Tracer.
GPU Configuration System
This is a cross-platform system that can track GPU capabilities on all platforms and switch on and off features of a game for different platforms. To read a lot more about this follow the link below.
New macOS / iOS run-time
We think the Metal API is a well-balanced graphics API that walks the path between low-level and high-level very well. We ran into one general problem with the Metal API on both platforms: it is hard to maintain the code base. There is an architectural problem that was probably introduced due to a lack of experience in shipping games.
In essence what Apple decided to do is have calls like this:
https://developer.apple.com/documentation/swift/marking-api-availability-in-objective-c
Anything a hardware vendor describes as available and working might not be working with the next upgrade of the operating system, hardware or just the API or XCode.
If you have a few hundred of those macros in your code, it becomes a lottery what works and what doesn't on a variety of hardware. On some hardware one thing is broken, on other hardware something else.
So there are two ways to deal with this: for every @available macro you start adding a #define to switch off or replace that code based on the underlying hardware and software platform. You would have to manually track whether what the macro says is true on a wide range of platforms with different outcomes.
So for example on macOS 10.13 running on a certain Macbook Pro (I am making this up) with an Intel GPU it is broken, but a very similar Macbook Pro that additionally has a dedicated GPU actually runs it. Now you have to track what "class of Macbook Pro" we are talking about and whether the Macbook Pro in question has an Intel or an AMD GPU.
We track all this data already so that is not a problem. We know exactly what piece of hardware we are looking at (see above GPU Config system).
The problem is that we have to guard every @available macro with som...
Release 1.53 - October 5th, 2022 - Steamdeck Support | App life cycle changes | Shader Byte Code Offline Generation | GTAO Unit Test | Improved gradient calculation in Visibility Buffer | New C Containers | Reorg TF Directory Structure | Upgraded to newer ImGUI | The Forge Blog
The Starfield Official Gameplay Reveal Trailer is out. It always brings us pleasure to see The Forge running in AAA games like this:
We added The Forge to the Creation Engine in 2019.
The Forge made an appearance during the Apple developer conference 2022. We added it to the game "No Man's Sky" from Hello Games to bring this game up on macOS / iOS. For the Youtube video click on the image below and jump to 1:22:40
- We switched our Linux OS to Manjaro to have an easier upgrade path to the Steamdeck. Please note the changed Linux requirements below.
- Shader byte code can now be generated offline.
  - Shader binaries are compiled through FSL
  - Introduced ShaderList files that determine all the binary shaders that FSL needs to produce. Defines, shader target and other specific configuration can be specified per shader binary declaration
  - Update all projects (UT, VB, Aura, Ephemeris) to use the new ShaderLists
  - Remove all ShaderStageLoadDesc::pMacros, shaders are compiled offline through ShaderLists
  - Remove all Renderer::pBuiltinShaderDefines, all configuration is done through FSL
- Over the last few projects we always had challenges with EASTL. So over the last 9 months we slowly removed it and replaced it with new C language based containers that prefer stack allocations over heap allocations.
There is a new unit test that helps us to test the new libraries.
For string management:
bstrlib
For dynamic arrays and hash tables:
stb_ds.h
There is a new unit test to make sure those new containers are tested. It is called 36_AlgorithmsAndContainers
- We changed the App life cycle: modern APIs have so many ways to reset the driver or reload assets that we made a more flexible "reload" mechanism that generalizes all the special cases we had in there before.
  - App extended with reload functionality by making use of ReloadDesc* parameter for the Load/Unload functions
  - define reload/reset descriptors structs
  - define reload/reset enum types
  - Updated OS base files regarding new structs
  - Able to reload shaders on all examples
  This is a breaking change to all of our rendering interfaces.
- New Animation test that unifies most of the former animation tests into one. This way we can save some testing time in our Jenkins setup.
- We added a new unit test called 38_AmbientOcclusion_GTAO. It implements the paper "Practical Real-Time Strategies for Accurate Indirect Occlusion" by Jorge Jimenez et al.
- We improved the gradient calculation in the Visibility Buffer. Thanks to Stephen Hill @self_shadow who brought this to our attention.
- We reorganized the whole TF directory structure to allow development in more areas. Here is an image representing the new structure:
What is still missing is the "Render Abstraction Layer", "Scene Loader" and we have to populate the "Game Layer" more.
- We upgraded to ImGUI 1.88 to get access to the docking feature. In the process we improved the ImGUI integration substantially.
- We started a blog for The Forge at The-Forge-Blog. We have no idea where we can find the time to write blog posts ... let's see what is happening ...
- Retired Unit/Functional Tests:
  - 08_GltfViewer - generally glTF is not a model format that is applicable for game development. So we use it as an intermediate format in the Resource loader. In the future we might only use it in the offline asset pipeline. The main idea is to extract the data and bring it into a form that is usable in games. Unfortunately many people thought that the glTF viewer is a good model to start with. So we want to guide them in the right direction here by not offering direct access to a glTF reader anymore.
  - Most of the animation unit tests are now merged into 21_Animations, to reduce our hardware testing time. Our Jenkins testing environment that tests all platforms before someone can merge code is taking too long.
Release 1.52 - April 29th, 2022 - C Code Hot Reloading Unit Test | Visibility Buffer OIT | Pre-Computed DLUT Test | Unified Window and Resolution control | Android Vulkan Validation Layer | CPU Features | Upgraded Vulkan and DX GPU allocator | macOS / iOS improvements | Double precision Math Library | Improved Input System with HID support
We are always looking for more graphics / engine programmers. We are also specifically looking for a consultant who can help us to scale up our hardware testing environment.
The following list of changes is not fully representative of all the improvements we made, so it is just a selection:
- C Code Hot Reloading Unit Test - This unit test showcases an implementation of code hot reloading in C. We used and adapted the following GitHub library for this.
The test contains two projects:
- 19_CodeHotReload_Main: generates the executable. All code in this project can't be hot-reloaded. This is the project you should set as startup project when running the program from an IDE.
- 19a_CodeHotReload_Game: for the development platforms Windows/macOS/Linux, generates a dynamic library that is loaded by the Main project at run time. When the dynamic library changes, the Main program reloads the new code. For Android/iOS/Quest/consoles this project is compiled and linked statically.
How to use it: While the Main project is running, open 19_CodeHotReload_Game.cpp and make a change; there are lines marked with TRY_CODE_RELOAD to make easy changes. Once the file is saved, you can rebuild the project and see the changes happen automatically.
- Windows/Linux: Click on the UI "RebuildGame" button.
- macOS: Command+B in Xcode to rebuild.
Note: In this implementation we can't call any functions from The Forge from the hot-reloadable project (19a_CodeHotReload_Game). This is because we are compiling OS and Renderer as static libraries and linking them directly to the exe. Ideally these projects should be compiled as dynamic libraries in order to expose their functionality to the exe and the hot-reloadable dll. The reason we didn't implement it this way is that all our other projects are already set up to use static libraries.
- Visibility Buffer Order-Independent Transparency - we added OIT by utilizing a per-pixel linked list in a Visibility Buffer (VB) rendering architecture. In the case of Deferred Shading (DS), the per-pixel linked list holds per-pixel data. In the case of VB it only holds the triangle index data. You can switch between DS and VB in this example. The VB version occupies substantially less memory and is faster. With memory bandwidth being the biggest challenge in graphics programming, this is not unexpected. Most people have by now adopted the idea of VB in one way or another, but it doesn't hurt to show another advantage of the architecture.
XBOX One (original) 1080p resolution
- Pre-Computed DLUT Test - this test implements pre-computing volume transmittance in Blender or Houdini for 6 directions and shading clouds/smoke based on the following tweets:
https://twitter.com/Vuthric/status/1286796950214307840
A detailed description can be found here: https://realtimevfx.com/t/smoke-lighting-and-texture-re-usability-in-skull-bones/5339
In this repository is a "dlut.blend" file that contains a minimal volumetric render setup. To generate the DLUT image, follow these steps:
1. Set the viewport shading to "Rendered"
2. Select the "Sun" object
3. Set the X rotation to 0 degrees
4. Press F12 to render the image and wait for a few minutes until it's done
5. Save the rendered image to "dlut_0.png"
6. Repeat steps 3-5 for 90, 180 and 270 degrees and save "dlut_90.png", "dlut_180.png" and "dlut_270.png"
7. Run the "combine_dlut.py" Python script or manually combine the rendered images in your image editor of choice. Each color channel should contain the red channel from the corresponding "dlut_*.png" image multiplied by the alpha channel of the same image. For example, the green channel should contain the red channel from "dlut_90.png" multiplied by the alpha channel of "dlut_90.png"
8. Experiment and implement further ideas from the article above. Setting up a Mantaflow simulation in Blender and exporting animated smoke and simulation attributes like temperature can yield interesting results!
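The channel combination described in the steps above can be sketched as follows. This is not the repository's "combine_dlut.py" script; it works on images already decoded into memory, and the `Rgba8` and `Image` types are invented for this example. The text only states that the green channel comes from the 90 degree render, so the mapping of 0, 180 and 270 degrees to red, blue and alpha is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgba8 { uint8_t r, g, b, a; };
using Image = std::vector<Rgba8>;  // decoded pixels, row-major

// Red multiplied by alpha, both treated as normalized 0..255 values.
uint8_t redTimesAlpha(Rgba8 p) {
    return static_cast<uint8_t>((p.r * p.a + 127) / 255);
}

// renders[0..3] are the renders for 0, 90, 180 and 270 degrees.
// Assumed channel order: 0 -> R, 90 -> G (as stated above), 180 -> B, 270 -> A.
Image combineDlut(const std::array<Image, 4>& renders) {
    const std::size_t pixelCount = renders[0].size();
    for (const Image& img : renders) assert(img.size() == pixelCount);

    Image out(pixelCount);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        out[i] = {redTimesAlpha(renders[0][i]), redTimesAlpha(renders[1][i]),
                  redTimesAlpha(renders[2][i]), redTimesAlpha(renders[3][i])};
    }
    return out;
}
```

Loading and saving the PNG files is left out; any image library that yields 8-bit RGBA pixels would fit in front of and behind `combineDlut`.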
Resulting DLUT image should look like this:
The example program running on Android:
- Window Management - all the platforms that support the concept of having a windowed application have now a base file named {Platform}Window.cpp. There is now a common UI element that offers -if supported- multi-monitor support and various window settings. There are also LUA scripts that test the functionality in our Jenkins setup.
- Android Vulkan Validation layers: we added the validation layer from Khronos GitHub repo as they have stopped shipping the layer in the NDK.
Android Vulkan Validation Layers
You can find them in ThirdParty/OpenSource/AndroidVulkanValidationLayers
- CPU / GPU Features - we integrated the following library to test CPU features during start-up. Now you will see a lot more information about the CPU in the upper left corner of a window.
This library is a stepping stone toward utilizing more CPU intrinsics on various platforms. You can see its results in the screenshots above, showing the name of the CPU and the supported instruction set. We now also show the GPU name and the driver version that the GPU uses.
- Upgraded Vulkan and DX Allocators: following the updates to these open-source libraries on GitHub we upgraded our code base accordingly.
- macOS / iOS - while working with TF on various projects, we bring back improvements and lessons learned from those projects. You will find numerous macOS / iOS improvements in this release.
- For one of the business applications we worked on, we needed double-precision math, so we extended the math library to support it.
- We also improved the input system with HID support, which is an on-going effort. This means better controller support on more platforms.
- Windows 7 - better Windows 7 support with DX11 and Vulkan. There is still a bug in the Vulkan run-time with sRGB.
- We upgraded the 06_MaterialPlayground with shadows:
- Retired unit tests: we are going to retire many unit tests now because our automated testing cycle takes too long and heats up the "engine" room (see the passage above on us looking for a consultant to scale up our testing environment). Today we retire:
- 02_Compute
- 05_FontRendering
- 13_UserInterface - we might create a much more advanced one for tools development in the future
- 16a_SphereTracing
- 32_Windows - not necessary anymore with every unit test now offering windows management
- Resolved GitHub Issues:
V1.51
Release 1.51 - December 21st, 2021 - ECS uses flecs | Better Borderless Window | Descriptor Management improvements | sRGB | Android Game Development Extensions | FSL Improvements | Ray Tracing | Meshoptimizer | Buildbox | Lethis
Happy Holidays! 🎄🎅🔥🎁🧨
We wish you and your loved ones all the best for the Holidays and a Happy New Year 2022!
This update is again a mixture of things we learned while integrating The Forge and feedback and contributions from our users. Thanks for all the support!
In one of the next updates we will remove EASTL and offer dedicated containers compatible with C99. Over time EASTL was a huge productivity burner. Its inefficient memory access patterns hogged too much CPU time in games where we integrated TF, and we always had to go back and fix those manually later.
We know this is a breaking change but considering that STL was a good idea on CPUs 20+ years ago, we would like to align more with what modern CPUs are expecting.
- We keep moving towards C99 usage. We replaced the old ECS code with Flecs:
Now our build times are much better and the overall system runs faster:
CPU: Intel i7-7700k
GPU: AMD Radeon RX570
Old ECS
Debug:
Single Threaded: 90.0ms
Multi Threaded: 29.0ms
Release:
Single Threaded: 5.7ms
Multi Threaded: 2.3ms
flecs
Debug:
Single Threaded: 23.0ms
Multi Threaded: 6.8ms
Release:
Single Threaded: 1.7ms
Multi Threaded: 0.9ms
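To put the timings above into relative terms, the speedup is simply the old time divided by the new time. The small sketch below does that arithmetic; the `EcsTiming` struct and the function names are hypothetical and only exist for this illustration, while the numbers are copied from the list above.

```cpp
#include <cassert>
#include <cstdio>

// One row of the comparison: same configuration, old ECS vs. flecs (hypothetical type).
struct EcsTiming
{
    const char* config;
    double      oldMs;
    double      flecsMs;
};

// Speedup factor: how many times faster the new time is than the old one.
constexpr double speedup(double oldMs, double newMs) { return oldMs / newMs; }

// Timings copied from the release notes (Intel i7-7700k, AMD Radeon RX570).
static const EcsTiming kTimings[] = {
    { "Debug, single threaded", 90.0, 23.0 },
    { "Debug, multi threaded", 29.0, 6.8 },
    { "Release, single threaded", 5.7, 1.7 },
    { "Release, multi threaded", 2.3, 0.9 },
};

// Prints one line per configuration with the speedup rounded to one decimal.
void printSpeedups()
{
    for (const EcsTiming& t : kTimings)
        std::printf("%-26s %.1fx\n", t.config, speedup(t.oldMs, t.flecsMs));
}
```

With these numbers the flecs version comes out roughly 2.6x to 4.3x faster, depending on the configuration.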
- Descriptor Management improvements - we changed the rendering interface for all platforms: cmdBindPushConstants now takes an index instead of a name, and we also allow partial updates of array descriptors.
- Borderless window - there are improvements to borderless window support:
- remove borderless window "top white bar" on Windows OS
- add "Win Key + arrow" behavior (for standard maximize/minimize/split of the borderless window using keyboard)
- top resize area necessarily overlaps rendering (this is how we can remove the top white bar)
- FSL improvements
- incremental shader generation and build with header dependencies
- improved error reporting by extending line directives; errors now show up at the correct line in the source FSL file
- extended matrix column/row access functions for all targets
- vec type padding to match our math lib datatypes
- sRGB - all examples should now be more correct when it comes to linear lighting and sRGB
- Android Game Development Extensions usage: in one of our AAA game engine projects, we are now successfully using the Android Game Development Extensions for the first time. So we also brought them to The Forge.
We redid a lot of the Android development setup to streamline the experience a bit. We still use Visual Studio 2017 because it lets us be more productive than other IDEs. Two years ago we went back and forth between IDEs but concluded that the only IDE that we could efficiently integrate into our Jenkins testing setup was Visual Studio.
Please check out the new Readme below and let us know if we missed anything.
- Vulkan: moved to KHR ray tracing extensions. Upgraded max spec to Vulkan SDK 1.2.162 and tested ray tracing support on an AMD RX 6700 XT GPU.
- meshoptimizer - somehow the integration of meshoptimizer "got lost" over time, so we re-integrated it into the resource loader. Here are some numbers that we got:
meshoptimizer on various art assets
- Buildbox
The game engine BuildBox is now using The Forge (click on Image to go to Buildbox website):
- Lethis
The game Lethis Path of Progress uses The Forge now (click on image to go to the Steam Store)
Release 1.50 - October 13, 2021 - M²H uses The Forge | Unlinked Multi GPU Support | Central Config.h | glTF Viewer improvements | Scalar High Precision Math
- M²H uses The Forge for Stroke Therapy - M²H is a medical technology company. They developed a physics-based video game therapy solution that is backed by leading edge neuroscience, powered by Artificial Intelligence and controlled by dynamic movement – all working in concert to stimulate vast improvement of cognitive and motor functions for patients with stroke and the aged.
The Forge provides the rendering layer for their application.
Here is a YouTube video on what they do:
- Unlinked multiple GPU Support: for professional visualization applications, we now support unlinked multiple GPU.
A new renderer API is added to enumerate available GPUs.
Renderer creation is extended to allow explicit GPU selection using the enumerated GPU list.
Multiple Renderers can be created this way.
The resource loader interface has been extended to support multiple Renderers.
It is initialized with the list of all Renderers created.
To select which Renderer (GPU) resources are loaded on, the NodeIndex used in linked GPU configurations is reused for the same purpose.
Resources cannot be shared across multiple Renderers, however; they must be duplicated explicitly if needed.
To retrieve generated content from one GPU to another (e.g. for presentation), a new resource loader operation is provided to schedule a transfer from a texture to a buffer. The target buffer should be mappable.
This operation requires proper synchronization with the rendering work; a semaphore can be provided to the copy operation for that purpose.
Available with Vulkan and D3D12.
For other APIs, the enumeration API will not create a RendererContext, which indicates a lack of unlinked multi GPU support.
- Config.h: We now have a central config.h file that can be used to configure TF.
- Created config files:
Common_3/OS/Core/Config.h
Common_3/Renderer/RendererConfig.h
Common_3/Renderer/{RenderingAPI}/{RenderingAPI}Config.h
* Modified PyBuild.py
* Proper handling of config options.
* Every config option has --{option-name}/--no-{option-name} flag that uses define/undef directives to enable/disable macros.
* Macros are guarded with ifndef/ifdef.
* Updated Android platform handling
* Deleted Common_3/Renderer/Compiler.h. Its functionality was moved into Config.h
* Moved all macro options to config files
* Renamed USE_{OPTION_NAME} to ENABLE_{OPTION_NAME}
* Changed some macros to be defined/not defined instead of having values of 0 or 1.
* Deleted all DISABLE_{OPTION_NAME} macros
* When detecting raytracing, replaced ENABLE_RAYTRACING with RAYTRACING_AVAILABLE. This was done because not all projects need raytracing even if it is available. RendererConfig.h defines the ENABLE_RAYTRACING macro if raytracing is available, so it can be commented out in a single place instead of searching for it on every platform
* Removed most of the macro definitions from build systems. Some of the remaining macros are:
* Target platform macros: NX64, QUEST_VR
* Arm neon macro ANDROID_ARM_NEON.
* Windows suppression macros (like _CRT_SECURE_NO_WARNINGS)
* Macros specific to gainputstatic
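The guarded, defined/not-defined macro style described above can be sketched like this. Only RAYTRACING_AVAILABLE and ENABLE_RAYTRACING are names from the notes; ENABLE_EXAMPLE_LOGGING, ENABLE_EXAMPLE_PROFILER and the helper functions are hypothetical, and the sketch makes no claim about what the real Config.h files contain or how PyBuild.py applies its flags.

```cpp
#include <cassert>

// Detection result. Assumption: in a real setup this would be set by an
// API-specific config header when the platform supports ray tracing.
#ifndef RAYTRACING_AVAILABLE
#define RAYTRACING_AVAILABLE
#endif

// Central opt-in. Commenting out this one block disables ray tracing for the
// whole project, even on platforms where it is available.
#if defined(RAYTRACING_AVAILABLE) && !defined(ENABLE_RAYTRACING)
#define ENABLE_RAYTRACING
#endif

// Hypothetical option that is on by default. The ifndef guard avoids a
// redefinition if the macro was already defined earlier.
#ifndef ENABLE_EXAMPLE_LOGGING
#define ENABLE_EXAMPLE_LOGGING
#endif

// Hypothetical option that is off: the macro is simply not defined,
// instead of being defined to 0.
#undef ENABLE_EXAMPLE_PROFILER

// Helpers that turn the macro state into values, so code and tests can query it.
constexpr bool isRaytracingEnabled()
{
#ifdef ENABLE_RAYTRACING
    return true;
#else
    return false;
#endif
}

constexpr bool isExampleLoggingEnabled()
{
#ifdef ENABLE_EXAMPLE_LOGGING
    return true;
#else
    return false;
#endif
}

constexpr bool isExampleProfilerEnabled()
{
#ifdef ENABLE_EXAMPLE_PROFILER
    return true;
#else
    return false;
#endif
}
```

Testing for presence with ifdef rather than comparing against 0 or 1 means a feature is disabled by removing or undefining a single line, which matches the split between "available" and "enabled" described for ray tracing.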
- glTF viewer improvements:
- sRGB fixes
- IBL support now with prefiltered CC0/public domain cube maps
- TAA support on more platforms and fixes
- Vignette support
glTF Viewer running on Android Galaxy Note 9
glTF Viewer running on iPhone 7
glTF Viewer running on Linux with NVIDIA RTX 2060
glTF Viewer running on Mac Mini M1
glTF Viewer running on XBOX One Original
- Specialization/Function constants support on Vulkan and Metal only - these constants get baked into the micro-code during pipeline creation time so the performance is identical to using a macro without any of the downsides of macros (too many shader variations increasing the size of the build).
Good read on Specialization constants. Same things apply to function constants on Metal
Declared at global scope using SHADER_CONSTANT macro. Used as any regular variable after declaration
Macro arguments:
#define SHADER_CONSTANT(INDEX, TYPE, NAME, VALUE)
Example usage:
SHADER_CONSTANT(0, uint, gRenderMode, 0);
// Vulkan - layout (constant_id = 0) const uint gRenderMode = 0;
// Metal - constant uint gRenderMode [[function_constant(0)]];
// Others - const uint gRenderMode = 0;
void main()
{
    // Can be used like regular variables in shader code
    if (gRenderMode == 1)
    {
        //
    }
}
- Resolved GitHub Issues
- #206 - Executing Unit Tests on Mac OS 10.14 gives a Bad Access error
- #209 - way to read texture back from GPU to CPU - this functionality is now in the resource loader
- #210 - memory allocation challenge - not an issue
- #212 - Question: updating partial uniform data on OpenGLES backend - not possible with OpenGL ES 2.0 run-time
- #219 - Question: way to support Vulkan SpecializationInfo? - support is now in the code base, see above
Release 1.49 - September 09, 2021 - Quest 2 Support | Apple M1 support | MSAA | OpenGL ES 2 Update | PVS Studio
- Quest 2 Support - after working now for the last 4 years on various Quest projects, we decided to add Quest 2 support to our framework.
- At this moment the following unit tests do not work:
- 07_Tessellation: Tessellation is not supported when using Multiview. The unit test has been removed from the Quest solution file.
- 10_ScreenSpaceReflections: Lots of artifacts.
- 14_WaveIntrinsics: Wave intrinsics are not supported.
Quest 2 running 01_Transformations
Quest 2 running 09_ShadowPlayground
- Apple M1 support - we are now testing on an M1 iMac and an M1 iPad Pro. Unfortunately we have crashes in one unit test and all the more complex examples and middleware.
- Aura
- 16_raytracing
- Visibility Buffer
- Ephemeris
iMac with M1 chip running at 3840x2160 resolution
iPad with M1 chip running with 1024x1366 resolution
It is astonishing how well the iPad with M1 chip performs.
Due to -what we consider driver bugs- M1 hardware crashes in
- UI / Fonts / Lua interface refactor
- Moved Virtual Joystick to IInput.h / InputSystem.cpp
- Pulled current Lua implementation out of AppUI and gave it its own interface (IScripting.h)
- Pulled Fontstash implementation out of AppUI and gave it its own interface (IFont.h)
- IFont and IScripting are now initialized on the OS Layer, with user customization functions available on the App Layer
- Fonts and Lua can now be disabled via preprocessor defines and UI will still function (using default 'ProggyClean' font)
- Zip unit test refactor to support encryption and writes into archives. For one of our customer projects we need password encryption, so we replaced our old zip library with
- Extended iOS Gesture / Android gesture support. For the same project we added more gesture support for mobile platforms.
- Partial C99 rewrite of OS/Interfaces headers and implementation files. As part of our on-going effort to make TF easier to use, we are rewriting parts of it in C99 so that teams can work with it more efficiently, compile times go down, and the memory footprint gets smaller.
- OpenGL ES 2 - Unit test 17 is now working as well.