AI in Your Graphics Card: How DLSS Works

The current generation of video game development is seeing another major step in the power and proficiency of graphical rendering. As games are built targeting the likes of the PlayStation 5, the Xbox Series X and the latest graphics cards from AMD and Nvidia, there is an expectation that they can aim for 4K resolutions while maintaining smooth performance. On top of that, real-time ray tracing is becoming a reality, completely upending the status quo for lighting environments in video games.

But this graphical arms race puts considerable pressure on consumers to have the right hardware. You might have picked up that 4K TV or monitor, but you still need the graphics card, processor and memory to get the game performing as desired. And unless you can put down cash for top-of-the-line hardware – and sometimes even if you have – you have to consider whether you want to play with the highest graphical settings possible, or experience the game at a slightly lower resolution while maintaining a smooth frame rate.

So let’s take a look at DLSS, a rendering technique bespoke to Nvidia GPUs that utilises machine learning to boost your game’s graphics and overall performance. We’ll explore how it works and discuss some of the benefits and challenges that emerge in making this a reality.

What is DLSS?

DLSS is an abbreviation of Deep Learning Super Sampling: a process that takes an original graphical output and attempts to reproduce it at a higher level of fidelity. In order to understand how it works, you need to know a little bit about graphics rendering. Video games, like any serious graphical process on a computer or similar device, work towards rendering the contents of a scene or environment modelled in an engine to the screen. If we consider a 3D game, movie or any other simulation, each object in that environment has its own position and rotation relative to the centre or ‘origin’ of the world. If we want to visualise that object, we need a camera that casts a projection of the world: everything within a certain field of view of the camera, out to a specific distance, is what gets rendered so the user can actually see it.


Now the larger the field of view and the render distance, the more you can see, but the costs to memory and processing power also increase, given you’re handling more information from farther away. This is why classic 3D games often had a tight field of view, as well as fog to minimise pop-in of distant objects. Meanwhile, most game engines utilise frustum culling to avoid rendering an object entirely if we know it won’t fall within the camera’s view frustum (and occlusion culling to skip objects hidden behind others).
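
To make that a little more concrete, here’s a minimal sketch of a frustum culling test, assuming the frustum has already been reduced to a set of planes (extracting those planes from a real engine’s camera is a separate exercise):

```python
# A minimal sketch of view-frustum culling: skip objects whose bounding
# sphere lies entirely outside the camera's frustum. The frustum is
# represented as inward-facing planes (a, b, c, d), where ax + by + cz + d >= 0
# means "on the visible side of this plane".
import numpy as np

def sphere_in_frustum(centre, radius, planes):
    """Return False if the sphere is completely outside any frustum plane."""
    for a, b, c, d in planes:
        distance = a * centre[0] + b * centre[1] + c * centre[2] + d
        if distance < -radius:      # fully behind this plane: cull it
            return False
    return True                     # intersects or inside: render it

# Hypothetical example: a frustum reduced to a single 'near' plane at z = -1
# (facing down the -z axis), and two objects at different depths.
planes = [(0.0, 0.0, -1.0, -1.0)]
print(sphere_in_frustum(np.array([0.0, 0.0, -5.0]), 1.0, planes))  # True: render
print(sphere_in_frustum(np.array([0.0, 0.0,  2.0]), 1.0, planes))  # False: cull
```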

Now with all that done, you next need to render the scene. While the game world might be in three dimensions, your monitor is a two-dimensional surface: what you see on the screen is ultimately a 2D image of a 3D environment. Hence the renderer has to calculate what is known as the projection matrix, which transforms the 3D world into that 2D image. That 2D image is rendered at a specific resolution, which indicates the total number of pixels it contains based on its width and height.
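
If you’d like to see the maths in action, here’s a small, illustrative sketch of a standard perspective projection matrix mapping a 3D point down to pixel coordinates. The field of view, aspect ratio and clip planes are arbitrary example values, not tied to any particular engine:

```python
# A point in camera space is multiplied by the projection matrix, divided by
# its w component (the perspective divide), then mapped to pixel coordinates.
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0,  0.0,                              0.0],
        [0.0,        f,    0.0,                              0.0],
        [0.0,        0.0,  (far + near) / (near - far),      2 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                              0.0],
    ])

proj = perspective(fov_y_deg=60.0, aspect=16 / 9, near=0.1, far=1000.0)
point = np.array([1.0, 2.0, -10.0, 1.0])     # a point 10 units in front of the camera
clip = proj @ point
ndc = clip[:3] / clip[3]                      # perspective divide -> [-1, 1] range
width, height = 1920, 1080
pixel = ((ndc[0] + 1) / 2 * width, (1 - ndc[1]) / 2 * height)
print(pixel)                                  # where that 3D point lands on screen
```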

For a given screen size, the higher the resolution, the greater the number of pixels per inch, or PPI. This of course requires more memory and processing power, given more pixels are rendered on each frame of the game. This is what resulted in the proliferation of dedicated graphics processing units, or GPUs, in consumer PCs back in the 1990s. When games are executed on a PC or console, they separate operations between those best suited for the CPU (the central processing unit) and those best suited for the GPU. The CPU handles core game logic while the GPU is – as you might expect – better suited for high-resolution graphics and video rendering. But of course, each GPU still has its own limitations.
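
Some quick back-of-the-envelope numbers help show how fast those pixel counts add up. The 4-bytes-per-pixel figure below assumes a single RGBA8 colour buffer; real renderers juggle many more buffers than that, so treat it as an illustrative lower bound:

```python
# How pixel counts (and raw buffer sizes) scale with resolution.
resolutions = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}
for name, (w, h) in resolutions.items():
    pixels = w * h
    megabytes = pixels * 4 / (1024 ** 2)     # 4 bytes per pixel (RGBA8)
    print(f"{name}: {pixels:,} pixels per frame, ~{megabytes:.1f} MB per colour buffer")

# At 60 frames per second, 4K means shading roughly 500 million pixels every second.
```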


In fact, this is why ray tracing – which is becoming increasingly commonplace in AAA titles – has long been a fantasy for graphics enthusiasts: it calculates the colour of each pixel using a more complicated approach to lighting that assesses how much light reflects and refracts off surfaces in the environment, and that is quite expensive to compute. Hence, while ray tracing has been around for over 50 years, it has only now become feasible with top-of-the-line GPUs to achieve that level of graphical fidelity at a consistent frame rate.
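
A rough bit of arithmetic shows why. The samples-per-pixel and bounce counts below are purely illustrative, but even this modest ray budget balloons into billions of rays per second at 4K:

```python
# Illustrative ray-count arithmetic for a 4K, 60fps target.
width, height = 3840, 2160          # 4K output
samples_per_pixel = 2               # primary rays per pixel (example value)
bounces = 2                         # secondary rays spawned per hit (example value)

rays_per_frame = width * height * samples_per_pixel * (1 + bounces)
print(f"{rays_per_frame:,} rays per frame")                   # ~50 million
print(f"{rays_per_frame * 60:,} rays per second at 60fps")    # ~3 billion
```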

In fact, unless you have a top-of-the-line GPU and a decent CPU and RAM to back it up, you won’t be able to experience a game at 4K resolution, which is 3840 x 2160 pixels. And let’s face it: top-of-the-range graphics cards are in high demand and often difficult to procure, whether as a result of microchip shortages, the rise of GPUs being adopted for crypto mining, or just the fact that they’re very expensive.

As future games continue to push the graphical specs required, there’s a real risk of alienating audiences who can’t keep up with the costs involved or simply can’t get the hardware. Hence there is a need to let audiences see these games at the desired output, and DLSS is one approach to addressing it. With DLSS, the idea is to reduce the workload of the GPU by rendering at a lower resolution; then, with a bit of AI help, that image gets upscaled to a higher resolution when shown to the end-user. Plus it means you can ideally run the game at a higher output resolution than you could manage at the target frame rate without DLSS enabled.
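
The saving is easy to quantify. The render resolutions below are common illustrative pairings rather than Nvidia’s official mode definitions, but they show how much shading work the upscaler takes off the GPU’s plate:

```python
# How much of a 4K output the GPU natively shades at typical render resolutions.
target = 3840 * 2160                                    # 4K output
render_modes = {"1440p render": 2560 * 1440, "1080p render": 1920 * 1080}
for mode, rendered in render_modes.items():
    print(f"{mode}: GPU shades {rendered / target:.0%} of the output pixels")

# 1440p: ~44%, 1080p: ~25% -- the upscaler fills in the rest.
```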

Texture Upscaling vs DLSS

Now it’s important to recognise how and when the DLSS procedure is implemented. DLSS is what we call post-processing dynamic upscaling, meaning that it happens in real time as the game is being rendered. While you’re playing a game that supports the feature, the image generated by the GPU is passed through an additional layer of processing: it might be rendered at a resolution such as 1080p and then upscaled via DLSS to 4K.

Previously we explored the idea of AI texture upscaling, a process of super-resolution where an image is passed through an AI process to create a higher-fidelity version of itself. It’s worth taking a minute to acknowledge these are not the same thing. Texture upscaling happens during development and, as explained in that article, has been used by modding communities and, more recently, in the remaster of the Mass Effect trilogy.

How DLSS Works

As the name implies, DLSS relies on deep convolutional neural networks trained via deep learning. The idea is that the networks learn how the game should look at higher quality and then boost the output of the GPU to achieve a level of parity.

This is reliant on the customised hardware within Nvidia’s RTX cards. Starting with the GeForce 20 series, Nvidia introduced their new Turing microarchitecture, named after the late Alan Turing: the mathematician who was massively influential in the development of computer science and artificial intelligence. The Turing architecture is notable in that, in addition to the standard processing cores Nvidia chips utilise, it also comes equipped with special AI cores known as ‘Tensor Cores’. These are inspired by the same technology that powers Google’s Tensor Processing Units, given they both do the same thing: they act as matrix-multiplication accelerators. Given so much of neural network inferencing is really just a large amount of matrix maths, this gives the GPU a dedicated space within which to execute the DLSS process. Note that this doesn’t mean the DLSS process can only run on the Tensor Cores, but the hardware is optimised to run it for every rendered frame of the game without pushing the workload onto the main CUDA cores, which could prove too much for them to handle in real time depending on the game.
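
To see why matrix-multiplication accelerators are such a good fit, consider that a single neural network layer is essentially one big matrix multiply. Here’s a toy example; the sizes are arbitrary and bear no relation to the real DLSS network:

```python
# One dense layer of a neural network: a matrix multiply, a bias add,
# and a non-linearity. Tensor Cores exist to accelerate exactly this kind
# of operation so it doesn't occupy the general-purpose CUDA cores.
import numpy as np

batch = np.random.rand(1, 512)           # one input vector of 512 features
weights = np.random.rand(512, 256)       # learned weights for a 256-unit layer
bias = np.random.rand(256)

activations = np.maximum(0.0, batch @ weights + bias)   # matmul + bias + ReLU
print(activations.shape)                                  # (1, 256)
```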


DLSS dates back to 2016 and was built by capturing a huge amount of existing data on the games themselves. Each game that incorporates DLSS needs a pre-trained deep learning network built for it. This is achieved with a training set of images for that game: lower-resolution aliased frames, paired with frames rendered at ultra-high quality with anti-aliasing enabled. But unlike the intended output of DLSS, these high-quality images are rendered at 16K resolution, which for those of you counting is 15360 x 8640. That’s a lot of pixels… just shy of 133 million. The network – a convolutional autoencoder – is trained to upscale an aliased 1080p image to 4K, and the results are then compared to the 16K images in the dataset. This is all trained on Nvidia’s DGX supercomputers, with the intent of first creating a network that can upscale the low-res image as faithfully as possible. Once complete, the network is then trained again to fill in the necessary pixels to remove aliasing effects, thus removing the need for anti-aliasing to be employed on the final image. In fact, applying traditional anti-aliasing on top of DLSS has proven detrimental to the rendered graphics to date.
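
If you want a feel for what that training loop looks like, here’s a heavily simplified sketch in PyTorch. The architecture, loss and random placeholder tensors are illustrative only; Nvidia’s actual network, dataset and training regime are far more sophisticated and not public in this form:

```python
# A toy convolutional upscaler trained to map low-resolution frames towards
# higher-quality reference frames -- the same basic idea as the paired
# low-res / 16K training described above, at a fraction of the scale.
import torch
import torch.nn as nn

class ToyUpscaler(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),          # rearranges channels into a larger image
        )

    def forward(self, x):
        return self.net(x)

model = ToyUpscaler(scale=2)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

for step in range(100):
    low_res = torch.rand(4, 3, 64, 64)       # stand-in for aliased low-res frames
    reference = torch.rand(4, 3, 128, 128)   # stand-in for high-quality reference frames
    optimiser.zero_grad()
    loss = loss_fn(model(low_res), reference)   # how far the upscale is from the target
    loss.backward()
    optimiser.step()
```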


Once this is all done, in practice the graphics card generates the low-resolution render using the CUDA cores, which is then passed to the Tensor cores for upscaling; this happens for every frame of the game being displayed. Now it’s the training of DLSS to understand the images of a given game that can prove problematic. The original DLSS needed a unique network trained for each game that planned to use it. Hence, while numerous games have enabled DLSS either at launch or with a patch, such as Battlefield V, Final Fantasy XV and Metro Exodus, it still took time to customise it for each title.
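
In pseudocode terms, the per-frame handoff looks something like the sketch below. The stub functions are hypothetical placeholders rather than any real driver or engine API; they only illustrate the order of operations:

```python
# The CUDA cores produce a low-resolution frame, which is then handed off
# for upscaling before being displayed. Both functions here are placeholders.
import numpy as np

def rasterise_low_res(scene, width=1920, height=1080):
    # Placeholder for the normal render pass on the CUDA cores.
    return np.zeros((height, width, 3), dtype=np.float32)

def dlss_upscale(frame, target_height=2160, target_width=3840):
    # Placeholder for the Tensor-core inference pass; a real implementation
    # runs the trained network, here we just repeat pixels so the shapes match.
    scale_y = target_height // frame.shape[0]
    scale_x = target_width // frame.shape[1]
    return frame.repeat(scale_y, axis=0).repeat(scale_x, axis=1)

frame = rasterise_low_res(scene=None)
output = dlss_upscale(frame)
print(frame.shape, "->", output.shape)    # (1080, 1920, 3) -> (2160, 3840, 3)
```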

Since then, DLSS has moved forward significantly, with a real effort towards addressing some of the earlier limitations of the system, resulting in DLSS 2.0 being released in the spring of 2020. One limitation of the original DLSS was that blurring could appear in parts of the image. To address this, DLSS 2.0 focuses on identifying how frames change as they are rendered, implying a sense of motion. In addition to the aliased low-resolution images, the networks are now also fed motion vectors: a technique originally used for video encoding and compression. A motion vector tells us which blocks of an image shift or move between frames, and this helps DLSS understand how the image evolves over time.
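
Here’s a minimal sketch of what a motion vector actually is: for a block of pixels in the current frame, find where it sat in the previous frame. This brute-force block matching is how video encoders think about motion; in games the engine supplies these vectors directly, so this only illustrates the concept rather than how DLSS consumes them:

```python
# Brute-force block matching: search a small neighbourhood of the previous
# frame for the block that best matches a block in the current frame.
import numpy as np

def block_motion_vector(prev_frame, curr_frame, top, left, size=8, search=4):
    block = curr_frame[top:top + size, left:left + size]
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + size <= prev_frame.shape[0] and x + size <= prev_frame.shape[1]:
                err = np.sum((prev_frame[y:y + size, x:x + size] - block) ** 2)
                if err < best_err:
                    best, best_err = (dy, dx), err
    return best     # offset back to where this block sat in the previous frame

# Toy example: a bright square shifts two pixels to the right between frames.
prev = np.zeros((32, 32)); prev[8:16, 8:16] = 1.0
curr = np.zeros((32, 32)); curr[8:16, 10:18] = 1.0
print(block_motion_vector(prev, curr, top=8, left=10))   # -> (0, -2): it moved right by 2
```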

However, what is arguably the biggest change under the hood is that it no longer has to train a new neural network for each game. Previously the network config was pre-trained by Nvidia, delivered to the graphics card courtesy of a driver update, and loaded into the Tensor cores when needed. DLSS 2.0 instead relies on one single general-purpose network to do the bulk of the work, which is ideal for developers, given there is no longer a need to train a custom permutation of the neural network for each game that supports the feature.

One of the ways this was achieved was by no longer training the new one-size-fits-all network against game-specific content. The training set is now more general-purpose and built to support any image, while still including a collection of images from target games to help enrich it. This ultimately reduces development time for DLSS integrations, given the emphasis is now on continuing to refine one single trained model.

The improvements to the network size and configuration do have a knock-on effect on the performance of the hardware, given it’s not stressing the Tensor cores as drastically as before. Interestingly, while it’s not 100% clear how this has been achieved, it’s led to DLSS 2.0 being deployed on the Nvidia Shield streaming device, albeit capped at 30 frames per second. The details aren’t clear, but it may be inspired by DLSS 1.9, which was actually implemented in the PC release of Remedy’s 2019 shooter ‘Control’. That version used a customised version of the upgraded DLSS config but ran it on the CUDA shader cores of RTX cards rather than the Tensor cores, which is relevant given that the Nvidia Shield runs on the Tegra X1+ processor and has no Tensor cores of its own. This is what led to a bunch of rumour-mongering that the updated models of the Nintendo Switch would be able to offer 4K upscaling, given the Switch uses a custom Tegra X1 processor.

DLSS vs FidelityFX

In 2021 we’re now seeing more from Nvidia’s competitor AMD on their approach to this, known as FidelityFX (specifically FidelityFX Super Resolution), which was released in June of 2021 and is already showing some impressive results. There are two interesting differences between FidelityFX and DLSS. First of all, AMD has claimed it’s not powered by any form of AI: it runs on a pure-software approach that simply uses up more of the general compute power on the GPU. And that speaks to the other key differentiator from DLSS.

The second difference – which is possibly an even bigger deal for the market as it currently stands – is that it doesn’t require custom hardware. As mentioned, DLSS typically runs on the Tensor cores attached to GeForce RTX GPUs, whereas FidelityFX is designed to run on AMD Radeon cards without any dedicated processing units. This opens up the playing field for FidelityFX to be deployed on a variety of hardware: a flexibility that, outside of the Nvidia Shield, its competitor has not embraced. While it’s expected to run on the latest Radeon RX 6000 cards from AMD, there is a possibility it runs on their 5000 series as well.
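
To make the ‘pure software’ distinction concrete, here’s an illustrative sketch of a non-AI spatial upscale (an ordinary resize plus a sharpening pass) of the kind any GPU’s general shader cores, or indeed a CPU, can run. To be clear, this is not AMD’s actual FidelityFX algorithm, just the general flavour of approach:

```python
# A plain-arithmetic upscale: bilinear resize followed by an unsharp mask.
# No neural network, no special hardware -- just per-pixel maths.
import numpy as np
from scipy import ndimage

def software_upscale(frame, scale=2.0, sharpen=0.5):
    upscaled = ndimage.zoom(frame, zoom=(scale, scale, 1), order=1)   # bilinear resize
    blurred = ndimage.gaussian_filter(upscaled, sigma=(1, 1, 0))
    sharpened = upscaled + sharpen * (upscaled - blurred)             # unsharp mask
    return np.clip(sharpened, 0.0, 1.0)

low_res = np.random.rand(540, 960, 3).astype(np.float32)   # stand-in for a rendered frame
print(software_upscale(low_res).shape)                      # (1080, 1920, 3)
```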

But the big difference here could come from the console space, as well as Nvidia’s own GPUs. Both the PlayStation 5 and the Xbox Series S and X use a custom version of the AMD RDNA 2 microarchitecture used for the Radeon RX 6000 series. Sony and Microsoft have since confirmed that FidelityFX will be made available on their consoles, and integration patches are now available for popular engines such as Unreal and Unity. Meanwhile, the codebase for FidelityFX is open source, meaning it can be ported to other platforms: you could wind up running FidelityFX upscaling in the likes of Far Cry 6 while using an Nvidia GPU.

Closing

The graphics race is a seemingly endless one, as players and developers strive for ever-increasing fidelity in everything from race car simulators to first-person shooters and more. As the market matures to embrace a less hardware-heavy approach, AI solutions such as DLSS have the capacity to impact players’ experience for the better. Only time will tell which approach becomes the future of graphics in game development.