Real Time Rendering

Hello! This is a write-up of a talk I gave at EGX this year. I thought it might be useful to others, so here are my slides and transcript. Enjoy!

Introduction

Since joining the Unreal Engine team, I’ve learnt a lot about the technical reasons, limitations and processes behind working with game art and real-time rendering, which is why I am here talking to you today.

It was only a couple of years ago that I was in uni, and I remember being taught the correct way of doing something but never really understanding why I had to do it that way. Hopefully by the end of this talk you’ll have a bit more of an understanding of what’s happening under the hood of a game engine, and you can use this to inform how you create your art.

This isn’t something we can fob off to the programmers with a ‘well it’s your job to make it run smooth, I just make it pretty’. Game development is about teamwork, and if you want to improve as an artist, take a little bit of responsibility in your art creation and make an effort to recognise how your decisions can affect the game.

If you can understand and communicate with other departments, then when something goes wrong you are in a better position to identify where and how to fix it.

So can I get a quick show of hands – who here’s an artist? A programmer? A designer? Cool!

I only have 30 minutes to get through everything, so this is really going to be a simplified overview of the rendering process with some takeaways on how you can optimise your art content; enough to get your feet wet and give you an understanding of terms, but by no means a full in-depth review – I’d need over three hours to do that.

Also, while most of the talk is based in and around Unreal, the majority of the concepts we’ll talk through can be applied to other game engines, so no matter what you are working with, this is all still relevant. With that all said, let’s jump in!

Physically Based Rendering

Most real-time engines are centred around Physically Based Rendering, or PBR. PBR works by approximating what light actually does, rather than what we think it should do.

This means the end result tends to be more accurate and natural looking, and it improves the art pipeline by giving more predictable results.

Just because PBR is more accurate doesn’t mean that it only works for photo-realistic styles. Both Pixar and Disney have used PBR in their films, and you can see examples of the variety of art styles achievable with PBR in all of these games.

But how does PBR actually affect us, as artists? PBR focuses on four main areas:

Base Colour (which is your diffuse or albedo texture map). This shouldn’t contain any lighting information unless there is a specific place you always want a shadow or highlight.

Roughness, which controls how reflective or matte an object appears.

Metallic (how metal-like an object appears).

And if you are wondering what the difference between metallic and roughness looks like, here’s a wee breakdown between the two. On the top we have non-metal, along the bottom metal, and the roughness increases the further to the right you go.

And finally we have Specular – which in most cases you shouldn’t be altering. If you are using it to make something look matte, shame on you: you should be using the Roughness control instead.

The only times you should change specular is to control the index of refraction for very specific cases like ice, water and skin.
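If you ever do need to set it, the relationship between index of refraction and the Specular input is simple enough to compute. Here’s a minimal Python sketch, assuming the standard Fresnel reflectance-at-normal-incidence formula and Unreal’s documented mapping of Specular 0.5 to roughly 4% reflectance; treat the exact numbers as approximations.

```python
# Sketch: converting an index of refraction (IOR) into a Specular value.
# F0 is the standard Fresnel reflectance at normal incidence; dividing by
# 0.08 remaps it so that 4% reflectance lands on the default value of 0.5.

def ior_to_specular(ior: float) -> float:
    f0 = ((ior - 1.0) / (ior + 1.0)) ** 2  # reflectance at normal incidence
    return f0 / 0.08

print(round(ior_to_specular(1.33), 2))  # water -> ~0.25
print(round(ior_to_specular(1.50), 2))  # glass -> ~0.50
```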

That covers the main points of PBR, but there is so much more to real-time rendering!

Real Time Rendering Pipeline

There’s no perfect way to achieve RTR. You’re aiming to hit all three corners of the performance/quality/features triangle, but at best you’ll hit two out of three. At worst, and I’m sure we’ve all played games like it, you may hit none of them.

Because there’s no one solution that fits all, it’s all about balancing. Every game or project you work on will have different requirements – whether that’s different art styles, release platforms or team make-up, it’s all going to affect how you will create your art.

But if you understand how all the pieces fit together, you can make smart decisions on a per-project basis to find a solution that works for you. And the best place to start is by understanding the process of how graphics are rendered to a screen.

And we’ll start with the GPU. At its simplest, to render out a mesh the GPU loads in the mesh’s vertices, groups the vertices together as triangles, converts those triangles into pixels, then gives each pixel a colour – and voilà, that’s the final image.

That’s rendering in its simplest form. Now let’s go back over it in a little more depth.

We start at the input assembly stage, where the GPU loads the mesh in from memory and connects the vertices together to form triangles (which is why your model should always be in quads or tris, and why you should really avoid polys with more than four sides, like n-gons).

Next it takes those vertices and runs any applicable vertex shader, one vertex at a time. A vertex shader mainly controls the transform of the vertex – where it is in space.

Vertex shaders are often used for ambient animation – in this case to make billowing smoke – but are also commonly used to blow wind through trees or make waves in water, as they are super optimised for doing large-scale animations. It’s worth pointing out that vertex shaders are purely a visual effect: they don’t actually modify the position, scale or rotation of a model. This means things like physics or collisions are not taken into account, which can cause clipping issues if the movement takes the appearance of the mesh outside of its collision box.
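To make that concrete, here’s a toy Python sketch of the kind of maths a ‘wind’ or ‘sway’ vertex shader runs. In a real project this would live in the material or vertex shader on the GPU; the function and parameter names here are made up purely for illustration.

```python
import math

def sway_offset(position, time, strength=0.1, frequency=1.5):
    """Toy 'wind' vertex animation: returns a visual offset for one vertex.

    position: (x, y, z) of the vertex in object space.
    The offset is purely visual - physics and collision never see it,
    exactly like a real vertex shader.
    """
    x, y, z = position
    # Higher parts of the mesh sway more, and the phase varies with x
    # so the whole object doesn't move in lockstep.
    sway = math.sin(time * frequency + x) * strength * max(z, 0.0)
    return (sway, 0.0, 0.0)

# Example: the tip of a 2m-tall plant, one second into the animation.
print(sway_offset((0.5, 0.0, 2.0), time=1.0))
```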

Back to the GPU pipeline, we’ve loaded in the model, we’ve applied our vertex transformation and next up is rasterization. Once the GPU knows where each triangle will be on the screen it is rasterized – where the vector triangle is converted into pixels. At this point any vertex values (like vertex colours you’ve painted on, or normals) are applied across the pixels representing the triangle.

Next up the GPU runs an early depth test to see if there are any prominent large objects that are blocking out areas of the screen.

By doing an early depth test, the renderer can throw away some of the models that are blocked from view, so it doesn’t have to render every single one only to throw some away when it turns out something else is in front. This is a process called culling, which we’ll go into in more depth later on.

We then run each rasterized pixel through a pixel shader which gives the pixel its colour – a combination of its material, textures, lights and more. This is often the heaviest part of the process as a standard HD screen has over 2 million pixels, and each of those pixels is being shaded at least once!
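To put those numbers in perspective, here’s the quick back-of-the-envelope arithmetic for a 1080p screen – and this assumes each pixel is only shaded once; overdraw, which we’ll get to later, makes it worse:

```python
width, height = 1920, 1080
fps = 30

pixels_per_frame = width * height           # 2,073,600 pixels at 1080p
shades_per_second = pixels_per_frame * fps  # ~62 million pixel shader runs/sec

print(f"{pixels_per_frame:,} pixels per frame")
print(f"{shades_per_second:,} pixel shader invocations per second, minimum")
```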

There’s another depth test pass (also called sorting) where the GPU sorts out the order of depth so that objects are correctly layered on top of each other.

You may have heard people complain of ‘sorting issues’ – this is where objects are rendered in the wrong order, either in front of or behind where they are meant to be. It can often happen with transparent objects and may require you to go in and manually set the sorting order of an object so the renderer knows where it should be.

And finally the pixel is written to the render target and displayed on screen.

Draw Calls

Now the GPU doesn’t render on its own – it has the CPU telling it what to render and how, through the process of a draw call. A draw call is ‘a group of polygons sharing the same properties’, or in Unreal terms, ‘a group of polygons sharing the same material’.

So say you have a tree that has a material for the bark, one for the leaves and one for the berries/pinecones/acorns. Three draw calls, doesn’t seem a lot?

Except you are now on a hill overlooking a forest and there are several hundred trees in view. That’s now into the thousands of draw calls.
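The arithmetic is worth spelling out – the numbers below are made up, but they show how quickly it adds up:

```python
trees_in_view = 500       # made-up count for the hillside shot
materials_per_tree = 3    # bark, leaves, berries

draw_calls = trees_in_view * materials_per_tree
print(draw_calls)         # 1500 draw calls just for the trees

# Group the trees into clusters that share materials (more on this later)
# and the count drops dramatically:
clusters = 10
print(clusters * materials_per_tree)  # 30 draw calls
```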

But why is this a bad thing? Draw calls are slow due to the tiny pauses the GPU makes between each draw call as it waits for the next piece of information to be handed to it from the CPU.

Think of the difference between handing over a single 1 GB file versus a million 1 KB files – it’s much quicker to do one big transfer than many little ones.

Draw calls will often have a more substantial impact on performance than polycount. I know it is really easy to get hung up on polycount but don’t.

It is always best practice to make assets as low poly as you can without losing the visual fidelity you are after. Every saving you make means the game can be pushed that little bit further somewhere else, but if it needs those extra 100 polys to look smooth and rounded, go for it.

This is a wireframe from our Elemental demo – and there are around 3 million tris. At the worst point, in some of the open landscape shots, we were hitting 60 million tris and it still ran on console…

Empires Apart by DESTINYbit

In some cases it’s not worth reducing polycounts any further, especially if you are already doing a super low poly art style – as there’s a baseline cost to all models because of draw calls. So you can hit a point where reducing your poly count makes no more difference.

And in most cases, if you have a scene with a high number of polys, but a low number of draw calls it will likely run far smoother than vice versa.

To put that into context, the current standard for draw calls on console and PC: 2,000-3,000 is reasonable, 5,000 is high and 10,000 is probably a problem. On mobile you are looking at only a few hundred, and for VR less than 1,000.

There are a few different techniques you can look into to reduce your draw calls. A common solution is to use a texture atlas – where you combine all of the different texture maps used on a mesh into one single big texture. There can sometimes be issues with texture bleeding at low resolutions, but this is relatively minor compared to the performance gains that can be made.
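If you’d rather script the atlas than build it by hand, here’s a minimal sketch using the Pillow library in Python – the filenames and tile layout are placeholders, and in practice you’d usually do this in your DCC package or let the engine’s merge tools handle it:

```python
from PIL import Image  # pip install Pillow

tiles = ["bark.png", "leaves.png", "berries.png", "moss.png"]  # placeholders
tile_size = 512

# Build a 2x2 atlas from four 512x512 maps.
atlas = Image.new("RGBA", (tile_size * 2, tile_size * 2))
for i, path in enumerate(tiles):
    img = Image.open(path).resize((tile_size, tile_size))
    atlas.paste(img, ((i % 2) * tile_size, (i // 2) * tile_size))
atlas.save("tree_atlas.png")

# The mesh's UVs then need remapping into each tile's quarter of the atlas:
def remap_uv(u, v, tile_index):
    return (u * 0.5 + (tile_index % 2) * 0.5,
            v * 0.5 + (tile_index // 2) * 0.5)
```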

Another method to look at is using Layered Materials, where you effectively create a single Material with a series of sub-Materials inside, which can be used to handle complex blending between unique surface types. Although this reduces draw calls, it makes the calculations in the pixel shader more complex and so can end up being more expensive.

As a best practice, if you can apply separate Materials on a geometry level, do so (say you have a car and need rubber for the tyres, glass for the windows, chrome on the bumpers etc.), and only use this technique if you need per-pixel control over how a Material is placed or blended.

And one other simple method to reduce draw calls is to combine models together that share the same materials. So instead of placing each tree by hand, you group several trees together and then place them in groups. Instead of each tree being a draw call, especially if they share the same material, the whole group becomes one.

So you may think, I’ll just merge all my tiny models together to make larger ones and that’ll improve performance. Well, not necessarily – larger models are worse for workflows, memory, collision calculations, and culling.

Culling

I mentioned culling before – culling is the renderer’s way of reducing the number of models it has to draw, by running different types of checks to build a list of the objects that are actually visible to the player.

In other words, the renderer makes life a bit easier for itself by trying to render only what the player can see in that moment – not everything that exists. And there are a few different methods of doing this.

You have distance culling, which removes anything past a specified distance from the camera. Here you can see the back mountains pop in and out of view.

Normally it wouldn’t be this noticeable (well, you’d hope it wouldn’t be), but I shortened all the values so they disappear quite close to the camera; otherwise you’d find they fade into the fog – which is a common way of hiding the sudden appearance or disappearance of a model.

Horizon Zero Dawn by Guerrilla Games

You also have frustum culling, which checks to see what is in front of the camera and ignores everything else.

You may have seen this .gif floating around on Twitter showing how frustum culling works in Horizon: Zero Dawn. The cone represents the camera’s field of view, and you can see how different chunks are rendered depending on whether they are coming into view of the camera or not.

I’ve shown this .gif a couple of times to people not in games and often get the question – well if there’s something big behind you that’s not being rendered how does it cast a shadow across the player? It’s worth making the distinction that just because something isn’t rendered doesn’t mean it doesn’t exist.

The CPU knows exactly where every model is, it’s just up to the GPU to decide which ones to render to the screen and show the player.

And finally we have occlusion culling, which checks all the actors that are still remaining to see if they are visible to the camera or if something is blocking them from view – occluding them, hence occlusion culling.

You can see in this .gif the player view on the left and the wireframe of the objects being rendered on the right. As the camera turns you can see the rendered objects change, and when occlusion culling is turned off you can see how many more objects would be rendered all of the time – and the difference it actually makes.

As culling is done per object instance rather than per poly, if you have big groups of models combined into one object they are less likely to be occluded, as they take up more space and are more likely to be partially visible. So you have to find a balance between straining the CPU with draw calls and giving the GPU more to render.
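To make the per-object nature of culling concrete, here’s a toy Python sketch of the first two checks – a distance test and a simplified view-cone test standing in for the frustum. Real engines test against the full six-plane frustum and then run occlusion queries on top, so treat this purely as an illustration:

```python
import math

def is_visible(obj_pos, cam_pos, cam_forward, max_draw_distance, fov_degrees):
    """Toy per-object culling: a distance check plus a view-cone check."""
    to_obj = [o - c for o, c in zip(obj_pos, cam_pos)]
    dist = math.sqrt(sum(d * d for d in to_obj))
    if dist == 0:
        return True

    # Distance culling: past the max draw distance, skip it entirely.
    if dist > max_draw_distance:
        return False

    # 'Frustum' culling, simplified to a cone: is the object roughly
    # in front of the camera and inside the field of view?
    direction = [d / dist for d in to_obj]
    facing = sum(d * f for d, f in zip(direction, cam_forward))
    return facing > math.cos(math.radians(fov_degrees / 2))

camera, forward = (0, 0, 0), (0, 1, 0)
print(is_visible((0, 50, 0), camera, forward, 100, 90))   # True: close and in view
print(is_visible((0, 500, 0), camera, forward, 100, 90))  # False: distance culled
print(is_visible((0, -50, 0), camera, forward, 100, 90))  # False: behind the camera
```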

Profiling

And what often helps you find this balance is identifying where any hold-ups are. The GPU and the CPU work in sync, but not at the same point in time. If a game is aiming for 30fps, then to give both processors the maximum time possible to work, the CPU will run one or two frames ahead of the GPU (a frame being about 33ms at 30fps). This frame delay is also known as latency, and for most games a one or two frame latency isn’t noticeable to the player.
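The frame budgets themselves are just simple division – worth keeping in your head whenever someone quotes a target framerate (the 90fps row is the sort of refresh rate VR headsets typically ask for):

```python
for fps in (30, 60, 90):
    print(f"{fps} fps -> {1000 / fps:.1f} ms per frame")

# 30 fps -> 33.3 ms, 60 fps -> 16.7 ms, 90 fps -> 11.1 ms
# With the CPU one frame ahead, the image on screen always lags the
# CPU's current simulation by a frame or two - that's the latency.
```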

However, if the CPU or GPU takes longer than the targeted time – again, let’s say we are aiming for 30fps and so 33ms to render a frame – it causes the other one to stall and wait for it to finish, which is when we see drops in framerate.

If you’ve ever heard the programming team talking about the game being CPU or GPU bound, it’s where the processor is at its maximum capacity and can’t process the information quickly enough, causing the other to have to wait.

If it’s CPU bound, the cause is (almost always) something to do with positions and visibility – the majority of things that cause movement – such as physics, AI, skeletal animations, projectiles and so on.

The GPU will most likely have trouble with draw calls, dynamic shadows or translucency – typically your top three blockers.

It’s essential to know which processor is causing the other to wait, because otherwise you could spend time optimising assets that have no effect on the problem.

It doesn’t matter how much you optimise one part of the pipeline – if the bottleneck is in an earlier part, you can’t improve performance.

So how do you find out where the problem is? The best way is through profiling tools. There are some built into Unreal Engine which I recommend you go and look into, including visualisers for lightmap density and shader complexity, plus lots of stats to look at – and there are loads of helpful resources available online to help you get started.

I’d also recommend looking into RenderDoc for visual debugging, which is free to download and use and integrates with Unreal. And if this all looks a bit scary to you, go and talk to your programming team if they are around – they may even have built custom tools to help with profiling, and they will more than likely be happy to sit down and show you how they work.

With all of that said, it’s now worth looking at some of the methods we can use to reduce the cost of our art!

As most of the GPU’s time is spent on shaders, they can often be the cause of hold-ups. Often the shader is just too complex – as you can see here, it has too long a chain of calculations to make. If you haven’t used Unreal, this is the Material editor, which is all node based, and when things get complex it can often end up looking like spaghetti…

And funnily enough, the best way to optimise a shader is to reduce the number of instructions! For pixel shaders (that change the colour of the pixel) this means reducing the complexity of the material.

For vertex shaders (which change the position of vertices) this means reducing the complexity of the model it is applied to, so it has fewer vertices to run the calculations on.

Lodding

Other than going through by hand and removing vertices from a model, one way to reduce the polycount is through lodding. Levels of detail, or LODs, are a way for a model (or a group of models) to be simplified under given conditions – usually this means swapping a model out for progressively lower-poly versions the further away it gets.

In Unreal we have an automatic LOD generator, so you don’t have to create them by hand. Instead you just go to the mesh inside the editor, set a few drop-down menus, and it will create the LODs for you.
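Conceptually, the renderer then just picks a LOD each frame based on how big or how far away the mesh is. Here’s a rough Python sketch with made-up distance thresholds – Unreal actually switches based on screen size rather than raw distance, but the idea is the same:

```python
# Made-up thresholds: (maximum distance in metres, LOD index to use).
LOD_TABLE = [(15, 0), (40, 1), (100, 2)]

def pick_lod(distance_to_camera):
    for max_dist, lod in LOD_TABLE:
        if distance_to_camera <= max_dist:
            return lod
    return 3  # beyond the last threshold, fall back to the lowest-poly LOD

print(pick_lod(10))   # 0 - full-detail mesh
print(pick_lod(250))  # 3 - simplest mesh
```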

Another form of lodding, hierarchical levels of detail or HLODs, combines multiple models and materials together in the distance to further lower draw calls. This isn’t something that’s normally enabled by default, and it requires a bit of set-up, but it’s well worth looking into if you are working with large open scenes, especially as they don’t occlude very well.

This is the Agora map from Paragon; when we turned on HLODs we reduced the scene by 1.5 million tris, down to 2.5 million, and cut the draw calls from 7,000 to less than 6,000 – so it can make a big difference to your scene.

Overdraw

Going back now to other bottlenecks, draw calls can cause problems with overdraw – where a pixel has to be re-rendered several times because multiple draw calls touch it.

This causes issues for the GPU: it has more pixel-shading work to do but no extra time, so it effectively has less time per pixel if it wants to maintain the same framerate.

You can see the visualisation of overdraw in this scene: the more purple/white the pixels, the more times they’ve needed to be re-rendered, and as you can see, the volcano smoke particle is the worst culprit here. That’s mainly because there are hundreds of translucent planes being rendered on top of each other.

In cases where there is a transparent object like smoke or ice or glass, overdraw is unavoidable. The whole point of translucent objects is that you can see through them in some way, so the GPU has to render multiple layers of pixels.

But overdraw is one of the reasons why translucent objects are more expensive than opaque ones. Wherever possible, try to avoid having transparent Materials, especially if they are stacking on top of each other.
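Here’s a rough feel for why stacked translucency hurts – the numbers below are invented, but they’re the kind of scale a big particle effect can reach:

```python
screen_pixels = 1920 * 1080

coverage = 0.25   # the smoke plume covers a quarter of the screen
layers = 40       # ~40 translucent sprites overlap on average

extra_shades = int(screen_pixels * coverage * layers)
print(f"{extra_shades:,} extra pixel shader runs for one effect")  # ~20.7 million
```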

If you need to use a texture to cut out parts of a model that are completely opaque (like on foliage), you can use a ‘Masked’ setting instead of transparent to help minimize costs.

If you do need full transparency, you can also reduce the cost by making your transparent Materials unlit, and faking the lighting wherever you can.

And you can further reduce the cost by making sure the polys you are applying the translucent materials to fit as snugly around the visible part as possible. You can see an example here by the awesome hippowombat.

As we saw in the first scenario with the volcano smoke, particles can often be the worst for overdraw. If you are using Unreal and have made a particle flipbook/sprite sheet/sub UV – there’s a handy particle cutout tool (ctrl+F in the release notes to find it) which ‘trims’ the excess space around the particle to help reduce overdraw.

Overshading

In a similar vein to overdraw, overshading is another scenario which causes pixels to be re-rendered, but this time because of very tiny or thin triangles (for a more in-depth explanation, check out Keith O’Conor’s article GPU Performance for Game Artists).

The GPU doesn’t actually process pixels one by one – it processes them in quads, a 2×2 pixel pattern, so that it can compare neighbouring pixels to calculate things like which mip map to use. So if a tiny triangle only takes up one pixel, the GPU still processes all four and then throws three of them away.
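A quick sketch of the waste involved, assuming the worst case where every visible pixel of a sliver triangle drags in its own 2×2 quad (big, evenly shaped triangles pack into quads far more efficiently):

```python
def shaded_vs_visible(visible_pixels):
    """Worst-case quad overshading for a thin sliver triangle."""
    shaded = visible_pixels * 4        # each pixel costs a whole 2x2 quad
    return shaded, shaded - visible_pixels

for pixels in (1, 3, 10):
    shaded, wasted = shaded_vs_visible(pixels)
    print(f"{pixels} visible pixel(s): {shaded} shaded, {wasted} thrown away")
```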

This can’t be completely avoided, as the further away from an object you get, the smaller its triangles become, but lodding and culling both help with this issue – and it’s one of the reasons why models should use evenly sized quads, and why you should avoid clusters of tiny thin triangles, like on the top of a cylinder.

Memory

We previously talked about how the number of instructions in a shader can slow down the GPU – well, so can the number and size of the textures it uses.

As the GPU executes a pixel shader, it has to fetch the relevant textures from memory. If it can’t access them quickly enough, the shader has to pause while it waits for the texture to arrive – a memory bandwidth issue which, amongst other things, can cause visual artefacts like blurriness or lag.

One way to avoid memory issues caused by textures is to use a technique called Channel Packing, where you save a different greyscale texture into each of the four channels – R, G, B and A – of a .tga or .png.

This is something you create inside Photoshop, either to mask off different areas of an object so you can apply different colours inside the game engine, or to hold maps like roughness, metallic and ambient occlusion.

You then take that texture into the engine and ‘unpack’ it by plugging each channel in as needed – in this case the Metallic, Roughness and Ambient Occlusion inputs. It’s worth noting you can’t use this technique on normal maps, as they require a different type of texture compression, so they will pretty much always need to be stand-alone.

By using channel packing, the pixel shader only has to fetch one texture from memory instead of potentially four.
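If you’d rather script the packing than do it in Photoshop, here’s a minimal sketch using Pillow in Python – the filenames are placeholders, and remember normal maps still need to stay in their own texture:

```python
from PIL import Image  # pip install Pillow

# Placeholder filenames - each input is a greyscale map of the same size.
metallic  = Image.open("metallic.png").convert("L")
roughness = Image.open("roughness.png").convert("L")
ao        = Image.open("ao.png").convert("L")

# One map per channel: R = metallic, G = roughness, B = ambient occlusion.
packed = Image.merge("RGB", (metallic, roughness, ao))
packed.save("packed_MRA.png")

# In the engine, plug the R, G and B channels of this single texture into
# the Metallic, Roughness and Ambient Occlusion inputs respectively.
```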

Another method to avoid memory issues is through Mip Mapping. If a texture fetch is taking too long, it’s likely that a lower resolution version, or mip map, is still available in cache and can fill in the gap. Mip maps are the lodding of textures: the texture resolution used is reduced, normally, the further away from the camera you get. They are often auto-generated on import and used by default.

You can see that without mip maps, on the left, the image looks far noisier and grittier as every detail in the texture is rendered. On the right, where the mip maps are enabled, you can see how much smoother the textures look, especially in the water and in the wood.

As mips are streamed in as needed, they can sometimes cause texture ‘popping’ when the player gets closer to an object and a higher level mip is loaded in.

You may have spotted this sometimes when a new level is loading in and things seem kind of blurry at first before sharpening up, or when traversing high above the terrain and it’s loading in new chunks of the map – I’ve most noticeably seen it when riding around the overworld on Tengri in Ni No Kuni – and I’m really sorry if you love this game (I do too) but once you see it you can’t unsee it!

If you watch the space just above Tengri’s left wing, you can see the line where the mip maps and lods are being updated on the mountain range.

Which brings me to my final point of the day – mips are generated by taking the current mip level and shrinking it down to a quarter of the size (half the resolution in each dimension), which is why your textures always need to be a power of two in size, either as a square or as a rectangle.
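You can see why with a tiny sketch: power-of-two sizes divide down cleanly all the way to 1×1, and the full chain only adds roughly a third more memory on top of the base texture. (This is a quick, hand-rolled illustration rather than how any engine actually builds its mips.)

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

def mip_chain(width: int, height: int):
    """List every mip level, halving each dimension until we hit 1x1."""
    assert is_power_of_two(width) and is_power_of_two(height), \
        "non-power-of-two sizes stop dividing down cleanly"
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(width // 2, 1), max(height // 2, 1)
        levels.append((width, height))
    return levels

print(mip_chain(1024, 512))
# [(1024, 512), (512, 256), (256, 128), ..., (2, 1), (1, 1)]
```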

Outro

So we have covered an awful lot of stuff today – I know I’ve been throwing information at you for the last 30 minutes but I really wanted to give you a broad overview of as many terms as possible so if anything’s particularly relatable to you or has piqued your interest, you’ve got a base understanding and can go away and do some more research.

Hopefully you’ve found this useful!

Big shoutouts to Sjoerd De Jong for his help with prepping this presentation 🙂