Unity Version: 2021 LTS

Building a custom renderer for Unity - Draw Calls & Batching

This is probably the most important part of rendering with the new SRPs, so it's worth getting a few ideas straight from the off.

Draw Call

A 'draw call' is a bit of a vague term that describes the CPU telling the GPU to draw... something. It's vague on purpose. Reading around online, you'll find loads of people confidently saying "the number of draw calls is how many things are being drawn on your screen" or "a draw call is when you have a mesh and a material being drawn". These things are sort of true but also too specific. The Unity documentation has a great definition that is suitably vague for my liking:

"A draw call tells the graphics API what to draw and how to draw it. Each draw call contains all the information the graphics API needs to draw on the screen, such as information about textures, shaders, and buffers. Draw calls can be resource intensive, but often the preparation for a draw call is more resource intensive than the draw call itself."

So it makes sense that reducing draw calls is such common advice for cutting how long a frame takes to render. Most of that cost is paid on the CPU, which has to prepare each call before the GPU can do anything with it. It stands to reason that if you have fewer draw calls you'll also have less total preparation time and you'll make performance gains.

Batching

Batching is a term used to describe optimising a collection of draw calls. This can be done by reducing the number of draw calls or reducing the render-state changes between them (which is the heavy pre-draw call work discussed earlier).

Common batching techniques that you might be familiar with in Unity are:

  • GPU Instancing: This isn't actually 'batching' but it is a draw call optimisation. It only works when you're drawing the exact same mesh with the exact same material across the scene. It's very effective at taking work off the CPU, but it's not a silver bullet: if you're already GPU bound it's not going to help.
  • Static Batching: When you tick the Static box in the inspector, Unity will attempt to combine meshes that don't move and share a material, which reduces render-state changes per frame (it doesn't reduce draw calls). The main caveat is memory, because the combined geometry has to be stored as well.
  • Dynamic Batching: Not really worth using anymore unless you're targeting low-end devices, as the overhead of the batching is generally higher than the overhead of the draw calls it saves (HDRP doesn't support it at all). Theoretically good for very small meshes like quads for particle effects.

Listing the above is a classic interview question.

BiRP vs SRP Frame

Here is a capture in the Frame Debugger of the three amigos of basic meshes being rendered in the Built-in Render Pipeline.

[Image: frame from the Built-in Render Pipeline showing 3 distinct draw calls for 3 different meshes]
If you've never used the Frame Debugger I highly recommend you check it out! You can step through how your scene is rendered to understand which part is responsible for what you see.

And here is the same scene but being rendered with the Universal Render Pipeline.

[Image: frame from URP showing 1 SRP batch]
In SRPs you don't get to see each draw call the same way, but if you want to know how many are made by each batch, click on the batch and that information appears in the panel on the right-hand side (not visible in this capture).

You can see that in the BiRP frame there are three distinct draw calls, one for each mesh. No batching is being performed because GPU instancing doesn't work across different meshes and the meshes aren't marked as static.

In the URP frame however, you can see that there is only one call within the Opaque pass that says SRP Batch. In essence this is the magic of the SRP batcher, with no additional work (at the editor level) we now have batched draw calls!

The SRP Batcher

The Unity documentation for this is actually really good so I'll briefly explain what is happening and then you should look there for nerdy details!

The SRP batcher is able to batch together draw calls that use the same shader variant. What this means is you can have any number of materials that all have different property values but all use the same shader, and no matter what meshes you apply them to, the draw calls will be batched. The kind of gains that you can expect from this depend on how many shader variants you have in your project, but generally it is very good.

The way this works is by keeping each material's property data in a fixed size buffer that stays in GPU memory and storing object data in its own 'Per Object Large Buffer'. Because the material buffers persist, render-state changes are minimised between draw calls: as long as a material's contents don't change, nothing has to be uploaded again for it. The properties that belong to the object, like its transformation matrices, are updated separately by a dedicated fast code path.
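To make the "same shader variant" rule a bit more concrete, here is a toy model in plain C#. It is not Unity's implementation and every name in it (Draw, BatchModel, CountBatches) is made up for illustration; it only assumes that a batch continues for as long as consecutive draws share a shader variant, regardless of mesh or material.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: NOT Unity's implementation. A draw is reduced to three
// hypothetical string keys for its mesh, material and shader variant.
public readonly struct Draw
{
    public readonly string Mesh;
    public readonly string Material;
    public readonly string ShaderVariant;

    public Draw(string mesh, string material, string shaderVariant)
    {
        Mesh = mesh;
        Material = material;
        ShaderVariant = shaderVariant;
    }
}

public static class BatchModel
{
    // Counts runs of consecutive draws that share a shader variant.
    // Mesh and material are deliberately ignored: only a change of
    // variant starts a new batch in this model.
    public static int CountBatches(IReadOnlyList<Draw> draws)
    {
        int batches = 0;
        string previousVariant = null;
        foreach (Draw draw in draws)
        {
            if (batches == 0 || draw.ShaderVariant != previousVariant)
            {
                batches++;
                previousVariant = draw.ShaderVariant;
            }
        }
        return batches;
    }
}
```

In this model, a cube, a sphere and a capsule with three different materials that all use one variant count as a single batch, while alternating between two variants starts a new batch on every switch. That is why the order draws are submitted in matters as well as how many variants exist.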

In implementing shaders that we want to run on our scriptable render pipeline, it's very straightforward to add support for all kinds of batching.

Any material properties that your shader uses need to be declared inside a buffer named UnityPerMaterial.

Properties
{
    _BaseMap("Texture", 2D) = "white" {}
    _BaseColor("Color", Color) = (0.5, 0.5, 0.5, 1.0)
    _Cutoff("Alpha Cutoff", Range(0.0, 1.0)) = 0.5
    _Metallic ("Metallic", Range(0, 1)) = 0
    _Smoothness ("Smoothness", Range(0, 1)) = 0.5
}

SubShader {
    Pass {
        ...
        // Per-material properties. When instancing is off these macros expand
        // to a plain UnityPerMaterial constant buffer.
        UNITY_INSTANCING_BUFFER_START(UnityPerMaterial)
            // _BaseMap_ST holds the tiling and offset of _BaseMap.
            UNITY_DEFINE_INSTANCED_PROP(float4, _BaseMap_ST)
            UNITY_DEFINE_INSTANCED_PROP(float4, _BaseColor)
            UNITY_DEFINE_INSTANCED_PROP(float, _Cutoff)
            UNITY_DEFINE_INSTANCED_PROP(float, _Metallic)
            UNITY_DEFINE_INSTANCED_PROP(float, _Smoothness)
        UNITY_INSTANCING_BUFFER_END(UnityPerMaterial)
        ...
    }
}

Note: If you read other material online you'll often find CBUFFER_START(UnityPerMaterial) used instead. That also works for the SRP batcher, but if you want the same shader to support GPU instancing as well, you can use UNITY_INSTANCING_BUFFER_START and read each property with UNITY_ACCESS_INSTANCED_PROP(UnityPerMaterial, _BaseColor). CBUFFER means Constant Buffer by the way!

Every shader will also need the inputs from Unity for the object data, such as its transformation matrices.

CBUFFER_START(UnityPerDraw)
    float4x4 unity_ObjectToWorld;
    float4x4 unity_WorldToObject;
    float4 unity_LODFade;
    real4 unity_WorldTransformParams;
CBUFFER_END

Now the shader (and our custom renderer) can support SRP batching and GPU instancing with the following two settings:

GraphicsSettings.useScriptableRenderPipelineBatching = true; 
this.useGPUInstancing = true;

Here this refers to the instance of the class inheriting from RenderPipeline. useGPUInstancing is a field on that class rather than a Unity setting.
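As a rough sketch of where those two lines could live, here is a stripped-down pipeline class. The class name, the constructor parameters and the choice of the SRPDefaultUnlit pass are assumptions for illustration, and it skips clearing, the skybox and transparent objects. The Unity types and members it calls are the real 2021 LTS API, but treat the structure as one possible layout, not the only one.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical pipeline class: only the Unity types and members are real API.
public class CustomRenderPipeline : RenderPipeline
{
    // Assumption: the shaders being drawn have no LightMode tag, which Unity
    // treats as the "SRPDefaultUnlit" pass.
    static readonly ShaderTagId unlitShaderTagId = new ShaderTagId("SRPDefaultUnlit");

    readonly bool useGPUInstancing;

    public CustomRenderPipeline(bool useSRPBatcher, bool useGPUInstancing)
    {
        this.useGPUInstancing = useGPUInstancing;
        // Global switch for the SRP batcher.
        GraphicsSettings.useScriptableRenderPipelineBatching = useSRPBatcher;
    }

    protected override void Render(ScriptableRenderContext context, Camera[] cameras)
    {
        foreach (Camera camera in cameras)
        {
            if (!camera.TryGetCullingParameters(out ScriptableCullingParameters cullingParameters))
            {
                continue; // nothing to render for this camera
            }

            CullingResults cullingResults = context.Cull(ref cullingParameters);
            context.SetupCameraProperties(camera);

            var sortingSettings = new SortingSettings(camera)
            {
                criteria = SortingCriteria.CommonOpaque
            };
            var drawingSettings = new DrawingSettings(unlitShaderTagId, sortingSettings)
            {
                enableDynamicBatching = false,
                // The per-draw instancing switch is fed from the field set above.
                enableInstancing = useGPUInstancing
            };
            var filteringSettings = new FilteringSettings(RenderQueueRange.opaque);

            context.DrawRenderers(cullingResults, ref drawingSettings, ref filteringSettings);
            context.Submit();
        }
    }
}
```

Note the difference in scope: the SRP batcher flag is global, while instancing is requested per DrawRenderers call. When both are on and a shader is SRP batcher compatible, Unity prefers the SRP batcher for it.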

How to break SRP batching

You might want to do this on purpose so that you can use GPU instancing, or you might do it by accident. An unfortunate side effect of this otherwise very good batcher is that MaterialPropertyBlock, which used to be the go-to way of changing a material property such as the base color on one object, breaks SRP batching for that mesh. It breaks because, even though you aren't changing the size of the material buffer, you are changing its contents, which means the data needs to be rebound to the CBUFFERs.
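For reference, this is roughly what the batch-breaking pattern looks like. The component name and the serialized field are hypothetical, and it assumes the shader exposes the _BaseColor property used earlier in this post.

```csharp
using UnityEngine;

// Hypothetical component showing the MaterialPropertyBlock pattern that
// takes a renderer out of the SRP batch.
[RequireComponent(typeof(Renderer))]
public class PerObjectColor : MonoBehaviour
{
    // Assumes the shader has a _BaseColor property.
    static readonly int baseColorId = Shader.PropertyToID("_BaseColor");

    [SerializeField] Color color = Color.white;

    MaterialPropertyBlock block;

    void OnEnable()
    {
        // Created here rather than in a field initializer, because Unity
        // objects shouldn't be constructed during MonoBehaviour construction.
        if (block == null)
        {
            block = new MaterialPropertyBlock();
        }

        block.SetColor(baseColorId, color);
        // Once a property block is set, this renderer is no longer
        // SRP batcher compatible.
        GetComponent<Renderer>().SetPropertyBlock(block);
    }

    void OnDisable()
    {
        // Passing null clears the per-renderer override again.
        GetComponent<Renderer>().SetPropertyBlock(null);
    }
}
```

If the material has instancing enabled and the shader supports it, renderers set up like this can still be drawn with GPU instancing, which is the "on purpose" case.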

Here is a diagram I borrowed from the excellent official Unity blog post by Arnaud Carré, where he talks in far more detail about the SRP batcher. It shows why this rebinding is an issue.

[Image: the flow of how the SRP batcher works, with details about material data being changed]
When the data changes for the material it will need to repeat the steps in the flow for material data.

The way around this issue is to create the materials you need ahead of time, before you enter Play mode. If you need to do something like animate a material from one color to another, you can lerp between materials using Material.Lerp and this shouldn't break batching.
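A minimal sketch of that approach, assuming two materials authored in the editor that use the same shader. The component and field names are hypothetical. Material.Lerp is an instance method that writes the interpolated properties into the material it is called on, so the sketch animates a per-renderer copy and leaves the two authored assets untouched.

```csharp
using UnityEngine;

// Hypothetical component: fades a renderer between two pre-made materials.
[RequireComponent(typeof(Renderer))]
public class MaterialFade : MonoBehaviour
{
    [SerializeField] Material from;        // authored before Play mode
    [SerializeField] Material to;          // assumed to use the same shader as 'from'
    [SerializeField] float duration = 2f;  // seconds for one fade

    Material animated;

    void Awake()
    {
        // Renderer.material returns a copy owned by this renderer, so the
        // shared material assets are not modified.
        animated = GetComponent<Renderer>().material;
    }

    void Update()
    {
        if (from == null || to == null)
        {
            return; // nothing to interpolate between
        }

        // Ping-pong t between 0 and 1, guarding against a zero duration.
        float t = Mathf.PingPong(Time.time / Mathf.Max(duration, 0.0001f), 1f);
        animated.Lerp(from, to, t);
    }

    void OnDestroy()
    {
        // Materials created through Renderer.material must be destroyed manually.
        Destroy(animated);
    }
}
```

Each renderer ends up with its own material, but they all still share one shader variant, which is the case the SRP batcher is built for. It is worth confirming in the Frame Debugger that the objects stay inside the SRP Batch.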

Conclusion

This post was so long I felt like I had to write a conclusion! In my opinion though this is the most important part of the whole experience, learning how the batcher works is going to help me make better games and I hope after reading this it can help you too!
