Monday, 13 May 2013

2D Shadows Shader

This article is based on the following theories and samples. My implementation was merely ported across to C++/DirectX for learning purposes. All credit goes to the authors.

Recommended Reading:

Introduction


A couple of weeks ago, I published a post discussing dynamic 2D shadows, the theory regarding how one could create such an effect and provided a DirectX 9 sample for just that. This week I'll be discussing the same subject using a different method, this time based around graphics shaders.

I've already made a brief post dicussing how pixel shaders can be used to achieve some basic effects in XNA, but I'll recap the important bits for this entry. A shader is a small program that runs on the GPU (as opposed to the CPU like normal programs) and is typically used to perform shading and various other post-processing effects on a scene.

Vertex shaders are run once for each vertex in a primitive and usually deal with transforming 3D world coordinates into 2D screen coordinates, as well as processing any data needed for later stages of rendering. Pixel shaders, also known as fragment shaders, are run for each pixel in the rasterised output and deal with the actual shading of pixels as well as other post-processing effects.

The advantage of using shaders for this application is that shaders are much better at graphical operations, making them ideal for processing shadows in our scene. Additionally it frees up some CPU cycles, allowing us to use them for processing other aspects (such as game logic).

Shadow mapping

Scene from a light's view, Wikipedia

It's worth going over how shadows are typically processed in a regular 3D scene to give some insight into why certain steps are taken in this theory. The core idealogy for 3D shadow maps is to create a depth map from the light's perspective, which can then be used to determine whether pixels are in front of the nearest occluding object, or behind it.

One of the basic ways this could be done is to render the scene from the light's perspective, transforming points into the light's local space instead of the camera's. This way the Z value of each pixel can be used as a measure of depth and can be output using the colour (where pixels at one extreme are brighter than the other).

When rendering the final scene, pixels can then be transformed into the light's local space and compared against the value stored in the relevant position of the depth map. Closer pixels receive a colour value from the light and further ones simply receive an ambient value to simulate being in shadow.

Application in 2D


Unfortunately, rendering a scene in 2D from a light's perspective is not as straightforward as it is in 3D. You could probably find some way to convert X and Y coordinates into X and Z coordinates, but that's not the subject of this article. The theory outlined relies on three main steps:

First, the scene is rendered such that each pixel of an object which cast shadows (caster) is coloured based on its distance from the light source. Next, the scene is distorted such that all of the potential rays from the light source are aligned on the horizontal axis. Finally, this distorted image is compressed into a strip which stores the smallest (closest) values for each row, creating a depth map. This map can then be queried when rendering the scene to determine shading values, much like a traditional 3D shadow map.

Distance map

"Distance" map

The initial phase is relatively simple. Each object in your scene just needs to be transformed into the light's local space and then shaded based on its distance from the source. In my case I also scale the scene since my light radiuses may not always be the same size as the shadow map texture.

Here I utilise the fact that the position output from the vertex shader is in the [-1, 1] range, where (0, 0) is the centre of the render target and also where the light is. I simply pass a copy of this calculated position out as a texture coordinate (since Pixel Shader Model 2.0 doesn't support a position input semantic) and then use the length of that to determine the distance of the pixel.

Shader Code:
VS_OUTPUT DefaultVS(float4 position : POSITION, float2 tex : TEXCOORD0)
{
 VS_OUTPUT output;

 output.position = mul(position, g_worldViewProjMatrix);
 output.worldPos = output.position;
 output.tex = tex;

 return output;
}

float4 DistancePS(float2 worldPos : TEXCOORD1) : COLOR0
{
 float depth = length(worldPos);
 return float4(depth, depth, depth, 1);
}

Distorted map


Distorted map
The next phase is an intermediate step where we distort the distance map such that the rays from the light source are aligned along one axis instead of emitting radially. The vertical axes are also rotated 90 degrees and stored in a separate RGB channel such that all rays are horizontal.

The way this works is that coordinates are transformed into the [-1, 1] range (where 0, 0 is the centre of the texture) and the scene is divided into four quadrants (up, down, left and right). One coordinate is then scaled based on the absolute value of the other depending on which quadrant the point is in, causing points to tend towards the origin. These coordinates are then transformed back into the [0, 1] range and used as texture sampling coordinates for our scene.

Distortion plot in horizontal axis
This is difficult to explain in words, so I've provided a simple graph showing some example horizontal distortion coordinates. The brighter, dashed lines are the input coordinates after being transformed into the [-1, 1] range and the darker, solid lines are the output coordinates.

As we can see, when the x value is 1, the y value remains the same. However as x tends towards 0, so too does y. This means that even though we're drawing pixels along a horizontal row, we're actually sampling the texture along a diagonal emitting from the centre (the position of the light). A similar process happens in the vertical quadrants, except that the other coordinate (x) is scaled instead.


Shader Code:
float4 DistortPS(float2 texCoord : TEXCOORD0) : COLOR0
{
 // Transform coordinates into (-1, 1) domain.
 float u0 = (texCoord.x * 2) - 1;
 float v0 = (texCoord.y * 2) - 1;

 // As U approaches 0, V also tends towards 0.
 v0 *= abs(u0);

 // Convert back to (0, 1) domain.
 v0 = (v0 + 1) / 2;

 float2 coords = float2(texCoord.x, v0);

 // Store values in horizontal and vertical axes in separate channels.
 float h = tex2D(inputSampler, coords).r;
 float v = tex2D(inputSampler, coords.yx).r;

 return float4(h, v, 0, 1);
}

Depth map


Depth map, stretched for clarity
The next phase is to compress the distortion map into a 2-pixel wide strip, such that only the closest values in each quadrant remain. If you compare the stretched image to the right to the previous step, you'll see how the closest coordinates (green, red, black) are preserved in each half.

In Catalin's implementation, he goes about this by repeatedly downsampling his texture by half, taking the lowest values from pairs of pixels. One of the problems I see with this approach is that you need to have a different render target for each pass, all the way down to the final 2-pixel wide depth map. Another problem is that you end up sampling the same values multiple times. Not only do you have to sample and compare the entire texture once for the first downsample, but then you have to continually sample the remaining values until you get down to 2. This means that you almost end up sampling each pixel twice for the entire texture.

My implementation differs in that it has more passes, but each pixel is only sampled and compared once for the most part. It also only uses the input and output render targets, without any intermediate ones in between. The way it works is that we repeatedly sample a chunk of 8 pixels from the texture and output the lowest value from that batch of 8. We then then use a minimum blending operation to choose the smallest value between the destination and source pixels and write that to the render target. Finally, we offset the sampling position and repeat this for every chunk in the current section until we've sampled all the pixels and obtained the minimum value.

C++ Code:
// Convert distorted scene into a depth map.
device->SetRenderTarget(0, m_depthMapSurface);
device->SetRenderState(D3DRS_BLENDOP, D3DBLENDOP_MIN);

float largeStep = 1.0f / m_shadowResolution;
float smallStep = 1.0f / 2.0f;

effect->SetTechnique(handles[HANDLE_HREDUCTION]);
effect->SetTexture(handles[HANDLE_INPUTTEXTURE], m_tempMap);
effect->SetFloat(handles[HANDLE_HREDUCTIONSTEP], largeStep);

int chunks = (int)(m_shadowResolution / 2) / maxReductionSamples;

device->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER,
 D3DCOLOR_RGBA(255, 255, 255, 255), 1.0f, 0);

effect->Begin(&passes, NULL);

for (int c = 0; c < chunks; c++)
{
 // Calculate start point for this chunk set.
 float start = (largeStep / 2) + (c * (largeStep * 8));
 effect->SetFloat(handles[HANDLE_HREDUCTIONSTART], start);
 effect->CommitChanges();
   
 for (unsigned int i = 0; i < passes; i++)
 {
  effect->BeginPass(i);
  device->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, 2);
  effect->EndPass();
 }
}

effect->End();
device->SetRenderState(D3DRS_BLENDOP, D3DBLENDOP_ADD);
Shader Code:
VS_HREDUCT_OUTPUT HReductionVS(
 float4 position : POSITION, float2 tex : TEXCOORD0)
{
 VS_HREDUCT_OUTPUT output;

 output.position = mul(position, g_worldViewProjMatrix);

 for (int i = 0; i < HREDUCTION_MAX_STEPS; i++)
 {
  output.tex[i] = float2(tex.x + g_hReductionStart
   + (g_hReductionStep * i), tex.y);
 }

 return output;
}

float4 HReductionPS(VS_HREDUCT_OUTPUT input) : COLOR0
{
 float2 colour = (float2)tex2D(inputSampler, input.tex[0]);

 for (int i = 1; i < HREDUCTION_MAX_STEPS; i++)
  colour = min(colour, (float2)tex2D(inputSampler, input.tex[i]));

 return float4(colour, 0, 1);
}

Shadow map


Final shadow map for a light
The final phase is to draw our completed shadow map. This step is relatively straightforward, for each pixel we get the relevant value from our depth map and compare the distances from the centre. If the current pixel is closer then we shade it in brightness, if it's further then we shade it as black. We also simulate light attenuation by using the distance of the lit pixels to determine how bright they are.

The only not-so-straightforward bit is determining and getting the relevant depth map value. For this, we simply transform the x and y coordinates into the [-1, 1] range again and compare their absolute values. We then apply the inverse distortion operation to either x or y, depending on which was greater, and sample the relevant depth map channel at those coordinates. Another important thing to note is the inclusion of a 1-point bias to the pixel distance to prevent shadow acne.

Shader Code:
float4 ShadowMapPS(float2 texCoord : TEXCOORD0) : COLOR0
{
 float distance = length(texCoord - 0.5f) * 2;
 distance *= 512.0f;
 distance -= 1;

 // Transform coordinates into (-1, 1) domain.
 float u0 = (texCoord.x * 2) - 1;
 float v0 = (texCoord.y * 2) - 1;

 float shadowMapDistance = abs(u0) > abs(v0)
  ? GetShadowDistanceH(texCoord) : GetShadowDistanceV(texCoord);

 // Render as black or white depending on distance and depth value.
 float light = distance < shadowMapDistance * 512.0f
  ? 1 - (distance / 512.0f) : 0;

 float4 colour = light;
 colour.a = 1;

 return colour;
}

Final light map


Final scene light map
After we have each light's shadow map, we follow a similar process to the previous dynamic 2D shadows method. We draw each shadow map at its relevant position and colour on the light map texture and then we render the light map over the top of our scene using the Zero and Source Colour blending modes for the Source and Destination textures respectively. As before, this means that for each pixel in our scene, only the colours in the light map are rendered at that position.


Source code and comments


I greatly prefer this method over the previous one. Not only is the result of better quality and easier to adjust and add new features to, but the potential FPS gain is massive. The previous implemention of dynamic 2D lights had an FPS count of a little over 800 in release mode. This new approach has roughly 1400 FPS in release, almost double the previous amount. Granted, the GPU in my development computer is pretty decent, but it goes to show what sort of gains you can get by offloading some work from the CPU.

An optimisation that could be made is that currently each light owns its own shadow, distortion and depth map textures. At least for the distortion and depth maps, these textures could be shared among multiple lights with the same shadow resolution since the values are only needed during the rendering of a light's shadow map and can be discarded as soon as it is complete. Another optimisation is that static lights could skip redrawing their shadow map until they move. Only 2 of the 6 lights in the sample ever move, so it would be relatively simple to add a check to skip redrawing if the light hasn't moved since the last call.

In Catalin's sample, he also applies a distance-based gaussian blur to simulate the way that shadows tend to get sharper the closer they are to the light source (as well as smoothing out the rough edges on shadows). My implementation skips this step, as I was aiming for a result comparable to my previous sample. See his article for an implementation for this effect.

Credits:
George - Wizard Sprite
Studio Evil - Ground Texture

Source:
Sample Archive
DirectX 9 Redistributable