## HDR Display – First Steps

Recently NVIDIA sent us a nice HDR TV, and we got a chance to check out this new HDR display stuff in practice. It was a rather quick implementation, as our game ships in less than two months. Regardless, the results are good, and it was definitely worth it to identify issues and prepare for full HDR display support. In the future we will revisit our HDR display implementation, but first we need HDR monitors to become available (current HDR TVs are simply too big for normal work), so we can think about using the increased brightness and color gamut in our art pipeline.

# Tone Mapping

We want to output scRGB values (linear values with Rec. 709 primaries, where 1.0 maps to 80 nits and ~12.5 maps to 1000 nits). Just like for the LDR display, I fitted ACES RRT + scRGB (1000 nits) ODT to a simple analytical curve. Currently no HDR TV supports more than 1000 nits, so there was no point in supporting anything else.

```hlsl
float3 ACESFilmRec2020( float3 x )
{
    float a = 15.8f;
    float b = 2.12f;
    float c = 1.2f;
    float d = 5.92f;
    float e = 1.9f;
    return ( x * ( a * x + b ) ) / ( x * ( c * x + d ) + e );
}
```

Just like the LDR curve, this curve is shifted a bit; to get the reference curve, multiply input x by 0.6. The curve isn't precise at the end of the range, but this isn't very important in practice.
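For plotting or testing outside a shader, here is a scalar C++ port of the fitted curve, together with the scRGB-to-nits conversion from the text (1.0 = 80 nits). The function names are illustrative. Note that the rational function tends to a/c = 15.8 / 1.2 ≈ 13.2 (about 1050 nits) for large inputs, which is consistent with the fit being imprecise at the end of the range.

```cpp
#include <cassert>
#include <cmath>

// Scalar port of ACESFilmRec2020 (same coefficients). Output is in scRGB units.
float ACESFilmRec2020Fit( float x )
{
    assert( x >= 0.0f ); // HDR scene values are non-negative
    const float a = 15.8f;
    const float b = 2.12f;
    const float c = 1.2f;
    const float d = 5.92f;
    const float e = 1.9f;
    return ( x * ( a * x + b ) ) / ( x * ( c * x + d ) + e );
}

// The fit is shifted; per the text, scaling the input by 0.6 approximates the reference curve.
float ACESFilmRec2020Unshifted( float x )
{
    return ACESFilmRec2020Fit( x * 0.6f );
}

// scRGB: 1.0 corresponds to 80 nits.
float ScRGBToNits( float v )
{
    return v * 80.0f;
}
```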

# UI

The first issue with UI is that 1.0 in the HDR render target maps to around 80 nits, which looks too dark compared to the image on an LDR display. The solution was very simple: just multiply UI output by a magic constant 🙂. The second issue with UI is that alpha blending with very bright pixels causes artifacts. To fix that, we needed to draw the UI to a separate render target and custom-blend it with the rest of the scene in a separate pass.
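A minimal sketch of such a custom UI composite, assuming premultiplied-alpha UI and scRGB scene values. Both constants are hypothetical (the post only mentions "a magic constant"): `kUIScale` = 2.5 puts UI white at roughly 200 nits given 1.0 = 80 nits, and `kMaxSceneBehindUI` limits how much very bright scene content can shine through translucent UI. This is one possible blend, not necessarily the one used in the game.

```cpp
#include <algorithm>
#include <cassert>

struct Color { float r, g, b; };

// Hypothetical UI brightness scale (scRGB 1.0 = 80 nits, so 2.5 is ~200 nits).
constexpr float kUIScale = 2.5f;
// Hypothetical cap for scene values seen through translucent UI (scRGB units).
constexpr float kMaxSceneBehindUI = 1.0f;

// Composite premultiplied-alpha UI over the HDR scene.
Color CompositeUI( Color scene, Color uiPremul, float uiAlpha )
{
    assert( uiAlpha >= 0.0f && uiAlpha <= 1.0f );
    auto channel = [ uiAlpha ]( float s, float ui )
    {
        // Pull bright scene values toward the cap in proportion to UI coverage, so a
        // 12.5 (1000 nit) pixel doesn't dominate a half-transparent panel.
        float const limited = s + ( std::min( s, kMaxSceneBehindUI ) - s ) * uiAlpha;
        return ui * kUIScale + limited * ( 1.0f - uiAlpha );
    };
    return { channel( scene.r, uiPremul.r ), channel( scene.g, uiPremul.g ), channel( scene.b, uiPremul.b ) };
}
```

Where the UI is fully transparent the scene passes through unchanged, and where it is opaque only the scaled UI remains.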

# Color Grading

Color grading was the only rendering pass that used scene colors after tone mapping. Obviously, having two different curves (one for the LDR display and one for the HDR display) breaks the consistency of this pass. I looked through our color grading settings and managed to reduce them to a simple analytic system: shadow/highlight tint with some extra settings. Redoing color grading at this stage of the project was out of the question, so all old color grading settings were automatically fitted using least squares. For the next project we plan to grade in a different space, with more bits and a log-like curve (ACEScc or Sony S-Log).
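The automatic fit of the old settings can be sketched with ordinary least squares. The model below is a hypothetical stand-in for the shadow/highlight tint system, since the exact model isn't given: per channel, out(x) = shadowTint · x(1 - x) + highlightTint · x², which reduces to the identity when both tints are 1. Because both tints enter linearly, the fit is a 2×2 system of normal equations, solved here with Cramer's rule.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct TintFit { float shadow; float highlight; };

// Fit shadow/highlight tints (one channel) to samples of an old grading curve.
// in: tone-mapped input values in [0, 1]; out: the old grading result for each input.
TintFit FitTint( const std::vector<float>& in, const std::vector<float>& out )
{
    assert( in.size() == out.size() && !in.empty() );
    double a00 = 0.0, a01 = 0.0, a11 = 0.0, b0 = 0.0, b1 = 0.0;
    for ( std::size_t i = 0; i < in.size(); ++i )
    {
        double const x = in[ i ];
        double const f0 = x * ( 1.0 - x ); // shadow basis function
        double const f1 = x * x;           // highlight basis function
        a00 += f0 * f0; a01 += f0 * f1; a11 += f1 * f1;
        b0 += f0 * out[ i ]; b1 += f1 * out[ i ];
    }
    double const det = a00 * a11 - a01 * a01;
    assert( std::fabs( det ) > 1e-12 ); // needs at least two distinct non-zero inputs
    return { float( ( b0 * a11 - a01 * b1 ) / det ), float( ( a00 * b1 - a01 * b0 ) / det ) };
}
```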

# Content

Some things in our game look awesome on an HDR display, but some don’t look so good. Most issues are caused by “artistic” lighting setups, which were carefully tuned for the LDR tone mapping curve. For example, in some places sunlight is nicely “burned in” when viewed on an LDR display, but looks washed out on the HDR display, as the lighting isn’t bright enough. Unfortunately, this is something that can’t be fixed at the last minute; it’s something to think about when creating content for our next game.

# Summary

Current HDR displays don’t have amazing brightness. 1000 nits (current HDR displays) vs. 300 nits (current LDR displays) isn’t that big a difference, as perceived brightness grows much more slowly than luminance (roughly with its square root or cube root). On the other hand, HDR displays add a lot of additional detail: pixels which were grey because of the tone mapping curve now get a lot of color. Anyway, we are moving forward here, and there is no excuse not to support HDR displays.
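To put numbers on this comparison, assume a power-law model of perceived brightness, $B \propto L^{k}$, with an exponent between 1/3 and 1/2 (an assumption in the spirit of Stevens' power law):

$$\left(\frac{1000\ \text{nits}}{300\ \text{nits}}\right)^{1/2}\approx 1.83,\qquad \left(\frac{1000\ \text{nits}}{300\ \text{nits}}\right)^{1/3}\approx 1.49$$

Under this model, a 3.3× increase in peak luminance reads as only about 1.5 to 1.8 times brighter.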

# Digging Deeper


## GDC 2016 Presentations

(last update: May 20, 2016)

This year’s GDC was awesome. There were some amazing presentations, and again I could chat with super-smart and inspiring people. Be sure to check out “Advanced Techniques and Optimization of HDR Color Pipelines”, “Optimizing the Graphics Pipeline with Compute” and “Photogrammetry and Star Wars Battlefront”. Here is a growing list of presentations:

# GDC Vault


## Automatic Exposure

In games, automatic exposure (or eye adaptation) is an algorithm for simulating the eye’s reaction to temporal changes in lighting conditions and for selecting the optimal exposure for a given scene and lighting conditions. The main challenge is that optimal settings are hard to define. Should we expose for sunlight, shadows or something in between? Should the image be normally exposed, underexposed or overexposed? This is the main reason why some people dislike automatic exposure and prefer to set exposure manually.

In photography, exposure is carefully selected by the photographer, either during the shot or afterwards during photo processing, and for many linear games with static lighting, exposing manually is indeed a good solution. Even some games with changing lighting conditions can do this manually, by placing virtual luminance meters and selecting one using manually placed triggers or exposure volumes (post process volumes).

In some cases manual exposure isn’t enough: dynamic levels, big open worlds, a lot of lighting variation, or simply when we can’t afford to spend time manually tweaking exposure volumes.

# Standard approach

Automatic exposure in games is a pretty old concept. When HDR rendering was introduced, it was a must-have feature of an HDR lighting pipeline. The standard approach to automatic exposure is to compute the scene’s geometric mean luminance (log2 average) and map it to some “key value”:

$\text{Exposure}=\frac{\text{KeyValue}}{\text{Clamp}\left(L,L_{\min },L_{\max }\right)}$

Then we multiply all pixels by the exposure and apply tone mapping, color grading and gamma.
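The standard approach can be sketched in a few lines of C++. The small `eps` offset that keeps black pixels from producing log2(0) is an assumption, and in practice the reduction runs on the GPU rather than over a `std::vector`. Function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of luminance: exp2( average( log2( L ) ) ).
float GeometricMeanLuminance( const std::vector<float>& luminance, float eps = 1e-4f )
{
    assert( !luminance.empty() );
    double sumLog2 = 0.0;
    for ( float l : luminance )
        sumLog2 += std::log2( l + eps );
    return static_cast<float>( std::exp2( sumLog2 / luminance.size() ) );
}

// Exposure = KeyValue / Clamp( L, Lmin, Lmax )
float StandardExposure( float avgLuminance, float keyValue, float minLuminance, float maxLuminance )
{
    float const clamped = std::min( std::max( avgLuminance, minLuminance ), maxLuminance );
    return keyValue / clamped;
}
```

The geometric mean of 0.5 and 2.0 is 1.0, whereas their arithmetic mean is 1.25; working in log2 space keeps a few very bright pixels from dominating as much.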

This standard approach is still used in many games, even high-profile titles like The Order: 1886, but it also has many downsides and requires a lot of manual tweaking [NP15]. Lighting artists need to manually place multiple exposure volumes, which define the optimal key value and min/max luminance values per region. Let’s see how we can improve on the standard approach.

# EV as luminance units

Photographers usually work with EVs ($L=0.125\,\frac{\text{cd}}{\text{m}^2}\cdot 2^{\text{EV}_{100}}$) for metering scene luminance. EVs provide an approximately perceptually uniform log2 scale (one EV step doubles luminance) and are more intuitive to work with than raw luminance values [Ree14]. Additionally, instead of tweaking a key value, photographers tweak exposure compensation (EC), which is again expressed in EV units. To be more artist friendly, we should let artists tweak all automatic exposure parameters in EV units instead of raw luminance, replace the key value with a constant 18% middle gray, and add EC for manual exposure biasing:

$\text{Exposure}=\frac{0.18}{0.125\cdot 2^{\,\text{Clamp}\left(\text{EV}_{100},\,\text{EV}_{\min},\,\text{EV}_{\max}\right)-\text{EC}}}$
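A minimal sketch of exposure computed in EV units. It assumes the metered value and its limits are expressed in EV100, that EC is subtracted in EV space (so positive EC brightens the image), and that the result is converted back to luminance with L = 0.125 · 2^EV100 before dividing 18% middle gray by it. Function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// L = 0.125 cd/m^2 * 2^EV100 and its inverse.
float EV100ToLuminance( float ev100 ) { return 0.125f * std::exp2( ev100 ); }
float LuminanceToEV100( float luminance )
{
    assert( luminance > 0.0f );
    return std::log2( luminance / 0.125f );
}

// Clamp the metered EV to the artist range, apply EC in EV space, expose for 18% gray.
float AutoExposureEV( float meteredEV100, float minEV100, float maxEV100, float ec )
{
    float const ev = std::min( std::max( meteredEV100, minEV100 ), maxEV100 ) - ec;
    return 0.18f / EV100ToLuminance( ev );
}
```

Every artist-facing parameter here (the limits and EC) is in EV, and +1 EC doubles the exposure.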

We could also go one step further and implement a physically based camera by parameterizing the exposure equation with camera aperture, shutter time and ISO. It’s not important for this article, and you can find all the required details in the excellent course notes by Sébastien Lagarde and Charles de Rousiers [LR14].

# Center weighted metering

Simple average metering is rarely used in real cameras. Usually the center of the screen is the most important part for the viewer and should be well exposed. We can take this into account by metering only in a small circle in the center of the image, or by giving more weight to luminance values located near the center of the screen [Hen14]. Additionally, we can use a compute shader for computing averages [Pet11], which is usually simpler and more efficient than repeated texture downsampling.
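Center-weighted metering can be sketched as a weighted geometric mean. The Gaussian falloff and its `sigma` (in normalized [-1, 1] screen coordinates) are assumptions, one of many possible weighting functions; a compute shader would perform the same reduction in parallel.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Weighted log2 average of luminance with a Gaussian weight around the image center.
float CenterWeightedLogAverage( const std::vector<float>& lum, int width, int height, float sigma )
{
    assert( int( lum.size() ) == width * height && sigma > 0.0f );
    const float eps = 1e-4f; // avoids log2( 0 ) for black pixels
    double sumWeight = 0.0;
    double sumLog2 = 0.0;
    for ( int y = 0; y < height; ++y )
        for ( int x = 0; x < width; ++x )
        {
            // Pixel center in normalized coordinates [-1, 1]
            float const u = ( x + 0.5f ) / width * 2.0f - 1.0f;
            float const v = ( y + 0.5f ) / height * 2.0f - 1.0f;
            float const w = std::exp( -( u * u + v * v ) / ( 2.0f * sigma * sigma ) );
            sumWeight += w;
            sumLog2 += w * std::log2( lum[ y * width + x ] + eps );
        }
    return static_cast<float>( std::exp2( sumLog2 / sumWeight ) );
}
```

A very large `sigma` degenerates to the plain geometric mean, while a small one approaches spot metering.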

# Histogram

Unfortunately, both of the averages described above have their issues. Small dark or bright spots (e.g. very bright specular reflections) can strongly influence the average. For example, if a player hides behind a dark tree, metering will result in a very low average scene luminance and, as a result, overexpose the entire image. Furthermore, we usually don’t want to expose for some kind of average lighting condition; instead, we want to expose for the dominant one.

A nice solution here is to use a histogram, so we can adapt to something like the median luminance instead of the average. Valve used this approach for HL2: Episode One. They calculated one histogram bin per frame using occlusion queries and built the full histogram on the CPU side [Vla08]. Nowadays we can easily and efficiently build a histogram using a compute shader. More importantly, with a compute shader a single texel can contribute a fractional amount to two neighboring bins. This allows us to cover an entire EV step with just a few log2-space bins: 64 bins cover a large range of 16 EV steps, and 128 bins are enough to cover the entire range of real-world exposures. We could also use a “sliding” histogram, just like we pre-expose the image (multiply shader outputs by the adaptation from the previous frame, so we can store HDR data in R11G11B10Float buffers without any precision issues). By the way, scene pre-exposure was also introduced by Valve in HL2: Episode One. This way they were able to have a full HDR pipeline using just LDR render targets.

Finally, after computing the histogram, we skip a large percentage (50%-80%) of the darkest pixels and a smaller percentage (2%-20%) of the brightest pixels, and calculate the average from the remaining ones. Metering this way stabilizes automatic exposure and helps to focus exposure on something important.
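The histogram metering described above can be sketched on the CPU as follows. Assumptions: the histogram stores EV values, `binsPerEV` = 4 gives the 64-bin/16-EV layout, each sample is split linearly between its two nearest bins, and the filtered average uses each bin's center EV. `EVHistogram`, `AddSample` and `FilteredAverageEV` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct EVHistogram
{
    float minEV;             // EV of the first bin center
    float binsPerEV;         // 4 => 64 bins cover 16 EV
    std::vector<float> bins; // fractional sample counts
};

// Splat one EV value into the two nearest bins with linear weights.
void AddSample( EVHistogram& h, float ev )
{
    assert( !h.bins.empty() );
    int const maxBin = int( h.bins.size() ) - 1;
    float pos = ( ev - h.minEV ) * h.binsPerEV; // continuous bin coordinate
    pos = std::max( 0.0f, std::min( pos, float( maxBin ) ) );
    int const i0 = int( std::floor( pos ) );
    int const i1 = std::min( i0 + 1, maxBin );
    float const f = pos - float( i0 );
    h.bins[ i0 ] += 1.0f - f;
    h.bins[ i1 ] += f;
}

// Skip the darkest lowFraction and brightest highFraction of samples, average the rest (in EV).
float FilteredAverageEV( const EVHistogram& h, float lowFraction, float highFraction )
{
    float total = 0.0f;
    for ( float b : h.bins ) total += b;
    float const lowCut = total * lowFraction;
    float const highCut = total * ( 1.0f - highFraction );

    float accum = 0.0f, sumWeight = 0.0f, sumEV = 0.0f;
    for ( std::size_t i = 0; i < h.bins.size(); ++i )
    {
        float const binStart = accum;
        float const binEnd = accum + h.bins[ i ];
        accum = binEnd;
        // Part of this bin's samples that lies inside [lowCut, highCut]
        float const w = std::max( 0.0f, std::min( binEnd, highCut ) - std::max( binStart, lowCut ) );
        float const ev = h.minEV + float( i ) / h.binsPerEV;
        sumWeight += w;
        sumEV += w * ev;
    }
    return sumWeight > 0.0f ? sumEV / sumWeight : h.minEV;
}
```

With 50% dark and 10% bright cutoffs, a scene that is 80% dark (EV 0) and 20% bright (EV 4) meters to EV 1: the darkest half is skipped, and the result averages the remaining dark samples with half of the bright ones.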

# Exposure compensation curve

Exposure compensation (or key value) determines whether the exposed image will be relatively dark or bright. Imagine a dark room with closed shutters. After the shutters open, sunlight enters the room and lighting becomes at least a few EV steps brighter. We would expect the final image to become brighter too, but automatic exposure tries to maintain a constant final image brightness, so the image will look almost the same as before the shutters were opened. As a rule, we want a darker image in low light conditions and a brighter image in bright conditions. This way the viewer gets a clue as to how bright the lighting in the current scene is. To account for this, Krawczyk et al. [KMS05] empirically specified key values for several luminance conditions and fitted a simple curve:

$\text{KeyValue}=1.03\, -\frac{2}{\log _{10}(L+1)+2}$

We can translate that to EV units and plot:

This curve may be a bit too extreme for games, as a high key results in a really bright image, but we can roll our own function or, even better, allow artists to tweak the exposure compensation curve directly and store it in a small 1D lookup texture.
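One way to build such a lookup table is to bake the [KMS05] curve, converted to EV units, into a small array and sample it with linear filtering. Converting the key value to exposure compensation as log2(key / 0.18) and using EV100 as the table axis are assumptions of this sketch; an artist-authored curve would simply replace `BakeECCurve`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Key value curve from [KMS05]; lum is the adapted scene luminance in cd/m^2.
float KrawczykKeyValue( float lum )
{
    return 1.03f - 2.0f / ( std::log10( lum + 1.0f ) + 2.0f );
}

// Assumption: express a key value as EC (in EV) relative to 18% middle gray.
float KeyValueToEC( float keyValue )
{
    return std::log2( keyValue / 0.18f );
}

// Bake EC as a function of EV100 into a 1D table covering [minEV, maxEV].
std::vector<float> BakeECCurve( int size, float minEV, float maxEV )
{
    assert( size >= 2 && maxEV > minEV );
    std::vector<float> lut( size );
    for ( int i = 0; i < size; ++i )
    {
        float const ev = minEV + ( maxEV - minEV ) * i / ( size - 1 );
        float const lum = 0.125f * std::exp2( ev ); // L = 0.125 cd/m^2 * 2^EV100
        lut[ i ] = KeyValueToEC( KrawczykKeyValue( lum ) );
    }
    return lut;
}

// Linear filtering with clamp addressing, like a 1D texture lookup.
float SampleECCurve( const std::vector<float>& lut, float ev, float minEV, float maxEV )
{
    float const t = ( ev - minEV ) / ( maxEV - minEV ) * float( lut.size() - 1 );
    float const pos = std::min( std::max( t, 0.0f ), float( lut.size() - 1 ) );
    std::size_t const i0 = static_cast<std::size_t>( pos );
    std::size_t const i1 = std::min( i0 + 1, lut.size() - 1 );
    float const f = pos - float( i0 );
    return lut[ i0 ] + ( lut[ i1 ] - lut[ i0 ] ) * f;
}
```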

# FX and translucents

It’s impossible to balance the luminance of FX (particles, beams, trails…) for various times of day while using the enormous real-world range of luminance values for lighting. FX artists want their effects to be clearly visible and have some glow in direct sunlight (~100000 lux), while at the same time not being overblown in full moon lighting (~0.25 lux).

Trying to balance a single effect’s brightness for different lighting conditions. Images from [Vai14].

Similarly, unlit debug meshes like transparent lines, planes and other editor meshes should maintain a constant on-screen brightness despite varying exposure and scene lighting. Some debug meshes could be rendered after the HDR pipeline (after tone mapping), but most have to go through the HDR pipeline to get proper translucent sorting. This is important not only for translucent debug meshes, but also for all antialiased opaque debug meshes.

This problem was solved in Infamous Second Son by applying a manual exposure offset per time of day [Vai14], but the developers weren’t happy with this solution, as it requires a lot of manual tweaking. A simpler and more robust solution is to negate exposure by dividing color by an estimated exposure. This estimated exposure can be the exposure from the previous frame, can be calculated from a virtual light meter placed at the camera position, or can be estimated from lightmap values at the center of the screen or from light probes at the camera position. In any case, we usually don’t want to negate exposure completely; instead, we give artists a slider which blends between those two values in log2 space. This way FX will be darker in low lighting and brighter in high lighting, while still being easily controlled by the artists.
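The log2-space slider can be sketched as a single multiplier applied to FX color before it is exposed. `FXExposureScale` and its `blend` parameter are illustrative names; `estimatedExposure` stands for any of the estimates listed above.

```cpp
#include <cassert>
#include <cmath>

// Color multiplier for FX / debug meshes that partially negates scene exposure.
//   blend = 0: FX follow the scene exposure like any other surface
//   blend = 1: exposure fully negated, constant on-screen brightness
// Blending in log2 space gives scale = exposure^(-blend).
float FXExposureScale( float estimatedExposure, float blend )
{
    assert( estimatedExposure > 0.0f && blend >= 0.0f && blend <= 1.0f );
    return std::exp2( -blend * std::log2( estimatedExposure ) );
}
```

On-screen FX brightness is color × scale × exposure = color × exposure^(1 - blend), so `blend` = 0.5 halves the exposure variation measured in EV.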

It makes no sense to adapt to hacked luminance (pixels with a constant on-screen brightness), and in some cases it can even introduce a feedback loop. Additionally, some FX like weapon muzzle flashes or explosions shouldn’t influence automatic exposure. As a rule, we want to adapt to something that’s constantly on screen, like fire particles, but not to temporary FX like muzzle flashes or explosions. Debug meshes also shouldn’t influence automatic exposure, so it’s possible to set up automatic exposure in the editor or draw debug meshes without changing the final image’s brightness. We could try to selectively pick what influences automatic exposure and what doesn’t, but that requires storing extra data per pass.

A simple solution to both issues is to compute automatic exposure based on scene luminance just after the main opaque pass. This way we skip all translucents, emissives and debug meshes, so they won’t influence automatic exposure. Additionally, this fixes feedback loop issues (at least if you don’t hack lights). The downside is that it won’t adapt to things like big emissive panels, but we can easily fix that by marking such surfaces in the G-buffer or stencil buffer and adding emissive to the automatic exposure input only for the marked surfaces.

# Adapting to illuminance

Automatic exposure works with final pixel luminance values and ignores material reflectance (information about how dark materials are). For example, after automatic exposure a wall painted white will look just like a wall painted black. We had this kind of issue in dark corridors, parts of which were covered in snow: either the walls were too bright or the snow was too dark.

In his talk, Naty Hoffman proposed adapting to illuminance instead of final pixel luminance [Hof13]. This way material reflectance won’t influence automatic exposure: dark corridors will remain dark and snow will be pure white, as expected. Additionally, it removes specular from the automatic exposure input and further stabilizes automatic exposure.

Most deferred engines have either a separate illuminance (diffuse lighting) buffer or some form of lighting buffer with additional information in the alpha channel, which allows approximately reconstructing plain illuminance. Usually this is motivated by the very popular SSSSS algorithm (screen space subsurface scattering), which requires a separate illuminance buffer [JZJ*15].

Having the illuminance, we just need to add the skybox and (lit) fog to create the final buffer for the automatic exposure calculation. It’s not obvious how to treat the skybox. One constant color per skybox? Convert it to illuminance? Just sample the skybox texture (luminance)? We settled on luminance, as we want exposure to change depending on camera direction (bright white clouds near the sun should get a different exposure than darker parts of the skybox on the opposite side). In any case, we additionally need manual exposure compensation for the skybox, so lighting artists can set the optimal on-screen skybox brightness, as a nighttime skybox should be much darker on screen than a daytime one.

# Temporal adaptation

The eye’s reaction to temporal changes in lighting conditions is usually simulated by blending exposures from many frames using an exponential decay function:

$L_{\text{temporal}}=L_{\text{temporal}}+\left(L-L_{\text{temporal}}\right)\left(1-e^{-\Delta \text{time}* \tau }\right)$

In reality, adaptation time differs depending on whether we adapt to light or to darkness, and on the lighting conditions, as cones and rods have different adaptation speeds.

Furthermore, rods and cones have different characteristics and different light sensitivities. For example, when adapting to the dark, colored surfaces appear colorless after the rod-cone break. Full light adaptation takes around 5 minutes and full dark adaptation around 20-30 minutes [Wika]. These lengthy times are the reason why pilots and (possibly) pirates used eye patches [Wikb]. This way they were able to remove the eye patch and instantly see clearly in the dark without having to wait 20 minutes.

For games these are unreasonably long time frames, and exact temporal adaptation details aren’t important. Maybe in titles like Metal Gear Solid it would be interesting to use an eye patch for instant dark adaptation, or to speed up dark adaptation by eating foods rich in vitamin A (e.g. carrots or fish). For most games, the interesting takeaway here is to use different speeds for light adaptation and dark adaptation.
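Different light and dark adaptation speeds only require choosing the rate (the τ in the exponential decay formula above) based on the direction of the change. The rates are hypothetical tuning values in 1/seconds, and `AdaptLuminance` is an illustrative name; applying the same filter to log2 luminance (EV) instead is a possible variation.

```cpp
#include <cassert>
#include <cmath>

// L_temporal += ( L - L_temporal ) * ( 1 - e^( -dt * tau ) ),
// with tau chosen by whether the scene got brighter or darker.
float AdaptLuminance( float adapted, float target, float dt, float brightenRate, float darkenRate )
{
    assert( dt >= 0.0f );
    float const rate = target > adapted ? brightenRate : darkenRate;
    return adapted + ( target - adapted ) * ( 1.0f - std::exp( -dt * rate ) );
}
```

With a brightening rate of 3 and a darkening rate of 1, a 0.1 s frame covers about 26% of the remaining gap when the scene gets brighter but only about 10% when it gets darker.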

# What’s next?

What other ideas could we try? One very appealing idea is to adapt to a single dominant lighting condition. A possible implementation would be to bucket lights by tags, pick the most popular bucket and use it for automatic exposure.

We could also build an RGB histogram and use it to simulate chromatic adaptation, which allows the human visual system to adapt to lighting of a particular color (in photography this is called automatic white balance [Wro15]).

Many cameras feature multi-zone metering modes for tracking moving subjects, usually for sports photography. The idea is to track a moving subject and expose based on its luminance. In games we have more information, so we could extend this approach and expose based on important objects in the scene, like enemies, the player in third person view or predefined objects.

Automatic exposure also allows more complex operations than manual exposure. For example, we could expose different parts of the image differently, based on Ansel Adams’ Zone System [YS12]. This approach could simulate a real movie lighting pipeline or advanced photo processing, where HDR lighting is often compressed using various tricks like placing additional lights in dark interiors or placing neutral density gels on windows. This would fix gameplay issues caused by too wide a range of luminance values, for example when a player indoors can’t see enemies outside in sunlight because that part of the image is overexposed.

Finally I’d like to thank Bartłomiej Wroński for an interesting discussion about automatic exposure.

# References

[NP15] David Neubelt, Matt Pettineo – “Advanced Lighting R&D at Ready At Dawn Studios”, SIGGRAPH 2015
[Ree14] Nathan Reed – “Artist-Friendly HDR With Exposure Values”, 2014
[LR14] Sébastien Lagarde, Charles de Rousiers – “Moving Frostbite to Physically Based Rendering 2.0”, SIGGRAPH 2014
[Hen14] Padraic Hennessy – “Implementing a Physically Based Camera: Automatic Exposure”, 2014
[Pet11] Matt Pettineo – “Average Luminance Calculation Using A Compute Shader”, 2011
[Vla08] Alex Vlachos – “Post Processing in The Orange Box”, GDC 2008
[KMS05] Grzegorz Krawczyk, Karol Myszkowski, Hans-Peter Seidel – “Perceptual Effects in Real-time Tone Mapping”, SCCG 2005
[Vai14] Matt Vainio – “The Visual Effects of Infamous: Second Son”, GDC 2014
[Hof13] Naty Hoffman – “Outside the Echo Chamber: Learning from Other Disciplines”, i3D 2013
[JZJ*15] Jorge Jimenez, Károly Zsolnai, Adrian Jarabo, Christian Freude, Thomas Auzinger, Xian-Chun Wu, Javier von der Pahlen, Michael Wimmer and Diego Gutierrez – “Separable Subsurface Scattering”, CGF 2015
[Wika] Wikipedia – “Adaptation (eye)”
[Wikb] Wikipedia – “Eyepatch – Use for adaptation to dark”
[Wro15] Bartłomiej Wroński – “White balance and physically based rendering pipelines”, 2015
[YS12] Lu Yuan, Jian Sun – “Automatic Exposure Correction of Consumer Photographs”, ECCV 2012


## ACES Filmic Tone Mapping Curve

Careful mapping of HDR values to LDR is an important part of a modern game rendering pipeline. One of the goals of our new renderer was to replace Reinhard’s tone mapping curve with some kind of filmic tone mapping curve. We tried the one from Uncharted 2 and tried rolling our own, but weren’t happy with either of these solutions. Finally, we settled on the one from ACES, which is currently the default tone mapping curve in Unreal Engine 4.

The ACES color encoding system was designed to work seamlessly with color images regardless of the input or output color space. It also features a carefully crafted filmic curve for displaying HDR images on LDR output devices. Full ACES integration is a bit of overkill for games, but we can just sample the ODT( RRT( x ) ) transform and fit a simple curve to this data. We don’t even need to run any ACES code at all, as ACES provides reference images for all transforms. There is no linear RGB D65 ODT transform, but we can just use the Rec. 709 D65 one and remove its 2.4 gamma.

The curve was manually fitted (max fit error: 0.0138) to be more precise in the blacks; after all, we will be applying some kind of gamma afterwards. Additionally, the data was pre-exposed, so 1 on input maps to ~0.8 on output and the resulting image’s brightness is more consistent with the one without any tone mapping curve at all. For the original ACES curve, just multiply input (x) by 0.6.

Fitted curve’s HLSL source code:

```hlsl
float3 ACESFilm( float3 x )
{
    float a = 2.51f;
    float b = 0.03f;
    float c = 2.43f;
    float d = 0.59f;
    float e = 0.14f;
    return saturate( ( x * ( a * x + b ) ) / ( x * ( c * x + d ) + e ) );
}
```
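For checking the fit outside a shader (for example against the sampled ACES data), the curve can be ported to scalar C++. This is a sketch: `ACESFilmFit` and `ACESFilmUnexposed` are illustrative names, HLSL `saturate` is emulated with `std::min`/`std::max`, and the 0.6 input scale follows the note above about recovering the original ACES curve.

```cpp
#include <algorithm>
#include <cassert>

// Scalar port of ACESFilm (same coefficients); maps HDR input to [0, 1].
float ACESFilmFit( float x )
{
    assert( x >= 0.0f ); // HDR scene values are non-negative
    const float a = 2.51f;
    const float b = 0.03f;
    const float c = 2.43f;
    const float d = 0.59f;
    const float e = 0.14f;
    float const y = ( x * ( a * x + b ) ) / ( x * ( c * x + d ) + e );
    return std::min( std::max( y, 0.0f ), 1.0f ); // saturate
}

// Without the pre-exposure: scale the input by 0.6.
float ACESFilmUnexposed( float x )
{
    return ACESFilmFit( x * 0.6f );
}
```

An input of 1.0 maps to about 0.80 (2.54 / 3.16), matching the pre-exposure described above, and large inputs saturate to 1.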

Fitted curve plotted against source data’s sample points:


## Analytical DFG Term for IBL

Image-based lighting is an important part of physically based rendering. Unfortunately, a straightforward IBL implementation for lighting models more complicated than Phong requires a huge lookup table and isn’t practical for real time. The current state-of-the-art approach is the split sum approximation [Kar13], which decomposes the IBL integral into two terms: LD and DFG. LD is stored in a standard cube map, and DFG is stored in one global 2D LUT texture. This texture is usually 128×128 R16G16F, contains scale and bias for specular color, and is indexed by roughness/gloss and ndotv. The DFG LUT is quite regular and looks like it could be efficiently approximated by some kind of low order polynomial.

My main motivation was to create a custom 3ds Max shader, so artists could see how their work will look in our engine. Of course 3ds Max supports custom textures, but making artists assign some strange LUT texture isn’t very user friendly and is error prone. It’s better to hide such internal details. Furthermore, it can be beneficial for performance, as you replace a memory lookup with ALU, especially on bandwidth-constrained platforms like mobile devices.

# Surface fitting

There are many surface fitting tools which, given some data points and an equation, automatically find the best coefficients. It’s also possible to transform a curve fitting problem into a nonlinear optimization problem and use tools designed for solving those. I prefer to work with Matlab, so of course I used Matlab’s cftool. It’s a separate application with a GUI: you just enter an equation, and it automatically fits the function, plots the surface against the data points and computes error metrics like SSE or RMSE. Furthermore, you can compare the result side by side with previous approximations. Mathematica can also easily fit surfaces (FindFit), but it requires more work, as you need to write some code for plotting and calculating error metrics.

Curve fitting is usually used for smoothing data, so most literature and tools focus on simple functions like polynomials and Gaussians. For real-time rendering, polynomials are the most cost-efficient choice on modern scalar architectures like GCN, as they avoid costly transcendentals (exp2, log2 etc.), which are quarter rate on GCN. For extra quality, add freebies like saturate or abs to constrain the function output. In some specific cases it’s worth adding other full rate instructions like min, max or cndmask.

Most of the fitting here is done with nonlinear functions, where fitting tools often get stuck in a local minimum. To find a global one, you can either write a script which fits from different starting points and compares the results, or just try a few starting points by hand until the plotted function looks good. For more complicated cases there are smarter tools for finding the global minimum, like Matlab’s MultiStart or GlobalSearch.

Lastly, don’t just try a polynomial of some order; also play with all of its terms. Usually I first search for the polynomial order which properly approximates the given data, then try removing higher order terms and compare the results. This step could be automated to check all term combinations. I never did it, as higher order polynomials are impractical for real-time rendering, so there aren’t too many combinations.

# DFG LUT

First I tried to generate the LUT inside Matlab, but it was too slow to compute, so I switched to C++ and loaded the resulting LUT into Matlab as CSV. The full C++ source for LUT generation is on GitHub. It uses the popular GGX distribution, the Smith geometry term and Schlick’s Fresnel approximation. Additionally, I use the roughness remap $\text{roughness}=(1-\text{gloss})^{4}$, which results in a distribution similar to Blinn-Phong with specular power $2^{16\,\text{gloss}}$. This remap is also similar to $\text{roughness}=(1-0.7\,\text{gloss})^{6}$, which was used by Crytek in Ryse [Sch14].

```cpp
// Assumes LUT_WIDTH, LUT_HEIGHT, sampleNum, MATH_PI, ReverseBits() and GSmith() are defined elsewhere.
for ( unsigned y = 0; y < LUT_HEIGHT; ++y )
{
    // Rows are indexed by ndotv, columns by gloss
    float const ndotv = ( y + 0.5f ) / LUT_HEIGHT;

    for ( unsigned x = 0; x < LUT_WIDTH; ++x )
    {
        float const gloss = ( x + 0.5f ) / LUT_WIDTH;
        float const roughness = powf( 1.0f - gloss, 4.0f );

        // View vector in tangent space (normal = +z)
        float const vx = sqrtf( 1.0f - ndotv * ndotv );
        float const vy = 0.0f;
        float const vz = ndotv;

        float scale = 0.0f;
        float bias = 0.0f;

        for ( unsigned i = 0; i < sampleNum; ++i )
        {
            // Hammersley point set
            float const e1 = (float) i / sampleNum;
            float const e2 = (float) ( (double) ReverseBits( i ) / (double) 0x100000000LL );

            // GGX importance sampling of the half vector
            float const phi = 2.0f * MATH_PI * e1;
            float const cosPhi = cosf( phi );
            float const sinPhi = sinf( phi );
            float const cosTheta = sqrtf( ( 1.0f - e2 ) / ( 1.0f + ( roughness * roughness - 1.0f ) * e2 ) );
            float const sinTheta = sqrtf( 1.0f - cosTheta * cosTheta );

            float const hx = sinTheta * cosPhi;
            float const hy = sinTheta * sinPhi;
            float const hz = cosTheta;

            // Reflect the view vector around the half vector
            float const vdh = vx * hx + vy * hy + vz * hz;
            float const lx = 2.0f * vdh * hx - vx;
            float const ly = 2.0f * vdh * hy - vy;
            float const lz = 2.0f * vdh * hz - vz;

            float const ndotl = std::max( lz, 0.0f );
            float const ndoth = std::max( hz, 0.0f );
            float const vdoth = std::max( vdh, 0.0f );

            if ( ndotl > 0.0f )
            {
                // Assuming GSmith returns the visibility term G / ( 4 * ndotl * ndotv ),
                // this is BRDF * ndotl / PDF with the D and F terms factored out
                float const gsmith = GSmith( roughness, ndotv, ndotl );
                float const ndotlVisPDF = ndotl * gsmith * ( 4.0f * vdoth / ndoth );
                float const fc = powf( 1.0f - vdoth, 5.0f );

                scale += ndotlVisPDF * ( 1.0f - fc );
                bias += ndotlVisPDF * fc;
            }
        }

        // Average only after all samples have been accumulated
        scale /= sampleNum;
        bias /= sampleNum;

        // ... store scale and bias for LUT texel ( x, y )
    }
}
```
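The LUT code relies on two helpers that aren’t shown, `ReverseBits` and `GSmith`. The versions below are hypothetical stand-ins: a standard 32-bit bit reversal for the Hammersley sequence, and a separable Smith GGX term returned in visibility form G / (4 · ndotl · ndotv), which is the form the `ndotlVisPDF` expression assumes. The author’s `GSmith` may differ, for example in its roughness remap or by using a height-correlated variant.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Radical inverse in base 2: reverse the bits of a 32-bit integer.
uint32_t ReverseBits( uint32_t v )
{
    v = ( ( v >> 1 ) & 0x55555555u ) | ( ( v & 0x55555555u ) << 1 );
    v = ( ( v >> 2 ) & 0x33333333u ) | ( ( v & 0x33333333u ) << 2 );
    v = ( ( v >> 4 ) & 0x0F0F0F0Fu ) | ( ( v & 0x0F0F0F0Fu ) << 4 );
    v = ( ( v >> 8 ) & 0x00FF00FFu ) | ( ( v & 0x00FF00FFu ) << 8 );
    return ( v >> 16 ) | ( v << 16 );
}

// Separable Smith GGX visibility: G1( v ) * G1( l ) / ( 4 * ndotl * ndotv ).
// roughness is the GGX alpha, matching the sampling code (roughness^2 = alpha^2).
float GSmith( float roughness, float ndotv, float ndotl )
{
    assert( ndotv > 0.0f && ndotl > 0.0f );
    float const a2 = roughness * roughness;
    // Exact Smith G1 for GGX: 2 n.s / ( n.s + sqrt( a2 + ( 1 - a2 ) * ( n.s )^2 ) )
    float const g1v = 2.0f * ndotv / ( ndotv + std::sqrt( a2 + ( 1.0f - a2 ) * ndotv * ndotv ) );
    float const g1l = 2.0f * ndotl / ( ndotl + std::sqrt( a2 + ( 1.0f - a2 ) * ndotl * ndotl ) );
    return g1v * g1l / ( 4.0f * ndotl * ndotv );
}
```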

The code above outputs a texture like this:

# Approximation

[Laz13] presented an analytical approximation of the DFG term. He used the Blinn-Phong distribution, so I first refitted his approximation for GGX and my roughness remap. Instead of storing scale directly, delta is used (scale = delta - bias), which simplifies fitting, as delta is a simpler surface than scale. Additionally, to get a tighter fit, I added saturate to the bias and delta values.

```hlsl
float3 EnvDFGLazarov( float3 specularColor, float gloss, float ndotv )
{
    float4 p0 = float4( 0.5745, 1.548, -0.02397, 1.301 );
    float4 p1 = float4( 0.5753, -0.2511, -0.02066, 0.4755 );

    float4 t = gloss * p0 + p1;

    float bias = saturate( t.x * min( t.y, exp2( -7.672 * ndotv ) ) + t.z );
    float delta = saturate( t.w );
    float scale = delta - bias;

    // Fade out bias for very low specular values (below ~0.02)
    bias *= saturate( 50.0 * specularColor.y );
    return specularColor * scale + bias;
}
```

Then I tried to find a better approximation. I focused on simple instructions in order to avoid transcendentals like exp, which are quarter rate on GCN. I tried many ideas for bias fitting, from simple polynomials to expensive Gaussians, and finally settled on two axis-aligned polynomials combined with min: one depends only on x (gloss) and the other only on y (ndotv). Fitting delta was easy: a 2nd order polynomial with an additional x³ term did the job.

```hlsl
float3 EnvDFGPolynomial( float3 specularColor, float gloss, float ndotv )
{
    float x = gloss;
    float y = ndotv;

    // Bias: min of a polynomial in gloss and a cubic in ndotv
    float b1 = -0.1688;
    float b2 = 1.895;
    float b3 = 0.9903;
    float b4 = -4.853;
    float b5 = 8.404;
    float b6 = -5.069;
    float bias = saturate( min( b1 * x + b2 * x * x, b3 + b4 * y + b5 * y * y + b6 * y * y * y ) );

    // Delta: 2nd order polynomial with an additional x^3 term
    float d0 = 0.6045;
    float d1 = 1.699;
    float d2 = -0.5228;
    float d3 = -3.603;
    float d4 = 1.404;
    float d5 = 0.1939;
    float d6 = 2.661;
    float delta = saturate( d0 + d1 * x + d2 * y + d3 * x * x + d4 * x * y + d5 * y * y + d6 * x * x * x );
    float scale = delta - bias;

    // Fade out bias for very low specular values (below ~0.02)
    bias *= saturate( 50.0 * specularColor.y );
    return specularColor * scale + bias;
}
```
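To validate the fit offline against the generated LUT, the HLSL can be ported to scalar C++ as below. `Float3`, `Saturate` and `EnvDFGPolynomialCpu` are illustrative names; the coefficients are copied unchanged.

```cpp
#include <algorithm>
#include <cassert>

struct Float3 { float x, y, z; };

static float Saturate( float v ) { return std::min( std::max( v, 0.0f ), 1.0f ); }

// Scalar port of EnvDFGPolynomial.
Float3 EnvDFGPolynomialCpu( Float3 specularColor, float gloss, float ndotv )
{
    assert( gloss >= 0.0f && gloss <= 1.0f && ndotv >= 0.0f && ndotv <= 1.0f );
    float const x = gloss;
    float const y = ndotv;

    float bias = Saturate( std::min( -0.1688f * x + 1.895f * x * x,
                                     0.9903f - 4.853f * y + 8.404f * y * y - 5.069f * y * y * y ) );
    float const delta = Saturate( 0.6045f + 1.699f * x - 0.5228f * y - 3.603f * x * x
                                  + 1.404f * x * y + 0.1939f * y * y + 2.661f * x * x * x );
    float const scale = delta - bias;

    bias *= Saturate( 50.0f * specularColor.y );
    return { specularColor.x * scale + bias,
             specularColor.y * scale + bias,
             specularColor.z * scale + bias };
}
```

For a white specular color at gloss 0 and ndotv 1, the bias term clamps to 0 and the result is delta = 0.6045 - 0.5228 + 0.1939 = 0.2756.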

Some screenshots comparing the reference and the two approximations:

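Instruction histograms on the GCN architecture:

| Instruction | Lazarov | Polynomial fit |
|---|---|---|
| v_exp_f32 | 1 | |
| v_mac_f32 | 3 | 3 |
| v_min_f32 | 1 | 1 |
| v_mov_b32 | 4 | 2 |
| v_mul_f32 | 1 | 5 |
| v_add_f32 | | 2 |
| v_madmk_f32 | | 4 |
| v_mad_f32 | 2 | 1 |
| v_subrev_f32 | 1 | 1 |
| Total cycles | 16 | 19 |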

# Conclusion

To sum up, I presented a simple analytical approximation of the DFG term. In practice it’s hard to distinguish from the reference, and it uses a moderate amount of ALU.

# References

[Kar13] B. Karis – “Real Shading in Unreal Engine 4”, SIGGRAPH 2013
[Laz13] D. Lazarov – “Getting More Physical in Call of Duty: Black Ops II”, SIGGRAPH 2013
[Sch14] N. Schulz – “The Rendering Technology of Ryse”, GDC 2014


## Lightmapping in Anomaly 2 mobile

In 2013 the mobile version of Anomaly 2 (iOS/Android/BlackBerry) by 11 Bit Studios, a small indie studio, was released. It was an interesting project, as we needed to run heavy content from the PC version on much weaker mobile platforms. I’d like to write about the rendering technology behind it and share my experiences implementing a lightmap baker, especially since there isn’t much information about lightmapping and rendering on mobile devices.

The goal for Anomaly 2 mobile was to get close to the PC version’s quality and reuse as many assets as possible. The PC version had tons of dynamic lights, dynamic shadows, SSAO and similar effects, so there was no way to run it on mobile directly. Programming graphics for mobile feels like going 5-10 years back in time. Compared to PC or consoles, mobile GPUs are slow and games are rendered at high resolutions, but there is a solid amount of memory available: a perfect fit for some kind of precomputed lighting.

# Lightmapper overview

The Anomaly 1 mobile version had simple lightmaps, which I hacked together in a few hours. Baking was done in two steps. The first step rendered the PC real-time directional light with shadows. The second step added ambient occlusion by rendering manually placed darkening quads. The lightmap was applied only to the terrain, which consisted of a single textured 2D plane.

For Anomaly 2 mobile we decided to write a proper baked lighting solution. Unfortunately, it had to be based on DX9: at that time our game editor was DX9-based, and we had no time to implement DX11 support. DX11 opens up new possibilities for optimization, namely compute shaders and lower draw call overhead. Both are very important, as a GPU lightmapper is often bottlenecked by the CPU and some operations aren’t a good fit for pixel shaders.

There are many lightmap format flavors – plain accumulated diffuse lighting, directional normal maps (radiosity normal maps), spherical harmonics or dominant light direction (ambient and directional light per texel). For a good overview check out presentation by Illuminate Labs (Beast creators) [Lar10].

Graphics artists wanted high lightmap density for sharp shadows and detailed lighting. Additionally, changing the difficulty level changed object placement, so some parts of the lightmap needed to be stored multiple times, once per difficulty level. Because of memory requirements, there was no way we could use any format other than accumulated diffuse. Storing accumulated diffuse is also the fastest method, and every millisecond counts on mobile. Unfortunately, the selected format doesn’t support normal mapping, so I had to resort to a hack to add normal maps.

For baking I went with a classical approach: render a hemicube from the POV of every lightmap texel to gather incoming radiance at that point, and later integrate it to compute irradiance [Eli00]. This is similar to the solution used in The Witness, which Ignacio Castaño described in a series of great posts on their blog [Cas10] [Cas10].

# Object chart UV

The first step is to generate unique UVs for every static object in the game that has lightmaps. It’s important to explicitly place seams where lighting is discontinuous, as seams prevent lighting from leaking across hard edges. In theory, hard edge information should be read from the mesh source file. In our case it was easier: XSI by default creates hard edges for angles greater than 60 degrees, so it was enough to place a seam wherever the angle between normals exceeded that value.
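
The seam rule above can be sketched in C++ as follows. This is a minimal illustration, not the original tool code: it assumes a welded triangle mesh (shared vertex indices), and `FindHardEdges` and the small vector helpers are hypothetical names. An edge becomes a seam when the angle between the two adjacent face normals exceeds 60 degrees, i.e. when their dot product drops below cos(60°) = 0.5.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 Sub( Vec3 a, Vec3 b ) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 Cross( Vec3 a, Vec3 b ) { return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
static float Dot( Vec3 a, Vec3 b ) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Normalize( Vec3 v ) { float const len = std::sqrt( Dot( v, v ) ); return { v.x / len, v.y / len, v.z / len }; }

// Returns edges (vertex index pairs, smaller index first) shared by two triangles whose
// face normals differ by more than maxAngleDeg. Those edges become chart seams.
std::vector<std::pair<int, int>> FindHardEdges( std::vector<Vec3> const& positions,
                                                std::vector<std::array<int, 3>> const& triangles,
                                                float maxAngleDeg = 60.0f )
{
    std::vector<Vec3> faceNormals;
    for ( auto const& t : triangles )
        faceNormals.push_back( Normalize( Cross( Sub( positions[ t[ 1 ] ], positions[ t[ 0 ] ] ),
                                                 Sub( positions[ t[ 2 ] ], positions[ t[ 0 ] ] ) ) ) );

    // undirected edge -> triangles using it
    std::map<std::pair<int, int>, std::vector<int>> edgeTriangles;
    for ( int i = 0; i < (int)triangles.size(); ++i )
        for ( int e = 0; e < 3; ++e )
        {
            int const a = triangles[ i ][ e ];
            int const b = triangles[ i ][ ( e + 1 ) % 3 ];
            edgeTriangles[ { std::min( a, b ), std::max( a, b ) } ].push_back( i );
        }

    float const cosMaxAngle = std::cos( maxAngleDeg * 3.14159265f / 180.0f );
    std::vector<std::pair<int, int>> hardEdges;
    for ( auto const& entry : edgeTriangles )
    {
        std::vector<int> const& tris = entry.second;
        if ( tris.size() == 2 && Dot( faceNormals[ tris[ 0 ] ], faceNormals[ tris[ 1 ] ] ) < cosMaxAngle )
            hardEdges.push_back( entry.first );
    }
    return hardEdges;
}
```

Boundary edges (used by a single triangle) are already chart borders, so only edges shared by exactly two triangles are tested.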

D3DX UVAtlas was used for chart UV generation. It’s based on research from about 10 years ago (“Iso-Charts” [ZSGS04] and “Signal Parametrization” [SGSH02]), so better algorithms can be found nowadays. Especially interesting are surface quadrangulation algorithms like [BZK09] or [ZHLB10], which are compatible with the “invisible seams” UV algorithm [RNLL10]. Pretty intense stuff, but it fixes seam issues for good.

Chart UVs are generated so that they have the appropriate density (texels per square meter). I used two metrics: average density close to the target, and minimum density no lower than 90% of the target.
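
A sketch of the two density metrics, assuming per-triangle UV areas in normalized [0,1]² units and world areas in square meters. `ComputeDensity`, `MeetsDensityTarget`, `UvScaleForTarget` and the ±10% tolerance for the average are hypothetical; the text only says the average should be around the target. Because density here is an area ratio (texels per square meter), scaling chart UVs by k scales it by k².

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct ChartTriangle { float uvArea; float worldArea; }; // uvArea in [0,1]^2 units, worldArea in m^2
struct DensityStats { float average; float minimum; };    // texels per square meter

DensityStats ComputeDensity( std::vector<ChartTriangle> const& tris, float lightmapSize )
{
    float totalTexels = 0.0f;
    float totalWorld = 0.0f;
    float minDensity = std::numeric_limits<float>::max();
    for ( ChartTriangle const& t : tris )
    {
        float const texels = t.uvArea * lightmapSize * lightmapSize;
        totalTexels += texels;
        totalWorld += t.worldArea;
        minDensity = std::min( minDensity, texels / t.worldArea );
    }
    // area-weighted average: total texels over total surface area
    return { totalTexels / totalWorld, minDensity };
}

// Average within avgTolerance of the target, minimum no lower than 90% of the target.
bool MeetsDensityTarget( DensityStats const& s, float target, float avgTolerance = 0.1f )
{
    bool const averageOk = std::fabs( s.average - target ) <= avgTolerance * target;
    bool const minimumOk = s.minimum >= 0.9f * target;
    return averageOk && minimumOk;
}

// Uniform UV scale that moves the average density onto the target.
float UvScaleForTarget( DensityStats const& s, float target )
{
    return std::sqrt( target / s.average );
}
```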

The level creation pipeline was heavily based on prefabs (object instances), so there was no point in using methods like signal-specialized parametrization [SGSH02]. Theoretically, it could be used to place more detail on building roofs and less on their sides. In practice, there was no way to know how a given building would be used, as every instance could have a custom rotation and non-uniform scale.

# UV chart packing per object

Most charts have rectangular shapes, so first all charts are rotated to their best-fit rectangle. Then charts are sorted by area and max side. Finally, charts are packed using a brute-force chart packer. It inserts one chart at a time, testing all possible locations and all combinations of 90-degree rotations and mirror transforms. The best location is chosen using an extent metric. There was one additional constraint in order to minimize texture compression artifacts: at most 3 charts per 4×4 texel block.
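
A simplified brute-force packer in the spirit of the description above. Charts are assumed to be rasterized into binary masks at lightmap resolution; every position and all 8 orientations (4 rotations, with and without mirroring) are tested, and the placement that grows the packed extent least wins. The exact extent metric (max side first, then area) is an assumption, the 3-charts-per-4×4-block constraint is omitted for brevity, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A chart rasterized at lightmap resolution; bits[ y * w + x ] != 0 marks a covered texel.
struct Mask
{
    int w = 0;
    int h = 0;
    std::vector<unsigned char> bits;
    bool At( int x, int y ) const { return bits[ y * w + x ] != 0; }
};

// 90 degree clockwise rotation: texel (x, y) moves to (h - 1 - y, x).
Mask Rotate90( Mask const& m )
{
    Mask r{ m.h, m.w, std::vector<unsigned char>( m.bits.size() ) };
    for ( int y = 0; y < m.h; ++y )
        for ( int x = 0; x < m.w; ++x )
            r.bits[ x * r.w + ( m.h - 1 - y ) ] = m.bits[ y * m.w + x ];
    return r;
}

Mask MirrorX( Mask const& m )
{
    Mask r = m;
    for ( int y = 0; y < m.h; ++y )
        for ( int x = 0; x < m.w; ++x )
            r.bits[ y * m.w + x ] = m.bits[ y * m.w + ( m.w - 1 - x ) ];
    return r;
}

struct Atlas
{
    int size;
    std::vector<unsigned char> used;
    int extentW = 0; // bounding box of everything packed so far
    int extentH = 0;

    explicit Atlas( int s ) : size( s ), used( s * s, 0 ) {}

    bool Fits( Mask const& m, int ox, int oy ) const
    {
        for ( int y = 0; y < m.h; ++y )
            for ( int x = 0; x < m.w; ++x )
                if ( m.At( x, y ) && used[ ( oy + y ) * size + ox + x ] )
                    return false;
        return true;
    }

    void Place( Mask const& m, int ox, int oy )
    {
        for ( int y = 0; y < m.h; ++y )
            for ( int x = 0; x < m.w; ++x )
                if ( m.At( x, y ) )
                    used[ ( oy + y ) * size + ox + x ] = 1;
        extentW = std::max( extentW, ox + m.w );
        extentH = std::max( extentH, oy + m.h );
    }
};

// Returns false when the chart doesn't fit anywhere in the atlas.
bool PackChart( Atlas& atlas, Mask const& chart )
{
    Mask best;
    int bestX = 0, bestY = 0;
    long long bestCost = -1;
    Mask const orientations[ 2 ] = { chart, MirrorX( chart ) };
    for ( Mask m : orientations )
        for ( int rotation = 0; rotation < 4; ++rotation, m = Rotate90( m ) )
            for ( int oy = 0; oy + m.h <= atlas.size; ++oy )
                for ( int ox = 0; ox + m.w <= atlas.size; ++ox )
                {
                    if ( !atlas.Fits( m, ox, oy ) )
                        continue;
                    int const w = std::max( atlas.extentW, ox + m.w );
                    int const h = std::max( atlas.extentH, oy + m.h );
                    // primary key: longest side of the packed extent, secondary key: its area
                    long long const cost = (long long)std::max( w, h ) * atlas.size * atlas.size + (long long)w * h;
                    if ( bestCost < 0 || cost < bestCost ) { bestCost = cost; best = m; bestX = ox; bestY = oy; }
                }
    if ( bestCost < 0 )
        return false;
    atlas.Place( best, bestX, bestY );
    return true;
}
```

Charts are fed to `PackChart` one at a time in the sorted order described above; a `false` return means the chart needs a bigger atlas.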

Most papers use Tetris-like packing schemes. Nowadays there is plenty of processing power for a brute-force solution, which achieves much better pack ratios. For faster packing, even the GPU could be used:

1. Rasterize all possible chart combinations or blit prerasterized sprites using additive blending.
2. Test for chart overlap and other constraint violations using a pixel or compute shader.

For us, an optimized CPU packer was fast enough, as charts were packed only once per prefab (object template) during mesh import. When packing charts for an entire level into one atlas, a GPU approach can help a lot. In that case there is one additional trick for better pack ratios: leverage the GPU wrap texture address mode and allow charts to wrap around the atlas borders [NS11].

Most of our levels were city landscapes with a lot of hard edges. This means a lot of small UV charts, which require a lot of padding. I used a few tricks to reduce those borders:

1. 1×1 texel charts were collapsed and snapped to the texel center.
2. 1×N / N×1 texel charts were similarly collapsed to a line and snapped to texel centers.
3. N×N charts were resized and aligned to texel centers.

# Atlas packing

After placing objects on the level, their lightmaps were packed into multiple 2k×2k atlases using classic bin packing with a binary tree [Sco03]. First, all objects (rectangles) were sorted by area and max side. Then objects were inserted into the binary tree one at a time. If an object couldn’t be inserted into the first 2k×2k atlas, insertion was attempted in the second one, and so on. Every object was surrounded by a 1-texel border. Additionally, every object was resized and aligned to full 4×4 texel blocks in order to minimize texture compression artifacts.
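
A sketch of the binary tree packer from [Sco03] extended with the 1-texel border and 4×4 block alignment described above. `Node`, `PackObjects` and the tie-breaking details are illustrative assumptions. Since every padded size is a multiple of 4 (and the atlas size is assumed to be one too), every split lands on a 4×4 block boundary. Each returned rectangle includes the border, so the object’s own texels start one texel inside it.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct Size { int w, h; };
struct Rect { int x, y, w, h; };
struct Placement { int atlas; Rect rc; }; // atlas == -1 means the object can never fit

struct Node
{
    Rect rc;
    std::unique_ptr<Node> child[ 2 ];
    bool occupied = false;

    explicit Node( Rect r ) : rc( r ) {}

    Node* Insert( int w, int h )
    {
        if ( child[ 0 ] ) // inner node: try both children
        {
            if ( Node* n = child[ 0 ]->Insert( w, h ) )
                return n;
            return child[ 1 ]->Insert( w, h );
        }
        if ( occupied || w > rc.w || h > rc.h )
            return nullptr;
        if ( w == rc.w && h == rc.h )
        {
            occupied = true;
            return this;
        }
        // split along the axis with more leftover space, so child 0 matches w or h exactly
        if ( rc.w - w > rc.h - h )
        {
            child[ 0 ] = std::make_unique<Node>( Rect{ rc.x, rc.y, w, rc.h } );
            child[ 1 ] = std::make_unique<Node>( Rect{ rc.x + w, rc.y, rc.w - w, rc.h } );
        }
        else
        {
            child[ 0 ] = std::make_unique<Node>( Rect{ rc.x, rc.y, rc.w, h } );
            child[ 1 ] = std::make_unique<Node>( Rect{ rc.x, rc.y + h, rc.w, rc.h - h } );
        }
        return child[ 0 ]->Insert( w, h );
    }
};

static int AlignTo4( int v ) { return ( v + 3 ) & ~3; }

std::vector<Placement> PackObjects( std::vector<Size> const& objects, int atlasSize, int border = 1 )
{
    // sort by area, then by max side, both descending
    std::vector<int> order( objects.size() );
    for ( int i = 0; i < (int)order.size(); ++i )
        order[ i ] = i;
    std::sort( order.begin(), order.end(), [ & ]( int a, int b ) {
        int const areaA = objects[ a ].w * objects[ a ].h;
        int const areaB = objects[ b ].w * objects[ b ].h;
        if ( areaA != areaB )
            return areaA > areaB;
        return std::max( objects[ a ].w, objects[ a ].h ) > std::max( objects[ b ].w, objects[ b ].h );
    } );

    std::vector<std::unique_ptr<Node>> atlases;
    std::vector<Placement> result( objects.size(), Placement{ -1, { 0, 0, 0, 0 } } );
    for ( int i : order )
    {
        int const w = AlignTo4( objects[ i ].w + 2 * border );
        int const h = AlignTo4( objects[ i ].h + 2 * border );
        if ( w > atlasSize || h > atlasSize )
            continue;
        // first atlas with room wins; open a new atlas when none has space
        for ( size_t a = 0;; ++a )
        {
            if ( a == atlases.size() )
                atlases.push_back( std::make_unique<Node>( Rect{ 0, 0, atlasSize, atlasSize } ) );
            if ( Node* node = atlases[ a ]->Insert( w, h ) )
            {
                result[ i ] = { (int)a, node->rc };
                break;
            }
        }
    }
    return result;
}
```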

After packing, a unique lightmap UV scale and bias were assigned to every instance. At runtime the scale and bias were applied to the lightmap UV in the vertex shader. This allowed reusing lightmap UVs per object, so all mesh instances could share the same vertex buffer. Additionally, every mesh has a unique UV mapping, which can be used for other purposes.

# Direct lighting baking

In order to bake lightmaps we first need to calculate the world position and normal for every lightmap texel. The most straightforward way is to rasterize the geometry in lightmap UV space, writing position and normal. It’s best to use conservative rasterization with analytical antialiasing. Conservative rasterization ensures that every texel touched by geometry is included, not only those whose center is covered. Analytical antialiasing ensures that the best centroid is picked.

The actual rasterization was done using half-space rasterization with slightly tweaked fill rules and line equations in order to make it conservative. For each rasterized texel, the source triangle was clipped to that texel’s bounding box and its area and centroid were calculated. When multiple triangles overlapped a texel, the one with the highest texel area coverage was picked. Finally, the centroid was used to calculate the world position and normal. A very similar rasterization implementation can be found in the NVIDIA mesh processing tool sources.
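
The per-texel step (clip the triangle to the texel’s box, then compute area and centroid) can be sketched with Sutherland-Hodgman clipping and the shoelace formula. Coordinates are assumed to be in lightmap texel units. The conservative half-space rasterizer that decides which texels to visit is not shown, and `TexelCoverage` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
struct Coverage { float area; Vec2 centroid; };

// Clips a polygon against one axis-aligned half-plane: keeps points where sign * (p[axis] - bound) >= 0.
static std::vector<Vec2> ClipHalfPlane( std::vector<Vec2> const& poly, int axis, float bound, float sign )
{
    auto coord = [ axis ]( Vec2 p ) { return axis == 0 ? p.x : p.y; };
    auto inside = [ & ]( Vec2 p ) { return sign * ( coord( p ) - bound ) >= 0.0f; };
    std::vector<Vec2> out;
    for ( size_t i = 0; i < poly.size(); ++i )
    {
        Vec2 const a = poly[ i ];
        Vec2 const b = poly[ ( i + 1 ) % poly.size() ];
        bool const inA = inside( a );
        bool const inB = inside( b );
        if ( inA )
            out.push_back( a );
        if ( inA != inB ) // edge crosses the boundary: emit the intersection point
        {
            float const t = ( bound - coord( a ) ) / ( coord( b ) - coord( a ) );
            out.push_back( { a.x + t * ( b.x - a.x ), a.y + t * ( b.y - a.y ) } );
        }
    }
    return out;
}

// Area and centroid of the part of triangle (a, b, c) inside texel (tx, ty).
Coverage TexelCoverage( Vec2 a, Vec2 b, Vec2 c, int tx, int ty )
{
    std::vector<Vec2> poly = { a, b, c };
    poly = ClipHalfPlane( poly, 0, (float)tx, 1.0f );
    poly = ClipHalfPlane( poly, 0, (float)tx + 1.0f, -1.0f );
    poly = ClipHalfPlane( poly, 1, (float)ty, 1.0f );
    poly = ClipHalfPlane( poly, 1, (float)ty + 1.0f, -1.0f );

    // shoelace formula: twice the signed area and the centroid accumulators
    float area2 = 0.0f, cx = 0.0f, cy = 0.0f;
    for ( size_t i = 0; i < poly.size(); ++i )
    {
        Vec2 const p = poly[ i ];
        Vec2 const q = poly[ ( i + 1 ) % poly.size() ];
        float const cross = p.x * q.y - q.x * p.y;
        area2 += cross;
        cx += ( p.x + q.x ) * cross;
        cy += ( p.y + q.y ) * cross;
    }
    if ( std::fabs( area2 ) < 1e-12f )
        return { 0.0f, { tx + 0.5f, ty + 0.5f } }; // no coverage, fall back to the texel center
    return { std::fabs( area2 ) * 0.5f, { cx / ( 3.0f * area2 ), cy / ( 3.0f * area2 ) } };
}
```

The returned area is what decides between overlapping triangles, and the centroid is then used to interpolate the world position and normal of the winning triangle.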

Direct lighting was baked by looping over lights and drawing quads that covered the affected objects in the lightmap. The pixel shader was almost the same as the one used for real-time lights in the PC version, just with specular etc. disabled. In order to sample shadows, objects were batched by location, the shadow map was focused on that batch and the entire batch was rendered. This step outputs a set of FP16 lightmaps with accumulated direct diffuse lighting.

# Indirect lighting baking

Hemicubes were used to bake indirect (bounced) lighting and ambient occlusion. Apart from the already calculated world position and normal, hemicube rendering requires an arbitrary “up direction”. It can be either a constant direction, resulting in banding, or a random one, resulting in noise. For us a random “up direction” worked best. We just needed to make sure that the random seed depended on the world position. Without it, lightmaps generated per difficulty level wouldn’t match the rest of the lightmaps, which were generated once.
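
A sketch of a position-seeded random up direction. The millimeter quantization and the FNV-1a hash are assumptions (any deterministic hash of the world position works); the point is that the same texel position always yields the same up vector, no matter in which bake pass it is processed. The up vector is built perpendicular to the normal, so together they form a valid camera basis. `HemicubeUp` and `HashPosition` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

static Vec3 Cross( Vec3 a, Vec3 b ) { return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
static float Dot( Vec3 a, Vec3 b ) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Normalize( Vec3 v ) { float const len = std::sqrt( Dot( v, v ) ); return { v.x / len, v.y / len, v.z / len }; }

// Deterministic hash of a world position quantized to millimeters (FNV-1a over the bytes).
static uint32_t HashPosition( Vec3 p )
{
    int32_t const q[ 3 ] = { static_cast<int32_t>( std::lround( p.x * 1000.0f ) ),
                             static_cast<int32_t>( std::lround( p.y * 1000.0f ) ),
                             static_cast<int32_t>( std::lround( p.z * 1000.0f ) ) };
    uint32_t h = 2166136261u;
    for ( int32_t v : q )
        for ( int byte = 0; byte < 4; ++byte )
        {
            h ^= ( static_cast<uint32_t>( v ) >> ( 8 * byte ) ) & 0xFFu;
            h *= 16777619u;
        }
    return h;
}

// Random hemicube "up" direction perpendicular to the (unit length) normal.
Vec3 HemicubeUp( Vec3 position, Vec3 normal )
{
    // tangent frame around the normal, built from any non-parallel helper axis
    Vec3 const helper = std::fabs( normal.y ) < 0.99f ? Vec3{ 0.0f, 1.0f, 0.0f } : Vec3{ 1.0f, 0.0f, 0.0f };
    Vec3 const tangent = Normalize( Cross( helper, normal ) );
    Vec3 const bitangent = Cross( normal, tangent );

    // map the hash to an angle in [0, 2*pi) and rotate within the tangent plane
    float const angle = static_cast<float>( HashPosition( position ) / 4294967296.0 * 6.283185307179586 );
    float const c = std::cos( angle );
    float const s = std::sin( angle );
    return { c * tangent.x + s * bitangent.x, c * tangent.y + s * bitangent.y, c * tangent.z + s * bitangent.z };
}
```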

Now, per lightmap texel, we have a hemicube position (world position), hemicube direction (world normal) and hemicube up direction. The straightforward hemicube approach requires rendering the scene to 5 different viewports per lightmap texel. For better performance I decided to use a single plane with a ~126.87 degree FOV (image plane side length doubled compared to the hemicube’s front face). In our case it looked almost as good, but turned out to be a few times faster. There is also the question of how to treat the missing samples. According to [Gau08] the best approach is to replicate edge texels (increase weights on the edges). In our case simple weight renormalization looked best.
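
The FOV figure follows from the image plane geometry: at distance 1, a hemicube face spans [-1, 1] (90 degrees), and doubling its side length gives 2·atan(2) ≈ 126.87 degrees. The sketch below, which reuses the solid angle formula from [Dri12], also estimates how much of the cosine-weighted hemisphere (which integrates to π) each plane covers; the remainder is what the weight renormalization has to compensate for. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

double const kPi = 3.14159265358979323846;

// Full field of view of a square image plane at distance 1 spanning [-halfSize, halfSize].
double FovDegrees( double halfSize ) { return 2.0 * std::atan( halfSize ) * 180.0 / kPi; }

// Solid angle of the plane region between (0, 0) and (x, y) as seen from the origin [Dri12].
double AreaElement( double x, double y ) { return std::atan2( x * y, std::sqrt( x * x + y * y + 1.0 ) ); }

// Fraction of cosine-weighted irradiance seen through the plane, using an n x n texel grid.
double CosineCoverage( double halfSize, int n )
{
    double sum = 0.0;
    double const step = 2.0 * halfSize / n;
    for ( int j = 0; j < n; ++j )
        for ( int i = 0; i < n; ++i )
        {
            double const x0 = -halfSize + i * step, y0 = -halfSize + j * step;
            double const x1 = x0 + step, y1 = y0 + step;
            double const cx = 0.5 * ( x0 + x1 ), cy = 0.5 * ( y0 + y1 );
            double const solidAngle = AreaElement( x0, y0 ) - AreaElement( x0, y1 ) - AreaElement( x1, y0 ) + AreaElement( x1, y1 );
            sum += solidAngle / std::sqrt( cx * cx + cy * cy + 1.0 ); // cosine factor at texel center
        }
    return sum / kPi;
}
```

With a fine grid this gives roughly 0.55 for a single hemicube face and roughly 0.83 for the enlarged plane, so the single plane misses only the grazing directions, which carry little cosine-weighted energy.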

When baking, the camera’s near plane sometimes intersects nearby surfaces. A simple solution would be to move the near plane closer to the eye until the overlap disappears. Unfortunately, moving the plane too far introduces depth buffer artifacts. To fix it, the pixel shader tests for back faces using VFACE (SV_IsFrontFace). When back faces are encountered, ambient occlusion and bounced light are set to zero. Theoretically this is not always correct, but in practice it looked good and fixed all visual issues.

Finally, gathered radiance needs to be converted to irradiance. This requires weighting samples by the cosine factor and solid angle. Texels located at the cube map corners have a smaller solid angle and should influence the result less. Solid angle can be calculated by projecting a texel onto the unit sphere and calculating its area on the sphere [Dri12].

I precomputed the weights and stored them in a lookup texture, so they could be used directly by the pixel shader:

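```cpp
#include <cmath>

// IL_PROBE_SIZE (probe resolution in texels) and ILProbeWeights (IL_PROBE_SIZE * IL_PROBE_SIZE floats)
// are assumed to be defined elsewhere; ComputeILProbeWeights is an illustrative wrapper name.

// Solid angle subtended by the image plane region between (0, 0) and (x, y) [Dri12].
float ElementArea( float x, float y )
{
    return std::atan2( x * y, std::sqrt( x * x + y * y + 1.0f ) );
}

void ComputeILProbeWeights()
{
    // calculate weights
    float weightSum = 0.0f;
    for ( unsigned y = 0; y < IL_PROBE_SIZE; ++y )
    {
        for ( unsigned x = 0; x < IL_PROBE_SIZE; ++x )
        {
            // texel center on the image plane at distance 1, which spans [-2, 2] (~126.87 degree FOV)
            float const tu = ( 4.0f * ( x + 0.5f ) / IL_PROBE_SIZE ) - 2.0f;
            float const tv = ( 4.0f * ( y + 0.5f ) / IL_PROBE_SIZE ) - 2.0f;

            // cosFactor = |v1( tu, tv, 1 )| DOT v2( 0, 0, 1 ) = |v1|.z
            float const cosFactor = 1.0f / std::sqrt( tu * tu + tv * tv + 1.0f );

            // solid angle projection; a texel is 4 / IL_PROBE_SIZE wide, so this is half a texel
            float const halfTexelSize = 2.0f / IL_PROBE_SIZE;
            float const x0 = tu - halfTexelSize;
            float const y0 = tv - halfTexelSize;
            float const x1 = tu + halfTexelSize;
            float const y1 = tv + halfTexelSize;
            float const solidAngle = ElementArea( x0, y0 ) - ElementArea( x0, y1 ) - ElementArea( x1, y0 ) + ElementArea( x1, y1 );

            float const weight = cosFactor * solidAngle;
            weightSum += weight;

            ILProbeWeights[ x + y * IL_PROBE_SIZE ] = weight;
        }
    }

    // normalize weights (this also renormalizes for directions missed by the single image plane)
    for ( unsigned y = 0; y < IL_PROBE_SIZE; ++y )
    {
        for ( unsigned x = 0; x < IL_PROBE_SIZE; ++x )
        {
            ILProbeWeights[ x + y * IL_PROBE_SIZE ] /= weightSum;
        }
    }
}
```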

To achieve the best performance, all of these steps were done exclusively on the GPU. Only the final lightmaps were copied to main memory in order to store them on disk.

Steps:
1. Render multiple IL probes.
2. Integrate using the precomputed weight texture and downscale to 1×1 texel (watch out for F16 precision issues).
3. Copy the texel to its final position in the lightmap.

The last steps were batched in order to use the GPU optimally.
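
The integration and downscale steps can be emulated on the CPU as below. This is a sketch assuming a square, power-of-two probe and weights already normalized as in the weights code above; on the GPU each iteration of the reduction loop would be one downscale pass. Summing 2×2 blocks keeps partial sums of similar magnitude, which is one way to limit precision loss when intermediate targets are FP16 (the sketch itself uses 32-bit floats). `IntegrateProbe` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct RGB { float r, g, b; };

// Returns irradiance for one probe: sum of radiance * (cosine * solid angle) weights.
RGB IntegrateProbe( std::vector<RGB> texels, std::vector<float> const& weights, int size )
{
    // apply the precomputed weights
    for ( size_t i = 0; i < texels.size(); ++i )
    {
        texels[ i ].r *= weights[ i ];
        texels[ i ].g *= weights[ i ];
        texels[ i ].b *= weights[ i ];
    }
    // sum 2x2 blocks until a single texel is left (one iteration = one downscale pass)
    for ( int s = size; s > 1; s /= 2 )
    {
        int const half = s / 2;
        std::vector<RGB> next( half * half, RGB{ 0.0f, 0.0f, 0.0f } );
        for ( int y = 0; y < half; ++y )
            for ( int x = 0; x < half; ++x )
                for ( int dy = 0; dy < 2; ++dy )
                    for ( int dx = 0; dx < 2; ++dx )
                    {
                        RGB const& t = texels[ ( 2 * y + dy ) * s + 2 * x + dx ];
                        RGB& acc = next[ y * half + x ];
                        acc.r += t.r;
                        acc.g += t.g;
                        acc.b += t.b;
                    }
        texels.swap( next );
    }
    return texels[ 0 ];
}
```

With normalized weights, a probe that sees a constant radiance everywhere integrates back to that same value, which is a handy sanity check for the weight texture.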

For a fast preview mode, and generally for faster baking, irradiance caching [Cas11] [Cas14] [Dri09] could be used. It works by smartly picking sample positions and filling the gaps using interpolation. Sample placement is estimated by analyzing the radiance calculated at existing sample locations. This leads to workloads that are much harder to batch and less GPU friendly than the brute-force approach, which is especially bad when you can’t use compute shaders. Due to this and to time constraints I didn’t implement it, but results from other people look very promising. It’s something I’d like to implement in the future if I ever write another baker.

# Terrain lightmap

Terrain consisted mostly of multiple flat tiles with decals placed on top of them. Tiles were small and wasted a lot of lightmap space on padding. Decals didn’t reuse the lightmap, so lighting values were needlessly duplicated. Borders between tiles were also problematic because of seam artifacts caused by UV discontinuities and aggressive PVR compression. In order to solve those issues, a special terrain lightmap was introduced: basically a big 2D plane, placed at a specific height, for which lighting was baked. Artists could mark which objects and decals should use this lightmap instead of having their own. Artists also used this lightmap for small 3D objects on the terrain (e.g. small debris). This solved terrain seam issues and resulted in more efficient lightmap texture usage.

# Lightmap real-time composition

Baking results were stored as two FP16 textures with linear values. The first texture contained direct lighting, the second bounced lighting and ambient occlusion. All inputs could be mixed in real time in the editor, just like layers in Photoshop, with everything controlled by curves. Artists could tweak ambient occlusion strength, colorize lighting, increase bounced lighting strength etc. Everything was handled by a single pixel shader and was really fast. It’s not a physically correct approach, but it enabled fast iterations for final light tweaks without requiring a lengthy lighting rebake.

On the technical side, this step merged the high precision lighting components and output a single lightmap in RGBX8_SRGB format. Stored values covered a [0;2] lighting range, giving some extra headroom above 1. It’s possible to dynamically select the lighting range per object for better encoding. However, in our case lightmap textures were low precision and heavily compressed (PVR 2bpp), so this would result in visible discontinuities between objects with different lighting scales.
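
A sketch of the final encoding, assuming the [0;2] lighting range is mapped to [0;1] before writing to an 8-bit sRGB texture, and that the shader multiplies the sampled (already linearized) value by 2. The sRGB transfer functions are the standard ones; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

float const kLightmapRange = 2.0f; // stored [0;1] represents [0;2] lighting

float LinearToSrgb( float c ) { return c <= 0.0031308f ? 12.92f * c : 1.055f * std::pow( c, 1.0f / 2.4f ) - 0.055f; }
float SrgbToLinear( float c ) { return c <= 0.04045f ? c / 12.92f : std::pow( ( c + 0.055f ) / 1.055f, 2.4f ); }

// Composition pass output for one channel: rescale, clamp, apply the sRGB curve, quantize to 8 bits.
uint8_t EncodeLightmapChannel( float linear )
{
    float const normalized = std::clamp( linear / kLightmapRange, 0.0f, 1.0f );
    return static_cast<uint8_t>( std::lround( LinearToSrgb( normalized ) * 255.0f ) );
}

// What sampling an *_SRGB texture effectively returns, followed by the shader-side rescale.
float DecodeLightmapChannel( uint8_t stored )
{
    return SrgbToLinear( stored / 255.0f ) * kLightmapRange;
}
```

The sRGB curve spends more of the 8 bits on dark values, which partly offsets the precision lost by stretching the range to [0;2]; anything above 2 is simply clamped.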

Apart from composition, this step also fills unused lightmap texels and tries to weld lightmap UV discontinuities. Parts of a mesh that are connected in 3D space can be disjoint in lightmap UV space, and interpolation along such a disjoint edge causes visible seams. The invisible seam algorithm [RNLL10] could be used to fix this. Unfortunately, it requires complicated quadrangulation algorithms like [BZK09] or [ZHLB10]. It also imposes additional restrictions on UV charts, so UV space usage is less efficient. Compression is another source of seams along those discontinuities, and removing compression seams requires additional constraints that reduce UV space efficiency even more.

This was overkill for us, as there are approximate methods which work with any UV parametrization. Those methods search for the “best” border texel values, either by evaluating a few points and using least squares [Iwa13] or by trying to solve the bilinear filtering equation analytically [Yan06].

I went with a simple and rough solution: averaging values across seams during the lightmap composition pass. Additionally, in order to reduce texture compression artifacts, the final composition pass flood filled lightmap values, so unused texels would have values similar to their neighbors.
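
Both fixes can be sketched on a single-channel float lightmap. `FloodFillUnused` grows filled regions one texel ring per iteration, averaging already filled 4-neighbors; `AverageSeams` assumes a precomputed list of texel pairs that face each other across a UV seam (how those pairs are found is not shown). Both names are hypothetical. When a texel belongs to several pairs, the result of this simple version depends on processing order.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Fills unused texels with the average of already filled 4-neighbors, one ring per iteration,
// until every texel reachable from a used texel has a value.
void FloodFillUnused( std::vector<float>& values, std::vector<bool>& used, int w, int h )
{
    int const dx[ 4 ] = { 1, -1, 0, 0 };
    int const dy[ 4 ] = { 0, 0, 1, -1 };
    bool changed = true;
    while ( changed )
    {
        changed = false;
        std::vector<float> next = values;
        std::vector<bool> nextUsed = used;
        for ( int y = 0; y < h; ++y )
            for ( int x = 0; x < w; ++x )
            {
                if ( used[ y * w + x ] )
                    continue;
                float sum = 0.0f;
                int count = 0;
                for ( int k = 0; k < 4; ++k )
                {
                    int const nx = x + dx[ k ], ny = y + dy[ k ];
                    if ( nx < 0 || ny < 0 || nx >= w || ny >= h || !used[ ny * w + nx ] )
                        continue;
                    sum += values[ ny * w + nx ];
                    ++count;
                }
                if ( count > 0 )
                {
                    next[ y * w + x ] = sum / count;
                    nextUsed[ y * w + x ] = true;
                    changed = true;
                }
            }
        values.swap( next );
        used.swap( nextUsed );
    }
}

// Sets both texels of every seam pair to their average.
void AverageSeams( std::vector<float>& values, std::vector<std::pair<int, int>> const& seamPairs )
{
    for ( auto const& p : seamPairs )
    {
        float const avg = 0.5f * ( values[ p.first ] + values[ p.second ] );
        values[ p.first ] = avg;
        values[ p.second ] = avg;
    }
}
```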

# Improving usability

Iteration times are very important for graphics artists. That’s why I added selective baking and minimal rebake. Selective baking allows baking only selected objects. Minimal rebake bakes only modified objects and their affected surroundings. Overnight, all levels were automatically rebaked and the changes pushed to SVN, so usually only a small part of a scene needed to be rebaked during the normal workflow and iteration times were manageable.

# Static object lighting

Static object lighting was based on the baked diffuse lighting in lightmaps. For better quality and lighting resolution, normal maps were used. Due to memory constraints I couldn’t do proper normal mapping with lightmaps and had to resort to a hack: normals were combined with the dominant light’s direction (usually the sun) and used to perturb lightmap values:

```hlsl
// diffuse holds the lightmap sample; normalTS and lightDirTS are in tangent space
float diffuseMult = saturate( dot( normalTS, lightDirTS ) ) * FLDScale + FLDBias;
diffuse *= diffuseMult;
```

This hack worked quite nicely in practice, adding extra detail and helping to reduce the visibility of compression artifacts. This was very important, as we were using hardcore PVR 2bpp compression. Normal maps were also used for real-time specular (calculated for a single light) and for envmaps.

# Dynamic object lighting

Dynamic objects were primarily lit using diffuse lighting stored in light probes (captured irradiance for all possible orientations at a single point in space). Specular and normal maps were added just like for static objects.

There are many ways of implementing light probes. Again, because of performance, I had to choose the simplest method: the “Valve ambient cube” [MMG06]. It’s almost like a cubemap with 1×1 texel faces, storing one lighting value per face direction. For general usage it has a lot of issues; for example, it’s not rotationally invariant, so the lighting error depends on the light’s angle. For our case it was a good fit. The game almost always had a top-down view and this basis allowed us to harness that fact: the top cube face was the most important one.

A single ambient cube was stored as a single chrominance value plus 6 intensity values, one per face. This allowed us to drive down light probe memory usage and reduce computations a bit. In our case the top face was the most important, so when computing the single chrominance value for the captured lighting, the top face had the highest weight and the bottom face the lowest.
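
A sketch of that compression, assuming Rec. 709 luminance weights, a Y-up world and the face order -X, +X, -Y, +Y, -Z, +Z (the order the ambient cube shader in this section indexes `LightProbeLuminance` with). The per-face weights for the shared chrominance are hypothetical values that only follow the rule above (top highest, bottom lowest). The chrominance is normalized to unit luminance, so decompressing a face reproduces its stored luminance exactly.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct RGB { float r, g, b; };

// One shared chrominance plus one luminance value per face (-X, +X, -Y, +Y, -Z, +Z).
struct CompressedAmbientCube
{
    RGB chroma;
    std::array<float, 6> luminance;
};

float Luminance( RGB c ) { return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b; }

CompressedAmbientCube Compress( std::array<RGB, 6> const& faces,
                                std::array<float, 6> const& faceWeights = { { 1.0f, 1.0f, 0.25f, 4.0f, 1.0f, 1.0f } } )
{
    CompressedAmbientCube out{};
    RGB sum{ 0.0f, 0.0f, 0.0f };
    for ( int i = 0; i < 6; ++i )
    {
        out.luminance[ i ] = Luminance( faces[ i ] );
        sum.r += faceWeights[ i ] * faces[ i ].r;
        sum.g += faceWeights[ i ] * faces[ i ].g;
        sum.b += faceWeights[ i ] * faces[ i ].b;
    }
    // chrominance: weighted average color normalized to unit luminance
    float const sumLuminance = Luminance( sum );
    out.chroma = sumLuminance > 0.0f ? RGB{ sum.r / sumLuminance, sum.g / sumLuminance, sum.b / sumLuminance }
                                     : RGB{ 1.0f, 1.0f, 1.0f };
    return out;
}

RGB Decompress( CompressedAmbientCube const& cube, int face )
{
    float const l = cube.luminance[ face ];
    return { cube.chroma.r * l, cube.chroma.g * l, cube.chroma.b * l };
}
```

Faces that share the dominant hue are reproduced almost exactly, while faces with a different hue (usually the less important bottom and side faces) lose their tint.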

Light probes were stored as a few layers of dense 2D grids. There are much better schemes, like tetrahedral tessellation [Cup12]. In our case 4 regular 2D grids (4 height layers) were enough, so memory wasn’t an issue here. Besides, a regular grid speeds up light probe lookup and simplifies debugging: with a grid I could just save results as a 2D texture for simple visualization.

At runtime a single light probe was calculated per object. This was done by selecting one grid cell (a cube with 8 probes at its corners) and using trilinear interpolation to compute the light probe value at the object’s center.

When light probes are arranged in a regular grid, some of them are placed inside geometry. As a result, dynamic objects are too dark near obstacles. To solve this issue, probes for which most of the gathered environment consists of back faces were marked as invalid. At runtime, invalid probes were excluded from interpolation by setting their weights to zero and renormalizing the remaining weights.

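
A sketch of the trilinear lookup with invalid probe rejection. A single float stands in for the full ambient cube, corners are indexed by bit pattern (x | y << 1 | z << 2) and `(fx, fy, fz)` is the object’s center in [0,1]³ cell coordinates. The back face threshold in `IsProbeValid` (at most half of the captured texels) is an assumption for “most of the gathered environment”; all names are hypothetical.

```cpp
#include <array>
#include <cassert>

// A single float stands in for the full ambient cube payload in this sketch.
struct Probe { float value; bool valid; };

// A probe is treated as inside geometry when more than half of its captured texels are back faces.
bool IsProbeValid( int backFaceTexels, int totalTexels )
{
    return backFaceTexels * 2 <= totalTexels;
}

// Trilinear interpolation inside one grid cell; invalid corners get zero weight and the
// remaining weights are renormalized. Returns fallback when no valid corner contributes.
float InterpolateProbes( std::array<Probe, 8> const& corners, float fx, float fy, float fz, float fallback )
{
    float sum = 0.0f;
    float weightSum = 0.0f;
    for ( int i = 0; i < 8; ++i )
    {
        float const wx = ( i & 1 ) ? fx : 1.0f - fx;
        float const wy = ( i & 2 ) ? fy : 1.0f - fy;
        float const wz = ( i & 4 ) ? fz : 1.0f - fz;
        float const w = corners[ i ].valid ? wx * wy * wz : 0.0f;
        sum += w * corners[ i ].value;
        weightSum += w;
    }
    return weightSum > 0.0f ? sum / weightSum : fallback;
}
```
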
Lighting an entire object with a single light probe is only correct at the object’s center, so no shadow transitions are possible on its surface. To fix this, irradiance gradients [Tat05] were used. Again, the top-down view was used to reduce computations, so the irradiance gradient was computed and applied only for the top face. For fast evaluation a simple linear gradient was used. Gradient computation was quite straightforward: per axis, two additional light probes were calculated (located at the center -/+ 70% of the bounding box half extent), and the gradient value which minimized the error was taken. At runtime, per vertex, a position offset was calculated and multiplied by that value:

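```hlsl
// Valve ambient cube: pick one face per axis based on the sign of the world space normal
// (LightProbeLuminance order: -X, +X, -Y, +Y, -Z, +Z)
float3 probeL;
probeL.x = LightProbeLuminance[ ( normalWS.x >= 0.0 ? 1 : 0 ) + 0 ];
probeL.y = LightProbeLuminance[ ( normalWS.y >= 0.0 ? 1 : 0 ) + 2 ];
probeL.z = LightProbeLuminance[ ( normalWS.z >= 0.0 ? 1 : 0 ) + 4 ];

// apply irradiance gradient to top face; LightProbeGradientX/Y/Z are scalar per-axis gradients
float3 offsetWS = posWS - LightProbeCenterWS;
probeL.y += offsetWS.x * LightProbeGradientX;
probeL.y += offsetWS.y * LightProbeGradientY;
probeL.y += offsetWS.z * LightProbeGradientZ;

// ambient cube face weights are the squared normal components; chrominance is shared by all faces
float3 sqNormalWS = normalWS * normalWS;
float3 diffuse = dot( sqNormalWS, probeL ) * LightProbeRGB;
```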

Of course it looks nowhere near as good as real shadows. On the other hand, it enabled some shadow transitions at the cost of a few vertex shader instructions.
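
The gradient fit described above can be sketched as a least-squares line through the three top-face luminance samples per axis (center and center ∓ 70% of the half extent). Treating “the gradient value which minimizes error” as an ordinary least-squares slope is an assumption, and `FitGradient` / `AxisGradient` are hypothetical names. With symmetric samples the slope reduces to (Lplus - Lminus) / (2d), independent of the center sample.

```cpp
#include <cassert>
#include <cmath>

// Least-squares slope of a line through the samples (t[i], lum[i]).
float FitGradient( float const* t, float const* lum, int count )
{
    float meanT = 0.0f, meanL = 0.0f;
    for ( int i = 0; i < count; ++i )
    {
        meanT += t[ i ] / count;
        meanL += lum[ i ] / count;
    }
    float num = 0.0f, den = 0.0f;
    for ( int i = 0; i < count; ++i )
    {
        num += ( t[ i ] - meanT ) * ( lum[ i ] - meanL );
        den += ( t[ i ] - meanT ) * ( t[ i ] - meanT );
    }
    return den > 0.0f ? num / den : 0.0f;
}

// Gradient of the top-face luminance along one axis, from probes placed at
// center -/+ 70% of the bounding box half extent along that axis.
float AxisGradient( float lumMinus, float lumCenter, float lumPlus, float halfExtent )
{
    float const d = 0.7f * halfExtent;
    float const t[ 3 ] = { -d, 0.0f, d };
    float const lum[ 3 ] = { lumMinus, lumCenter, lumPlus };
    return FitGradient( t, lum, 3 );
}
```

The three per-axis results would then be uploaded as the `LightProbeGradientX/Y/Z` constants used by the shader above.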

I tried rendering more than 6 directions per ambient cube and then computing the final probe values using least squares, but it didn’t noticeably improve quality.

When baking light probes, dynamic objects can’t be included, so they can’t occlude the sun or other light coming from above. The resulting lighting on the bottom face of the light probe is too strong, and objects placed near the terrain tend to get unnaturally bright lighting from below. To fix it, the ambient cube’s bottom face was darkened a bit.

There were many redundant light probes, so a simple dictionary coder was used for storage. A dictionary of light probes was maintained per grid layer, and the grid stored indices into that dictionary. For extra compression a kd-tree could be used to store those indices [Ani09], but in our case the dictionary coder was enough to meet the storage requirements.

# Dynamic object shadows

Dynamic shadows are always a big challenge when using baked lighting. Moreover, in our case the rendering budget was very tight. In multiplayer matches players could place a really large number of dynamic objects, so there was no way we could use shadow maps.

For Anomaly 1 mobile, Bartosz Brzostek built an interesting system of dynamic shadows for units. Every dynamic object had prebaked projective shadows; in other words, shadow sprites were projected onto the terrain. Those sprites were attached to selected attachment points (bones); for example, a tank has one sprite for the chassis and one for the turret. I extended this system with new lightmap support and reused it for Anomaly 2 mobile.

Shadows were gathered in screen space using a small offscreen (x16 downscaled) buffer. First, this buffer was cleared to white. Then shadow sprites were projected onto a 2D ground plane and rendered using min blending to prevent double shadowing artifacts. Finally, this offscreen target was combined with the lightmap during the normal rendering pass. This constrained shadow receivers to flat terrain only. Additionally, combining a lightmap, which accumulates lighting from multiple shadowing light sources, with dynamic shadows is wrong on many levels. On the other hand, it allowed cheap multiple dynamic shadows.
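
A sketch of the shadow buffer logic, with rectangles standing in for the projected shadow sprites and a nearest texel lookup into the downscaled buffer. `ShadowBuffer`, `ShadeTexel` and the `downscale` parameter are hypothetical; the point is the min blend, which keeps overlapping sprites from darkening a texel twice.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Downscaled screen-space shadow buffer, cleared to 1 (white, fully lit).
struct ShadowBuffer
{
    int w, h;
    std::vector<float> texels;

    ShadowBuffer( int width, int height ) : w( width ), h( height ), texels( width * height, 1.0f ) {}

    // Stand-in for drawing a projected sprite: writes a constant shadow value into [x0,x1) x [y0,y1).
    void DrawSprite( int x0, int y0, int x1, int y1, float shadow )
    {
        for ( int y = std::max( y0, 0 ); y < std::min( y1, h ); ++y )
            for ( int x = std::max( x0, 0 ); x < std::min( x1, w ); ++x )
                texels[ y * w + x ] = std::min( texels[ y * w + x ], shadow ); // min blend
    }
};

// Normal rendering pass: lightmap lighting multiplied by the shadow term.
float ShadeTexel( float lightmap, ShadowBuffer const& sb, int screenX, int screenY, int downscale )
{
    return lightmap * sb.texels[ ( screenY / downscale ) * sb.w + screenX / downscale ];
}
```

With additive or multiplicative blending, two overlapping sprites of 0.5 would produce a darker overlap; with min blending the overlap simply keeps the darker of the two values.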

It enabled one additional cool trick: artists could prebake shadow penumbra. Parts near the ground had dark and sharp shadows, and parts placed higher were brighter and more blurred.

# Conclusion

The lightmapper allowed graphics artists to move heavy content from the PC version. It also turned out to be quite fast: 20-30 minutes for a full quality level rebake on a standard PC. In order to reach this level of performance I wrote a special lightweight render path just for baking. Still, baking was mostly bottlenecked by draw calls (driver time).

Finally I’d like to thank the entire team at 11 Bit Studios for making this game, Wojciech Sterna for proofreading and Michał Iwanicki for an interesting discussion about lightmaps.

# References

[Lar10] David Larsson – “The Devil is in the Details: Nuances of Light Mapping”, Gamefest 2010
[ZSGS04] K. Zhou, J. Snyder, B. Guo, H.-Y. Shum – “Iso-charts: Stretch-driven Mesh Parameterization using Spectral Analysis”, Eurographics 2004
[SGSH02] P.V. Sander, S.J. Gortler, J. Snyder, H. Hoppe – “Signal-Specialized Parametrization”, Eurographics 2002
[RNLL10] N. Ray, V. Nivoliers, S. Lefebvre, B. Lévy – “Invisible Seams”, Eurographics 2010
[BZK09] D. Bommes, H. Zimmer, L. Kobbelt – “Mixed-Integer Quadrangulation”, Siggraph 2009
[ZHLB10] M. Zhang, J. Huang, X. Liu, H. Bao – “A Wave-based Anisotropic Quadrangulation Method”, Siggraph 2010
[Sco03] Jim Scott – “Packing Lightmaps”, 2003
[Dri12] Rory Driscoll – “Cubemap Texel Solid Angle”, 2012
[Yan06] Yann L – “Radiosity on curved surfaces?”, GameDev.net forum post, 2006
[Iwa13] Michał Iwanicki – “Lighting Technology of ‘The Last of Us’”, Siggraph 2013
[NS11] T. Nöll, D. Stricker – “Efficient Packing of Arbitrary Shaped Charts for Automatic Texture Atlas Generation”, Eurographics 2011
[Eli00] Hugo Elias – “Radiosity”, 2000
[Cas10] Ignacio Castaño – “Hemicube Rendering and Integration”, 2010
[Cas10] Ignacio Castaño – “Lightmap Parameterization”, 2010
[Cas11] Ignacio Castaño – “Irradiance Caching – Part 1”, 2011
[Cas14] Ignacio Castaño – “Irradiance Caching – Continued”, 2014
[Dri09] Rory Driscoll – “Irradiance Caching: Part 1”, 2009
[Gau08] Pascal Gautron – “Practical Global Illumination With Irradiance Caching”, Siggraph 2008 class notes
[Cup12] Robert Cupisz – “Light probe interpolation using tetrahedral tessellations”, GDC 2012
[MMG06] J. Mitchell, G. McTaggart, C. Green – “Shading in Valve’s Source Engine”, Siggraph 2006
[Tat05] Natalya Tatarchuk – “Irradiance Volumes for Games”, GDC 2005
[Ani09] S. Anichini – “A Production Irradiance Volume Implementation Described”, 2009


## Digital Dragons 2014 Programming Track

Digital Dragons is a game industry conference held in Kraków (Poland). This year’s programming track was outstanding and mainly focused on graphics. All presentations are really worth reading.
