Over the holiday break I had some time to play with interesting ideas presented during the last SIGGRAPH. One thing which caught my attention was new analytical cloth BRDF from Sony Pictures Imageworks [EK17], which they use in movie production.

# AshikhminD

Current state of the art of cloth shading in games still seems to be Ashikhmin velvet BRDF [AS07], which was popularized in games by Ready at Dawn [NP13]. It basically boils down to skipping geometry term, replacing traditional microfacet BRDF denominator by a smoother version and using an inverted Gaussian for the distribution term:

$D={\frac{1}{\pi(1+4 \alpha ^2)}}(1+\frac{4 \exp(-\frac{\cot^2\theta}{\alpha ^2})}{\sin ^4\theta})$

Full shader code (microfacet BRDF denominator and geometry term is included in V term):

float AshikhminD(float roughness, float ndoth)
{
float m2    = roughness * roughness;
float cos2h = ndoth * ndoth;
float sin2h = 1. - cos2h;
float sin4h = sin2h * sin2h;
return (sin4h + 4. * exp(-cos2h / (sin2h * m2))) / (PI * (1. + 4. * m2) * sin4h);
}

float AshikhminV(float ndotv, float ndotl)
{
return 1. / (4. * (ndotl + ndotv - ndotl * ndotv));
}

vec3 specular = lightColor * f * d * v * PI * ndotl;

# CharlieD

Imageworks’ presentation proposes a new cloth distribution term, which they call “Charlie” sheen:

$D=\frac{\left(2+\frac{1}{\alpha }\right)\sin^{\alpha}\theta}{2\pi}$

This term has more intuitive behavior with changing roughness and solves the issue of harsh transitions (near ndotl = 1) of Ashikhnim velvet BRDF:

Left: Ashikhmin Right: Charlie

Although Charlie distribution term is simpler than Ashikhmin’s, Imageworks’ approximation for the physically based height correlated Smith geometry term is quite heavy for real-time rendering. Nevertheless, we can just use CharlieD and follow the same process as in [AS07] for the geometry term and BRDF denominator:

float CharlieD(float roughness, float ndoth)
{
float invR = 1. / roughness;
float cos2h = ndoth * ndoth;
float sin2h = 1. - cos2h;
return (2. + invR) * pow(sin2h, invR * .5) / (2. * PI);
}

float AshikhminV(float ndotv, float ndotl)
{
return 1. / (4. * (ndotl + ndotv - ndotl * ndotv));
}

vec3 specular = lightColor * f * d * v * PI * ndotl;

This results in a bit better looking, more intuitive to tweak and faster replacement of standard Ashikhmin velvet BRDF. See this Shadertoy for an interactive sample with full source code.

# References

[NP13] David Neubelt, Matt Pettineo – “Crafting a Next-Gen Material Pipeline for The Order: 1886”, SIGGRAPH 2013
[AS07] Michael Ashikhmin, Simon Premoze – “Distribution-based BRDFs”, 2007
[EK17] Alejandro Conty Estevez, Christopher Kulla – “Production Friendly Microfacet Sheen BRDF”, SIGGRAPH 2017

## Digital Dragons 2017

A few days ago I had a chance to attend and speak at Digital Dragons 2017 about rendering in Shadow Warrior 2. It was a total blast – very pro organized, had a honor to meet some incredible people and listen to some very inspiring talks. Anyway, if you are interested in the presentation (with notes), you can download it here – “Rendering of Shadow Warrior 2”.

Posted in Conference, Graphics, Lighting, Post Processing | 7 Comments

## Job System and ParallelFor

Some time ago, while profiling our game, I noticed that we have a lot of thread locking and contention resulting from a single mutexed MPMC job queue processing a large amount of tiny jobs. It wasn’t possible to merge work into larger jobs, as it would result in bad scheduling. Obviously, the more fine grained work items there are, the better they schedule.

There are two standard solutions: either make the global MPMC queue lock-free or use job stealing.

Global lock-free MPMC queue is quite complex to implement and still has a lot of contention when processing a large amount of small jobs. Maciej Siniło has a great post about a lock-free MPMC implementations if you are looking for one.

Job stealing replaces a single global MPMC queue with multiple lock-free local MPMC queues (one per a job thread). Jobs are pushed to multiple queues (static scheduling). Every job thread processes its own local queue and if there are no jobs left then it tries to steal a job from the end of a random queue (check out this post for an in-depth description). Job stealing has its own issues – it messes up the order of job processing or in other words it trades lower latency for a higher throughput. Moreover, if static scheduling fails (e.g. jobs have widely different lengths), then job stealing can degrade to a global MPMC queue with a lot of contention.

Before going nuclear with a lock-free MPMC queue or before implementing job stealing it may be interesting to consider some alternatives. I learned to avoid complex generic solutions and instead to favor specialized, but simpler ones. Maybe the specialized solutions won’t be better in the end, but at least it will be easier for the future code maintainer to make some changes or rewrites.

Going back to my profiling investigation, the interesting part was that almost all of those jobs were effectively doing a simple parallel for – spawning a lot of jobs of the same type in order to process the entire array of work items. For example: test visibility of 50k bounding boxes, simulate 100 particle emitters etc. This gave me the idea to abstract job system specifically for this case – a single function, array of elements to process in parallel and shared job configuration (dependencies, priorities, affinities etc.).

The implementation is simple. First we need a ParallelForJob structure (just remember to add some padding to this structure in order to avoid false sharing).

struct ParallelForJob
{
uint pushNum;
uint popNum;
uint completedNum;

uint elemBatchSize;
uint nextArrayElem;
uint arraySize;

func* function;
};

In order to add a new work item, we just push a single job to the global, protected by a mutex, MPMC queue. Contention isn’t an issue here, because the number of jobs going through this global queue is low.

uint reqBatchNum = ( arraySize + elemBatchSize - 1 ) / elemBatchSize;
uint reqPushNum = ( reqBatchNum + JOB_THREAD_NUM - 1 ) / JOB_THREAD_NUM;
uint pushNum = Min( reqPushNum, JOB_THREAD_NUM );

ParallelForJob job;
job.pushNum = pushNum;
job.popNum = 0;
job.completedNum = 0;
job.elemBatchSize = elemBatchSize;
job.nextArrayElem = 0;
job.arraySize = arraySize;

jobQueueMutex.lock();
jobQueue.push( job );
jobQueueMutex.unlock();

After releasing the job thread semaphore, waked job threads pick the next ParallelForJob from the global queue.

jobQueueMutex.lock();
jobQueue.peek( job );
if ( job.popNum.AtomicAdd( 1 ) + 1 == job.pushNum )
{
jobQueue.pop();
}
jobQueueMutex.unlock();

Next, job thread starts to process array elements of the picked job. Array elements make a fixed size queue without any producers, so a simple atomic increment is enough to safely pick the next batch of array elements from the multiple job threads in parallel.

while ( true )
{
uint fromElem = job.nextArrayElem.AtomicAdd( job.elemBatchSize );
uint toElem = Min( fromElem + job.elemBatchSize, job.arraySize );
for ( uint i = fromElem; i < toElem; ++i )
{
job.function( i );
}

if ( toElem >= job.arraySize )
{
break;
}
}

Finally, the last job thread runs an optional cleanup or dependency code.

if ( job.completedNum.AtomicAdd( 1 ) + 1 == job.pushNum )
{
OnJobFinished( job );
}

Recently, I found out that Arseny Kapoulkine implemented something similar, but with an extra thread wait for the other threads to finish at the end of ParallelForJob processing loop. Still IMO it’s not a widely know approach and it’s worth sharing.

The interesting part about ParallelForJob is that it allows to pause and resume a job without using fibers (just store current array index) and allows to easily cancel a job in flight (just override current array index). Furthermore, this abstraction can be also applied to the jobs themselves. Just replace an array of elements with an array of jobs (instead of an array elements you commit and process an array of jobs).

Posted in C++, Multithreading | 1 Comment

## GDC 2017 Presentations

(last update: April 5, 2017)

# Mobile

Posted in Conference | 6 Comments

## HDR Display – First Steps

Recently NVIDIA send us a nice HDR TV and we got a chance to checkout this new HDR display stuff in practice. It was a rather a fast implementation, as our game is shipping in less than 2 months from now. Regardless, results are good and it was definitely worth to identify issues and make preparations for full HDR display support. In future, we will be revisiting HDR display implementation, but first we need HDR monitors to become available (current HDR TVs are simply too big for a normal work), so we can think about using increased brightness and color gamut and in our art pipeline.

# Tone Mapping

We want to output scRGB values (linear values, RGB primaries, 1.0 maps to 80 nits and ~12.5 maps to 1000 nits). Just like for the LDR display I just fitted ACES RRT+scRGB (1000 nits) to a simple analytical curve. Currently there is no HDR TV which supports more than 1000 nits, so there was no point in supporting anything else.

float3 ACESFilmRec2020( float3 x )
{
float a = 15.8f;
float b = 2.12f;
float c = 1.2f;
float d = 5.92f;
float e = 1.9f;
return ( x * ( a * x + b ) ) / ( x * ( c * x + d ) + e );
}

Just like in case of LDR curve, this curve is shifted a bit and in order to get a reference curve just multiply input x by 0.6. Curve isn’t precise at the end of the range, but it isn’t very important in practice:

# UI

First issue with UI is that 1.0 in HDR render target maps to around 80 nits, which is looks too dark compared to the image on a LDR display. Solution was very simple – just multiply UI output by a magic constant :). Second issue with UI is that alpha blending with very bright pixels causes artifacts. In order to fix that we needed to draw UI to a separate render target and do a custom blend it with the rest of the scene in a separate pass.

Color grading was the only rendering pass, which used scene colors after tone mapping. Obviously, having two different curves (one for LDR display and one for the HDR display) breaks consistency of this pass. I looked through our color grading settings and managed to simplify it to a simple analytic system – shadow / highlight tint with some extra settings. Redoing color grading at this stage of the project was out of the question, so all old color grading settings were automatically fitted using least-squares. For the next project we plan grading in some different space with more bits and log like curve (ACEScc or Sony S-Log).

# Content

Some things in our game look awesome in HDR display, but some don’t look so good. Most issues are caused by “artistic” lighting setups, which were carefully tuned for the LDR tone mapping curve. E.g. in some places sunlight is nicely “burned” in when viewed on LDR display, but on the HDR display looks washed out, as lighting isn’t bright enough. Unfortunately, this is something that can’t be fixed last minute and something to think about when we will be creating content for the next game.

# Summary

Current HDR displays don’t have amazing brightness. 1000 nits (current HDR displays) vs 300 nits (current LDR displays) isn’t that big difference, as perceived brightness is square of luminance. On the other hand HDR displays add a lot of additional details – pixels which were grey, because of the tone mapping curve, now get a lot of color. Anyway, we are moving forwards here and there is no excuse not to support HDR displays.

# Digging Deeper

Posted in Graphics | 2 Comments

## GDC 2016 Presentations

(last update: May 20, 2016)

This year’s GDC was awesome. Some amazing presentations and again I could chat with super-smart and inspiring people. Be sure to check out “Advanced Techniques and Optimization of HDR Color Pipelines”, “Optimizing the Graphics Pipeline with Compute” and “Photogrammetry and Star Wars Battlefront” . Growing list of presentations:

# GDC Vault

Posted in Conference | 2 Comments

## Automatic Exposure

In games automatic exposure or eye adaptation is an algorithm for simulating eye reaction to temporal changes in lighting conditions and for selecting optimal exposure for a given scene and lighting conditions. The main challenge here is that optimal settings are hard to define. Should we expose for sunlight, shadows or something in between? Should the image be normally exposed, underexposed or overexposed? This is main reason why some people dislike automatic exposure and prefer to set exposure manually.

In photography exposure is something that’s carefully selected by a photographer during the shot or afterwards during photo processing and for many linear games with static lighting exposing manually is indeed a good solution. Even for some games with changing lighting conditions this can be done manually by placing virtual luminance meters and selecting one using manually placed triggers or exposure volumes (post process volumes).

In some cases manual exposure won’t be enough; dynamic levels, big open worlds, a lot of lighting variation or simply when we can’t afford to spend time manually tweaking exposure volumes.

# Standard approach

Automatic exposure in games is a pretty old concept. When HDR was introduced it was a must have feature for a HDR lighting pipeline. Standard approach to automatic exposure is to compute scene’s geometric mean of luminance (log2 average) and map it to some “key value”:

$\text{Exposure}=\frac{\text{KeyValue}}{\text{Clamp}\left(L,L_{\min },L_{\max }\right)}$

Then we multiply all pixels by exposure, add tone mapping, color grading and gamma.

This standard approach is still used in many games – even high profile titles like The Order 1886, but it also has many downsides and requires a lot of manual tweaking [NP15]. Lighting artists need to manually place multiple exposure volumes, which define optimal key value and min/max luminance values per region. Let’s see how we can improve over the standard approach.

# EV as luminance units

Photographers usually work with EVs ($L=0.125\frac{ \text{cd} }{m^2}*2^{\text{EV}_{100}}$) for metering scene luminance. EVs provide approximately perceptually uniform log2 scale (one EV step doubles luminance) and are more intuitive to work with than raw luminance values [Ree14]. Additionally, instead of tweaking key value they tweak exposure compensation (EC), which is again represented in EV units. In order to be more artist friendly we should let them tweak all automatic exposure parameters in EV units instead of raw luminances, replace key value with constant 18% middle gray and add EC for manual exposure biasing:

$\text{Exposure}=\frac{0.18}{\text{Clamp}\left(L,L_{\min },L_{\max }\right)-L_{\text{EC}}}$

We could also take one step further and implement a physically based camera by parameterizing exposure equation using camera aperture, shutter time and ISO. It’s not important for this article and you can find all the required details in excellent course notes by Sébastien Lagarde and Charles de Rousiers [LR14].

# Center weighted metering

Simple average metering is rarely used in real cameras. Usually the center of the screen is most important for the viewer and should be well exposed. We can take it into the account by metering in the small circle in the center of the image or by giving more influence to luminance values located in the center of the screen [Hen14]. Additionally we could also use a compute shader for computing averages [Pet11]. This is usually simpler and more efficient than repeated texture downsampling.

# Histogram

Unfortunately, using either of the averages as described above has its issues. Small dark or bright spots (e.g. very bright specular reflections) can strongly influence the average. For example, if a player hides behind a dark tree, metering will result in very low average scene luminance and as a result will overexpose the entire image. Furthermore usually we don’t want to expose for some kind of average lighting condition. Instead we want to expose for the dominant one.

A nice solution here is to use histogram, so we can adapt to some kind of median luminance instead of average. Valve used that approach for HL2: Episode One. They calculated one histogram bin per frame using occlusion queries and built the full histogram on the CPU side [Vla08]. Nowadays we can easily and efficiently build a histogram using a compute shader. More importantly with a compute shader a single texel can influence two nearby bins by some fractional amount. This allows us to cover the entire EV step with just a few log2 space bins. Using just 64 bins we can cover a large range of 16 steps and 128 bins are enough to cover entire range of real world exposures. We could also do a “sliding” histogram, just like we pre-expose the image (multiply shader outputs by adaptation from a previous frame, so we can store HDR data in R11G11B10Float buffers without any precision issues). By the way scene pre-exposure was also introduced by Valve in HL2: Episode One. This way they were able to have a full HDR pipeline using just LDR render targets.

Finally after computing the histogram we skip a large percentage (50%-80%) of the darkest pixels, a smaller percentage (2%-20%) of the brightest pixels and calculate the average from the remaining ones. Metering this way stabilizes automatic exposure and helps to focus exposure on something important.

# Exposure compensation curve

Exposure compensation (or key value) determines whether the exposed image will be relatively dark or bright. Imagine a dark room with closed shutters. After opening shutters, sunlight enters the room and lighting becomes at least a few EV steps brighter. We would expect the final image to also become brighter, but automatic exposure tries to maintain constant final image’s brightness and the image will look almost the same as before opening the shutters. As a rule, we want to have a darker image in low light conditions and a brighter image in high light conditions. This way viewer has a clue as to how bright the lighting is in the current scene. To account for this Krawczyk et al. [KMS05] empirically specified key values for several luminance conditions and fitted a simple curve:

$\text{KeyValue}=1.03\, -\frac{2}{\log _{10}(L+1)+2}$

We can translate that to EV units and plot:

This curve may be a bit too extreme for games, as high key results in a really bright image, but we can just roll our own function or even better – allow artists to tweak the exposure compensation curve directly and store it in a small lookup 1D texture.

# FX and translucents

It’s impossible to balance the luminance of FX (particles, beams, trails…) for various times of a day while using real world enormous ranges of luminance values for lighting. FX artists want their effects to be well visible and have some glow in direct sunlight (~100000 LUX) and at the same time they shouldn’t be overblown in full moon lighting (~0.25 LUX).

Trying to balance single FX brightness for different lighting conditions. Images from [Vai14]

Similarly unlit debug meshes like transparent lines, planes and other editor meshes, should maintain constant brightness on screen despite varying exposure and scene lighting. Some debug meshes could be rendered after the HDR pipeline (after tone mapping), but most have to go through the HDR pipeline in order to get proper translucent sorting. This is important not only for translucent debug meshes, but also for all antialiased opaque debug meshes.

This problem was solved in the game Infamous Second Son by applying manual exposure offset per time of a day [Vai14], but the developers weren’t happy with this solution, as it requires a lot of manual tweaking. A simpler and more robust solution is to negate exposure by dividing color by an estimated exposure. This estimated exposure can be our exposure from a previous frame, can be calculated from a virtual light meter placed at camera position, can be estimated from lightmap values at the center of the screen or can be estimated from light probes at camera position. In any case we usually don’t want to totally negate exposure, but we want to give the artists a slider which blends in log2 space between those two values. This way FX will be darker in low lighting and brighter in high lighting, while still being easily controlled by the artists.

It makes no sense to adapt to a hacked luminance (pixels with a constant brightness on screen) and in some cases it can even introduce a feedback loop. Additionally, some FX like weapon muzzle flashes or explosions shouldn’t influence automatic exposure. As a rule, we want to adapt to something that’s constant on the screen like fire particles, but don’t want to adapt to temporary FX like muzzle flashes or explosions. Debug meshes also shouldn’t influence automatic exposure, so it’s possible to set up automatic exposure in editor or draw debug meshes without changing the final image’s brightness. We could try selectively picking what influences automatic exposure and what not, but it requires storing extra data per pass.

A simple solution to both issues is to compute automatic exposure basing on scene luminance just after the main opaque pass. This way we can skip all translucents, emissive and debug meshes and they won’t influence automatic exposure. Additionally, this fixes feedback loop issues (at least if you don’t want to hack lights). The downside is that it won’t adapt to things like big emissive panels, but we can easily fix it by marking such surfaces in G-buffer or stencil buffer and adding emissive to automatic exposure input only for the marked surfaces.

Automatic exposure works with the final pixel luminance values and ignores the material reflectance (information about how dark materials are). For example, after automatic exposure a wall painted in white will just look just like a wall painted in black. We had this kind of issue in dark corridors, where part of them were covered in snow – either walls were too bright or snow was too dark.

Naty Hoffman in his talk proposed to adapt to illuminance instead of final pixel luminance [Hof13]. This way material reflectance won’t influence automatic exposure – dark corridors will remain dark and snow will be pure white as expected. Additionally it will remove specular from automatic exposure input and further stabilize the automatic exposure.

Most deferred engines have either a separate illuminance (diffuse lighting) buffer or some form of lighting buffer with additional information in alpha channel, which allows to approximately reconstruct plain illuminance. Usually this is motivated by very popular SSSSS algorithm (screen space subsurface scattering) which requires a separate illuminance buffer [JZJ*15].

Having the illuminance, we just need to add skybox and (lit) fog to create the final buffer for automatic exposure calculation. It’s not obvious how to treat skybox. One constant color per skybox? Convert it to illuminance? Just sample skybox texture (luminance)? We settled on luminance, as we want exposure to change depending on camera direction (bright white clouds near sun should have different exposure than darker parts of the skybox at the opposite side). In any case we additionally need a manual exposure compensation for the skybox, so lighting artists are able to manually set the optimal skybox brightness on screen, as the nighttime skybox should be much darker on screen than a daytime one.

Eye reaction to temporal changes in lighting conditions is usually simulated by blending exposures from many frames using the exponential decay function:

$L_{\text{temporal}}=L_{\text{temporal}}+\left(L-L_{\text{temporal}}\right)\left(1-e^{-\Delta \text{time}* \tau }\right)$

In reality, the time of adaptation differs depending on whether we adapt to light or to darkness and on lighting conditions as cones and rods have different adaptation speeds.

Furthermore rod and cones have different characterics and different light sensitivities. For example, when adapting to dark, colored surfaces appear colorless after the rod-cone break. Full light adaptation takes around 5 minutes and full dark adaptation is around 20-30 minutes [Wika]. This lengthy times are the reason why pilots and (possibly) pirates used eye patches [Wikb]. This way they were able to remove the eye patch and instantly see clearly in the dark without having to wait 20 minutes.

For games these are unreasonably long time frames and exact temporal adaptation details aren’t important. Maybe in titles like Metal Gear Solid it would be interesting to use an eye patch for instant dark adaptation or speedup dark adaptation by eating rich in vitamin A foods (e.g. carrot or fish). For most games interesting takeaway here is to differentiate speeds of light adaptation and dark adaptation.

# What’s next?

What are other possible ideas we could try? One very appealing idea is to adapt to a single dominant lighting condition. Possible implementation would be to bucket lights by tags, pick the most popular bin and use it for automatic exposure.

We could also build a RGB histogram and use it for automatic chrominance adaptation which allows the human visual system to adapt to lighting of a particular color (in photography it’s called automatic white balance [Wro15]).

Many cameras feature a multi-zone metering modes for tracking moving subjects – usually for sports photography. The idea here is to track a moving subject and try to expose based on its luminance. In games we have more information and we could extend this approach. We could expose based on important objects in a scene like enemies, player in third person view or predefined objects.

Using automatic exposure more complex operations are possible than with manual exposure. For example, we could expose different parts of the image differently based on Ansel Adam’s Zone System [YS12]. This approach could simulate real movie lighting pipeline or advanced photo processing, where often HDR lighting is compressed using various tricks like placing additional lights in the dark interiors or placing neutral density gels on the windows. This would fix gameplay issues caused by too wide range of luminance values. For example, when a player being indoors can’t see enemies outside in sunlight, because that part of image is overexposed.

Finally I’d like to thank Bartłomiej Wroński for an interesting discussion about automatic exposure.

# References

[NP15] David Neubelt, Matt Pettineo – “Advanced Lighting R&D at Ready At Dawn Studios”, SIGGRAPH 2015
[Ree14] Nathan Reed – “Artist-Friendly HDR With Exposure Values”, 2014
[LR14] Sébastien Lagarde, Charles de Rousiers – “Moving Frostbite to Physically Based Rendering 2.0”, SIGGRAPH 2014
[Hen14] Padraic Hennessy – “Implementing a Physically Based Camera: Automatic Exposure”, 2014
[Pet11] Matt Pettineo – “Average Luminance Calculation Using A Compute Shader”, 2011
[Vla08] Alex Vlachos – “Post Processing in The Orange Box”, GDC2008
[KMS05] Grzegorz Krawczyk, Karol Myszkowski, Hans-Peter Seidel – “Perceptual Effects in Real-time Tone Mapping”, SCCG 2005
[Vai14] Matt Vainio, “The Visual Effects of Infamous: Second Son”, GDC 2014
[Hof13] Naty Hoffman – “Outside the Echo Chamber: Learning from Other Disciplines”, i3D 2013
[JZJ*15] Jorge Jimenez, Károly Zsolnai, Adrian Jarabo, Christian Freude, Thomas Auzinger, Xian-Chun Wu, Javier von der Pahlen, Michael Wimmer and Diego Gutierrez – “Separable Subsurface Scattering”, CGF 2015