GDC 2018 Presentations

(last update: March 23, 2018)

Programming Track

Posted in Conference | Leave a comment

Cloth Shading

Over the holiday break I had some time to play with interesting ideas presented during the last SIGGRAPH. One thing which caught my attention was new analytical cloth BRDF from Sony Pictures Imageworks [EK17], which they use in movie production.


Current state of the art of cloth shading in games still seems to be Ashikhmin velvet BRDF [AS07], which was popularized in games by Ready at Dawn [NP13]. It basically boils down to skipping geometry term, replacing traditional microfacet BRDF denominator by a smoother version and using an inverted Gaussian for the distribution term:

D={\frac{1}{\pi(1+4 \alpha ^2)}}(1+\frac{4 \exp(-\frac{\cot^2\theta}{\alpha ^2})}{\sin ^4\theta})

Full shader code (microfacet BRDF denominator and geometry term is included in V term):

float AshikhminD(float roughness, float ndoth)
    float m2    = roughness * roughness;
    float cos2h = ndoth * ndoth;
    float sin2h = 1. - cos2h;
    float sin4h = sin2h * sin2h;
    return (sin4h + 4. * exp(-cos2h / (sin2h * m2))) / (PI * (1. + 4. * m2) * sin4h);

float AshikhminV(float ndotv, float ndotl)
    return 1. / (4. * (ndotl + ndotv - ndotl * ndotv));

vec3 specular = lightColor * f * d * v * PI * ndotl;


Imageworks’ presentation proposes a new cloth distribution term, which they call “Charlie” sheen:

D=\frac{\left(2+\frac{1}{\alpha }\right)\sin^{\alpha}\theta}{2\pi}

This term has more intuitive behavior with changing roughness and solves the issue of harsh transitions (near ndotl = 1) of Ashikhnim velvet BRDF:


Left: Ashikhmin Right: Charlie

Although Charlie distribution term is simpler than Ashikhmin’s, Imageworks’ approximation for the physically based height correlated Smith geometry term is quite heavy for real-time rendering. Nevertheless, we can just use CharlieD and follow the same process as in [AS07] for the geometry term and BRDF denominator:

float CharlieD(float roughness, float ndoth)
    float invR = 1. / roughness;
    float cos2h = ndoth * ndoth;
    float sin2h = 1. - cos2h;
    return (2. + invR) * pow(sin2h, invR * .5) / (2. * PI);

float AshikhminV(float ndotv, float ndotl)
    return 1. / (4. * (ndotl + ndotv - ndotl * ndotv));

vec3 specular = lightColor * f * d * v * PI * ndotl;

This results in a bit better looking, more intuitive to tweak and faster replacement of standard Ashikhmin velvet BRDF. See this Shadertoy for an interactive sample with full source code.


[NP13] David Neubelt, Matt Pettineo – “Crafting a Next-Gen Material Pipeline for The Order: 1886”, SIGGRAPH 2013
[AS07] Michael Ashikhmin, Simon Premoze – “Distribution-based BRDFs”, 2007
[EK17] Alejandro Conty Estevez, Christopher Kulla – “Production Friendly Microfacet Sheen BRDF”, SIGGRAPH 2017

Posted in Graphics, Lighting | Leave a comment

Digital Dragons 2017

A few days ago I had a chance to attend and speak at Digital Dragons 2017 about rendering in Shadow Warrior 2. It was a total blast – very pro organized, had a honor to meet some incredible people and listen to some very inspiring talks. Anyway, if you are interested in the presentation (with notes), you can download it here – “Rendering of Shadow Warrior 2”.

Posted in Conference, Graphics, Lighting, Post Processing | 7 Comments

Job System and ParallelFor

Some time ago, while profiling our game, I noticed that we have a lot of thread locking and contention resulting from a single mutexed MPMC job queue processing a large amount of tiny jobs. It wasn’t possible to merge work into larger jobs, as it would result in bad scheduling. Obviously, the more fine grained work items there are, the better they schedule.

There are two standard solutions: either make the global MPMC queue lock-free or use job stealing.

Global lock-free MPMC queue is quite complex to implement and still has a lot of contention when processing a large amount of small jobs. Maciej Siniło has a great post about a lock-free MPMC implementations if you are looking for one.

Job stealing replaces a single global MPMC queue with multiple lock-free local MPMC queues (one per a job thread). Jobs are pushed to multiple queues (static scheduling). Every job thread processes its own local queue and if there are no jobs left then it tries to steal a job from the end of a random queue (check out this post for an in-depth description). Job stealing has its own issues – it messes up the order of job processing or in other words it trades lower latency for a higher throughput. Moreover, if static scheduling fails (e.g. jobs have widely different lengths), then job stealing can degrade to a global MPMC queue with a lot of contention.

Before going nuclear with a lock-free MPMC queue or before implementing job stealing it may be interesting to consider some alternatives. I learned to avoid complex generic solutions and instead to favor specialized, but simpler ones. Maybe the specialized solutions won’t be better in the end, but at least it will be easier for the future code maintainer to make some changes or rewrites.

Going back to my profiling investigation, the interesting part was that almost all of those jobs were effectively doing a simple parallel for – spawning a lot of jobs of the same type in order to process the entire array of work items. For example: test visibility of 50k bounding boxes, simulate 100 particle emitters etc. This gave me the idea to abstract job system specifically for this case – a single function, array of elements to process in parallel and shared job configuration (dependencies, priorities, affinities etc.).

The implementation is simple. First we need a ParallelForJob structure (just remember to add some padding to this structure in order to avoid false sharing).

struct ParallelForJob
    uint pushNum;
    uint popNum;
    uint completedNum;

    uint elemBatchSize;
    uint nextArrayElem;
    uint arraySize;

    func* function;

In order to add a new work item, we just push a single job to the global, protected by a mutex, MPMC queue. Contention isn’t an issue here, because the number of jobs going through this global queue is low.

uint reqBatchNum = ( arraySize + elemBatchSize - 1 ) / elemBatchSize;
uint reqPushNum = ( reqBatchNum + JOB_THREAD_NUM - 1 ) / JOB_THREAD_NUM;
uint pushNum = Min( reqPushNum, JOB_THREAD_NUM );

ParallelForJob job;
job.pushNum = pushNum;
job.popNum = 0;
job.completedNum = 0;
job.elemBatchSize = elemBatchSize;
job.nextArrayElem = 0;
job.arraySize = arraySize;

jobQueue.push( job );

jobThreadSemaphore.Release( pushNum );

After releasing the job thread semaphore, waked job threads pick the next ParallelForJob from the global queue.


jobQueue.peek( job );
if ( job.popNum.AtomicAdd( 1 ) + 1 == job.pushNum )

Next, job thread starts to process array elements of the picked job. Array elements make a fixed size queue without any producers, so a simple atomic increment is enough to safely pick the next batch of array elements from the multiple job threads in parallel.

while ( true )
    uint fromElem = job.nextArrayElem.AtomicAdd( job.elemBatchSize );
    uint toElem = Min( fromElem + job.elemBatchSize, job.arraySize );
    for ( uint i = fromElem; i < toElem; ++i )
       job.function( i );

    if ( toElem >= job.arraySize )

Finally, the last job thread runs an optional cleanup or dependency code.

if ( job.completedNum.AtomicAdd( 1 ) + 1 == job.pushNum )
    OnJobFinished( job );

Recently, I found out that Arseny Kapoulkine implemented something similar, but with an extra thread wait for the other threads to finish at the end of ParallelForJob processing loop. Still IMO it’s not a widely know approach and it’s worth sharing.

The interesting part about ParallelForJob is that it allows to pause and resume a job without using fibers (just store current array index) and allows to easily cancel a job in flight (just override current array index). Furthermore, this abstraction can be also applied to the jobs themselves. Just replace an array of elements with an array of jobs (instead of an array elements you commit and process an array of jobs).

Posted in C++, Multithreading | 1 Comment

GDC 2017 Presentations

(last update: April 5, 2017)

Programming Track

AMD Capsaicin And Cream Event

Math For Game Programmers

AI Summit

Visual Arts


Business & Marketing

Game Narrative Summit



Posted in Conference | 6 Comments

HDR Display – First Steps

Recently NVIDIA send us a nice HDR TV and we got a chance to checkout this new HDR display stuff in practice. It was a rather a fast implementation, as our game is shipping in less than 2 months from now. Regardless, results are good and it was definitely worth to identify issues and make preparations for full HDR display support. In future, we will be revisiting HDR display implementation, but first we need HDR monitors to become available (current HDR TVs are simply too big for a normal work), so we can think about using increased brightness and color gamut and in our art pipeline.

Tone Mapping

We want to output scRGB values (linear values, RGB primaries, 1.0 maps to 80 nits and ~12.5 maps to 1000 nits). Just like for the LDR display I just fitted ACES RRT+scRGB (1000 nits) to a simple analytical curve. Currently there is no HDR TV which supports more than 1000 nits, so there was no point in supporting anything else. 

float3 ACESFilmRec2020( float3 x )
	float a = 15.8f;
	float b = 2.12f;
	float c = 1.2f;
	float d = 5.92f;
	float e = 1.9f;
	return ( x * ( a * x + b ) ) / ( x * ( c * x + d ) + e );

Just like in case of LDR curve, this curve is shifted a bit and in order to get a reference curve just multiply input x by 0.6. Curve isn’t precise at the end of the range, but it isn’t very important in practice:



First issue with UI is that 1.0 in HDR render target maps to around 80 nits, which is looks too dark compared to the image on a LDR display. Solution was very simple – just multiply UI output by a magic constant :). Second issue with UI is that alpha blending with very bright pixels causes artifacts. In order to fix that we needed to draw UI to a separate render target and do a custom blend it with the rest of the scene in a separate pass.

Color Grading

Color grading was the only rendering pass, which used scene colors after tone mapping. Obviously, having two different curves (one for LDR display and one for the HDR display) breaks consistency of this pass. I looked through our color grading settings and managed to simplify it to a simple analytic system – shadow / highlight tint with some extra settings. Redoing color grading at this stage of the project was out of the question, so all old color grading settings were automatically fitted using least-squares. For the next project we plan grading in some different space with more bits and log like curve (ACEScc or Sony S-Log).


Some things in our game look awesome in HDR display, but some don’t look so good. Most issues are caused by “artistic” lighting setups, which were carefully tuned for the LDR tone mapping curve. E.g. in some places sunlight is nicely “burned” in when viewed on LDR display, but on the HDR display looks washed out, as lighting isn’t bright enough. Unfortunately, this is something that can’t be fixed last minute and something to think about when we will be creating content for the next game.


Current HDR displays don’t have amazing brightness. 1000 nits (current HDR displays) vs 300 nits (current LDR displays) isn’t that big difference, as perceived brightness is square of luminance. On the other hand HDR displays add a lot of additional details – pixels which were grey, because of the tone mapping curve, now get a lot of color. Anyway, we are moving forwards here and there is no excuse not to support HDR displays.

Digging Deeper

Posted in Graphics | 2 Comments

GDC 2016 Presentations

(last update: May 20, 2016)

This year’s GDC was awesome. Some amazing presentations and again I could chat with super-smart and inspiring people. Be sure to check out “Advanced Techniques and Optimization of HDR Color Pipelines”, “Optimizing the Graphics Pipeline with Compute” and “Photogrammetry and Star Wars Battlefront” . Growing list of presentations:

Programming Track

Math for Game Programmers

Visual Arts Track

Production Track

Design Track

Game VR/AR Track

Business Track

AI Summit

Game Narrative Sumit

Independent Games Summit

Presentation Coverage

Khronos Session

GDC Vault


Posted in Conference | 2 Comments