The goal for Anomaly 2 mobile was to get as close as possible to the quality of the PC version and to reuse as many assets as possible. The PC version had tons of dynamic lights, dynamic shadows, SSAO and similar effects, so there was no way to run it on mobile directly. Programming graphics for mobile feels like going 5-10 years back in time. Compared to PC or consoles, mobile GPUs are slow and games are rendered at high resolutions, but there is a solid amount of memory available. A perfect fit for some kind of precomputed lighting.
Anomaly 1 mobile version had simple lightmaps, which I hacked in a few hours. Baking was done in two steps. First step rendered PC real-time directional light with shadows. Second step added ambient occlusion by rendering manually placed darkening quads. Lightmap was applied only to the terrain, which consisted of a single textured 2D plane.
For Anomaly 2 mobile we decided to write a proper baked lighting solution. Unfortunately it had to be based on DX9, as our game editor was based on DX9 at that time and we had no time to implement DX11 support. DX11 opens up new optimization possibilities, namely compute shaders and smaller draw call overhead. Both are very important, as a GPU lightmapper is often bottlenecked by the CPU and some operations aren’t a good fit for pixel shaders.
There are many lightmap format flavors – plain accumulated diffuse lighting, directional normal maps (radiosity normal maps), spherical harmonics or dominant light direction (ambient and directional light per texel). For a good overview check out presentation by Illuminate Labs (Beast creators) [Lar10].
Graphics artists wanted high lightmap density for sharp shadows and detailed lighting. Additionally, changing the difficulty level changed object placement, so some parts of the lightmap needed to be stored separately for each difficulty level. Because of memory requirements there was no way we could use any format other than accumulated diffuse. Storing accumulated diffuse is also the fastest method and every millisecond counts on mobile. Unfortunately the selected format doesn’t support normal mapping, so I had to resort to a hack to add normal maps.
For baking I went with a classical approach: render a hemicube from the POV of every lightmap texel to gather the radiance arriving at that point and later integrate it to compute irradiance [Eli00]. This is similar to the solution used in The Witness, which is described by Ignacio Castaño in a series of great posts on their blog [Cas10] [Cas10].
The first step is to generate unique UVs for every static object in the game which has lightmaps. It’s important to place seams manually in places where lighting is discontinuous. Those seams prevent lighting from leaking across hard edges. Theoretically hard edge information should be read from the mesh source file. In our case it was easier. XSI by default creates hard edges for angles greater than 60 degrees, so it was enough to place a seam wherever the angle between normals was higher than that value.
For chart UV generation D3DX UVAtlas was used. It’s based on research made 10 years ago (“Iso-Charts” [ZSGS04] and “Signal Parametrization” [SGSH02]), so actually you can find better algorithms nowadays. Especially interesting are surface quadrangulation algorithms like [BZK09] or [ZHLB10], which are compatible with “invisible seam” UV algorithm [RNLL10]. Pretty intense stuff, but fixes seam issues forever.
Chart UVs are generated to match a target density (texels per square meter). I used two metrics: the average density should be around the target and the minimum density should be no less than 90% of the target.
The level creation pipeline was heavily based on prefabs (object instances), so there was no point in using methods like signal based parametrization [SGSH02]. Theoretically it could be used to place more detail on the roofs of buildings and less on the sides. In practice there was no way to know how a given building would be used, as every instance could have a custom rotation and non-uniform scale.
Most charts have rectangular shapes, so first all charts are rotated to their best fit rectangle. Then charts are sorted by area and max side. Finally charts are packed using a brute force chart packer. It introduces one chart at a time, testing all possible locations and all combinations of rotations by multiples of 90 degrees and mirror transforms. The best location is chosen using extent metrics. There was one additional constraint in order to minimize texture compression artifacts: max 3 charts per 4×4 texel block.
Most papers use Tetris-like packing schemes. Nowadays there is plenty of power for a brute force solution, which achieves much better pack ratios. For faster packing even the GPU could be used:
1. Rasterize all possible chart combinations or blit prerasterized sprites using additive blending.
2. Test for chart overlap and other constraint violations using pixel or compute shader.
For us an optimized CPU packer was fast enough, as we were packing charts only once per prefab (object template) during mesh import. When packing charts for an entire level into one atlas a GPU approach can help a lot. In that case there is one additional trick to achieve better pack ratios. That is to leverage GPU wrap texture address mode and allow charts to wrap around the borders of the atlas [NS11].
Most of our levels were city landscapes with a lot of hard edges. This means a lot of small UV charts, which require a lot of padding. I used some tricks to reduce those borders:
1. 1×1 texel charts were collapsed and snapped to texel center.
2. 1xN / Nx1 texel charts were similarly collapsed to a line and snapped to texel centers.
3. NxN charts were resized and aligned to texel centers.
After placing objects on the level their lightmaps were packed into multiple 2kx2k atlases. Classic bin packing using a binary tree was used [Sco03]. First all objects (rectangles) were sorted by their area and max side. Then objects were inserted into the binary tree one at a time. If an object couldn’t be inserted into the first 2kx2k atlas, the packer tried the second one, and so on. Every object was surrounded by a proper border (1 texel wide). Additionally every object was resized and aligned to full 4×4 texel blocks in order to minimize texture compression artifacts.
After packing, unique scale and bias lightmap UV parameters were assigned to every instance. At runtime scale and bias were applied to the lightmap UV in the vertex shader. This allowed the lightmap UV to be reused per object, so all mesh instances could share the same vertex buffer. Additionally every mesh has a unique UV mapping, which can be used for other purposes.
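The binary tree insertion described above can be sketched roughly as follows. This is a minimal illustration in the spirit of [Sco03], not the shipped packer: `Rect`, `Node`, `Insert` and `PaddedSize` are hypothetical names, and the padding helper simply assumes a 1 texel border on each side followed by rounding up to whole 4×4 blocks.

```cpp
#include <cassert>
#include <memory>

// Axis-aligned rectangle in atlas texels.
struct Rect { int x, y, w, h; };

// Node of the packing tree: a leaf (free or occupied) or a node split in two.
struct Node {
    Rect rect{};
    bool used = false;
    std::unique_ptr<Node> child[2];

    // Tries to place a w x h rectangle; on success writes its placement to out.
    bool Insert(int w, int h, Rect& out) {
        assert(w > 0 && h > 0);
        if (child[0]) {  // interior node: try both halves
            return child[0]->Insert(w, h, out) || child[1]->Insert(w, h, out);
        }
        if (used || w > rect.w || h > rect.h) return false;
        if (w == rect.w && h == rect.h) {  // perfect fit
            used = true;
            out = rect;
            return true;
        }
        // Split along the axis with more leftover space, then descend.
        child[0] = std::make_unique<Node>();
        child[1] = std::make_unique<Node>();
        if (rect.w - w > rect.h - h) {
            child[0]->rect = {rect.x, rect.y, w, rect.h};
            child[1]->rect = {rect.x + w, rect.y, rect.w - w, rect.h};
        } else {
            child[0]->rect = {rect.x, rect.y, rect.w, h};
            child[1]->rect = {rect.x, rect.y + h, rect.w, rect.h - h};
        }
        return child[0]->Insert(w, h, out);
    }
};

// Adds a 1 texel border on each side and rounds up to whole 4x4 blocks.
inline int PaddedSize(int texels) { return (texels + 2 + 3) / 4 * 4; }
```

With one root `Node` per atlas, an object that fails to insert into the first root is simply offered to the next one.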
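As a small illustration of the scale and bias step, here is a CPU-side sketch of how the two parameters could be derived from an instance's packed rectangle and applied to a shared lightmap UV. The names (`LightmapTransform`, `MakeLightmapTransform`, `ApplyLightmapTransform`) are hypothetical, and the sketch assumes the per-mesh UV covers the [0;1] range and a square atlas.

```cpp
struct Float2 { float x, y; };

// Per-instance parameters; in the game these would be vertex shader constants.
struct LightmapTransform { Float2 scale, bias; };

// Derives scale and bias from the instance's rectangle inside a square atlas.
LightmapTransform MakeLightmapTransform(int rectX, int rectY, int rectW, int rectH, int atlasSize) {
    const float inv = 1.0f / atlasSize;
    return { { rectW * inv, rectH * inv }, { rectX * inv, rectY * inv } };
}

// Maps the shared per-mesh lightmap UV into the instance's atlas region.
Float2 ApplyLightmapTransform(Float2 uv, const LightmapTransform& t) {
    return { uv.x * t.scale.x + t.bias.x, uv.y * t.scale.y + t.bias.y };
}
```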
Direct lighting baking
In order to bake a lightmap we first need to calculate the appropriate world position and normal per lightmap texel. The most straightforward way is to rasterize the geometry in lightmap UV space, writing position and normal. It’s best to use conservative rasterization with analytical antialiasing. Conservative rasterization ensures that every texel touched by geometry is included, not only those whose center is covered. Analytical antialiasing ensures that the best centroid is taken.
The actual rasterization was done using half-space rasterization with slightly tweaked fill rules and line equations in order to make it conservative. For each rasterized texel, the source triangle was clipped to that texel’s bounding box and its area and centroid were calculated. In the case of multiple overlapping triangles per texel, the one with the highest texel area coverage was picked. Finally the centroid was used to calculate world position and normal. A very similar rasterization implementation can be found in the NVIDIA mesh processing tool sources.
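The per-texel clipping step can be sketched like this. It is only an illustration: it assumes Sutherland-Hodgman clipping against the four texel edges and the shoelace formula for area and centroid, which may differ from what the baker actually did, and all names (`ClipHalfPlane`, `TexelCoverage`, `Coverage`) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Clips a convex polygon against one axis-aligned half-plane.
// axis: 0 = x, 1 = y. keepGreater: keep the side where coord >= limit.
static std::vector<Vec2> ClipHalfPlane(const std::vector<Vec2>& poly, int axis, float limit, bool keepGreater) {
    std::vector<Vec2> out;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Vec2 a = poly[i];
        const Vec2 b = poly[(i + 1) % poly.size()];
        float da = (axis == 0 ? a.x : a.y) - limit;
        float db = (axis == 0 ? b.x : b.y) - limit;
        if (!keepGreater) { da = -da; db = -db; }
        if (da >= 0.0f) out.push_back(a);
        if ((da >= 0.0f) != (db >= 0.0f)) {  // edge crosses the clip line
            const float t = da / (da - db);
            out.push_back({a.x + t * (b.x - a.x), a.y + t * (b.y - a.y)});
        }
    }
    return out;
}

struct Coverage { float area; Vec2 centroid; };

// Clips a UV-space triangle (in texel units) to texel [tx, tx+1] x [ty, ty+1]
// and returns the covered area and the centroid of the covered region.
Coverage TexelCoverage(Vec2 v0, Vec2 v1, Vec2 v2, int tx, int ty) {
    std::vector<Vec2> poly = {v0, v1, v2};
    poly = ClipHalfPlane(poly, 0, float(tx), true);
    poly = ClipHalfPlane(poly, 0, float(tx + 1), false);
    poly = ClipHalfPlane(poly, 1, float(ty), true);
    poly = ClipHalfPlane(poly, 1, float(ty + 1), false);

    // Shoelace formula: twice the signed area plus centroid accumulators.
    float a2 = 0.0f, cx = 0.0f, cy = 0.0f;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        const Vec2 p = poly[i];
        const Vec2 q = poly[(i + 1) % poly.size()];
        const float cross = p.x * q.y - q.x * p.y;
        a2 += cross;
        cx += (p.x + q.x) * cross;
        cy += (p.y + q.y) * cross;
    }
    if (std::fabs(a2) < 1e-12f) return {0.0f, {tx + 0.5f, ty + 0.5f}};  // no coverage
    return {std::fabs(a2) * 0.5f, {cx / (3.0f * a2), cy / (3.0f * a2)}};
}
```

When several triangles touch the same texel, the one with the largest returned `area` would win, and its `centroid` would be converted to barycentric coordinates to interpolate world position and normal.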
Direct lighting baking was done by looping through lights and drawing quads which covered affected objects in lightmap. Almost the same pixel shader was used as for real time lights used in PC version, just with disabled specular etc. In order to sample shadows, objects were batched according to location, shadow map was focused on that batch and entire batch was rendered. This step outputs a set of FP16 lightmaps with accumulated direct diffuse lighting.
Indirect lighting baking
To bake indirect (bounced) lighting and ambient occlusion, hemicubes were used. Apart from the already calculated world position and normal, hemicube rendering requires some arbitrary “up direction”. It can be either a constant direction, resulting in banding, or a random one, resulting in noise. For us a random “up direction” worked best. I just needed to make sure that the random seed depends on world position. Without it some lightmaps, which were generated per difficulty level, wouldn’t fit with the rest of the lightmaps, which were generated once.
Now we have, per lightmap texel, a hemicube position (world position), hemicube direction (world normal) and hemicube up direction. The straightforward hemicube approach requires rendering the scene to 5 different viewports for every lightmap texel. For better performance I decided to use a single plane with a ~126.87 degree FOV (double the side length of the image plane compared to hemicube rendering). In our case it looked almost as good, but turned out to be a few times faster. There is also the question of how to treat missing samples. According to [Gau08] the best approach is to replicate edge texels, that is, to increase the weights on the edges. In our case simple weight renormalization looked best.
When baking, the camera’s near plane sometimes intersects nearby surfaces. A simple solution would be to move the near plane closer to the eye until that overlap disappears. Unfortunately moving that plane too much introduces depth buffer artifacts. To fix it the pixel shader tests for back faces using VFACE (SV_IsFrontFace). When back faces are encountered, ambient occlusion and bounced light are set to zero. Theoretically this is not always correct, but in practice it looked good and fixed all visual issues.
Finally, gathered radiance needs to be converted to irradiance. It requires weighting samples by the cosine factor and solid angle. Texels located at cube map corners have a smaller solid angle and should influence the result less. The solid angle can be calculated by projecting a texel onto the unit sphere and calculating its area on the sphere [Dri12].
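One way to get a "random" up direction that depends only on world position is to hash the position bits and use the result as a rotation angle around the normal. The sketch below is an assumption about how this could be done, not the baker's actual code: the integer hash constants are one arbitrary well-mixing choice, `HemicubeUp` is a hypothetical name, and the normal is assumed to be unit length.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

struct Vec3 { float x, y, z; };

static Vec3 Cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

static Vec3 Normalize(Vec3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Integer hash (an arbitrary choice for this sketch; any well-mixing hash would do).
static uint32_t HashU32(uint32_t h) {
    h ^= h >> 16; h *= 0x7feb352dU;
    h ^= h >> 15; h *= 0x846ca68bU;
    h ^= h >> 16;
    return h;
}

static uint32_t FloatBits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Returns a unit vector perpendicular to the unit normal n, rotated around n
// by an angle that depends only on the world position.
Vec3 HemicubeUp(Vec3 pos, Vec3 n) {
    const uint32_t h = HashU32(FloatBits(pos.x) ^ HashU32(FloatBits(pos.y) ^ HashU32(FloatBits(pos.z))));
    const float angle = (h / 4294967296.0f) * 6.2831853f;
    // Orthonormal basis (t, b) spanning the plane perpendicular to n.
    const Vec3 helper = std::fabs(n.x) < 0.9f ? Vec3{1.0f, 0.0f, 0.0f} : Vec3{0.0f, 1.0f, 0.0f};
    const Vec3 t = Normalize(Cross(helper, n));
    const Vec3 b = Cross(n, t);
    const float c = std::cos(angle), s = std::sin(angle);
    return {t.x * c + b.x * s, t.y * c + b.y * s, t.z * c + b.z * s};
}
```

Because the seed comes from the position bits, two bakes agree only if they compute bit-identical texel positions, which is the property needed for per difficulty lightmaps to match the shared ones.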
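The quoted FOV follows directly from the plane size: a hemicube face at unit distance has a half side length of 1 (90 degrees), and doubling it to 2 gives 2·atan(2). A tiny sanity check, with a hypothetical helper name:

```cpp
#include <cmath>

// Full field of view, in degrees, of a square image plane at unit distance
// whose half side length is halfExtent.
double PlaneFovDegrees(double halfExtent) {
    const double kPi = 3.14159265358979323846;
    return 2.0 * std::atan(halfExtent) * 180.0 / kPi;
}
```

`PlaneFovDegrees(1.0)` gives the usual 90 degree hemicube face, and `PlaneFovDegrees(2.0)` gives roughly 126.87 degrees.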
I’ve precomputed weights and stored them in a lookup texture, so they could be directly used by pixel shader:
float ElementArea( float x, float y )
{
    return atan2f( x * y, sqrtf( x * x + y * y + 1.0f ) );
}

// calculate weights
float weightSum = 0.0f;
for ( unsigned y = 0; y < IL_PROBE_SIZE; ++y )
{
    for ( unsigned x = 0; x < IL_PROBE_SIZE; ++x )
    {
        // texel center on the image plane, which spans [-2;2] at unit distance
        float const tu = ( 4.0f * ( x + 0.5f ) / IL_PROBE_SIZE ) - 2.0f;
        float const tv = ( 4.0f * ( y + 0.5f ) / IL_PROBE_SIZE ) - 2.0f;

        // cosFactor = |v1( tu, tv, 1 )| DOT v2( 0, 0, 1 ) = |v1|.z
        float const cosFactor = 1.0f / sqrtf( tu * tu + tv * tv + 1.0f );

        // solid angle projection (texelStep is half of a texel's side length)
        float const texelStep = 2.0f / IL_PROBE_SIZE;
        float const x0 = tu - texelStep;
        float const y0 = tv - texelStep;
        float const x1 = tu + texelStep;
        float const y1 = tv + texelStep;
        float const solidAngle = ElementArea( x0, y0 ) - ElementArea( x0, y1 ) - ElementArea( x1, y0 ) + ElementArea( x1, y1 );

        float const weight = cosFactor * solidAngle;
        weightSum += weight;
        ILProbeWeights[ x + y * IL_PROBE_SIZE ] = weight;
    }
}

// normalize weights
for ( unsigned y = 0; y < IL_PROBE_SIZE; ++y )
{
    for ( unsigned x = 0; x < IL_PROBE_SIZE; ++x )
    {
        ILProbeWeights[ x + y * IL_PROBE_SIZE ] /= weightSum;
    }
}
For achieving best performance all mentioned steps were done exclusively on the GPU. Only final lightmaps were copied to main memory in order to store them on disk.
1. Render multiple IL probes
2. Integrate using the precomputed weights lookup texture and downscale to 1×1 texel (watch out for FP16 precision issues)
3. Copy texel to final position in lightmap
Last steps were batched in order to optimally use GPU.
For a fast preview mode, and generally for faster baking, irradiance caching [Cas11] [Cas14] [Dri09] could be used. It works by smartly picking sample positions and filling in the missing places using interpolation. Sample placement is estimated by analyzing the calculated radiance at existing sample locations. This leads to workloads which are much harder to batch and less GPU friendly than the brute force approach. It’s especially bad when you can’t use compute shaders. Due to this and due to time constraints I didn’t implement it, but results from other people look very promising. It’s something I’d like to implement in the future if I ever write another baker.
Terrain consisted mostly of multiple flat tiles and decals placed on top of them. Tiles were small and wasted a lot of lightmap space on padding. Decals didn’t reuse the lightmap, so lighting values were needlessly duplicated. Borders between those tiles were also problematic because of seam artifacts created by UV discontinuity and aggressive PVR compression. In order to solve those issues a special terrain lightmap was introduced. Basically it was a big 2D plane, placed at a specific height, for which lighting was baked. Artists could mark which objects and decals should use this lightmap instead of having their own. Artists also used this lightmap for small 3D objects on the terrain (e.g. small debris). This solved terrain seam issues and resulted in more efficient lightmap texture usage.
Lightmap real-time composition
Baking results were stored as two FP16 textures with linear values. First texture contained direct lighting, second – bounced lighting and ambient occlusion. All inputs could be mixed in real-time in editor, just like layers in Photoshop. Everything was controlled by curves. Artists could tweak ambient occlusion strength, colorize lighting, increase bounced lighting strength etc. Everything was handled by a single pixel shader and was really fast. Not a physically correct approach, but it enabled fast iterations for final light tweaks, without requiring lengthy lighting rebake.
From a technical side this step merged high precision lighting components and outputted a single lightmap in RGBX8_SRGB format. Lighting values were rescaled to [0;2] range for some extra lighting range. It’s possible to dynamically select lighting range per object for better encoding. However in our case lightmap textures were low precision and were heavily compressed (PVR 2bpp), so it would result in visible discontinuities between two objects with different lighting scale.
Apart from composition this step also fills unused lightmap texels and tries to weld lightmap UV discontinuities. Parts of a mesh that are connected in 3D space can be disjoint in lightmap UV space. Interpolation along such a disjoint edge causes visible seams. The invisible seam algorithm [RNLL10] could be used to fix it. Unfortunately it requires complicated quadrangulation algorithms like [BZK09] or [ZHLB10]. It also imposes additional restrictions on UV charts, so UV space usage is less efficient. Compression is another source of seams along those discontinuities. Compression seam removal requires introducing additional constraints and reduces UV space usage efficiency even more.
This was overkill for us, as there are approximate methods which work with any UV parametrization. Those methods search for a “best” border texel value, either by evaluating a few points and using least squares [Iwa13] or by trying to solve the bilinear filtering equation analytically [Yan06].
I went with a simple and rough solution: average values across seams during the lightmap composition pass. Additionally, in order to reduce texture compression artifacts, the final step of the composition pass flood fills lightmap values, so unused texels have values similar to their neighbors.
Iteration times are very important for graphics artists. That’s why I added selective baking and minimal rebake. Selective baking bakes only selected objects. Minimal rebake bakes only modified objects and their relevant surroundings. Overnight all levels were automatically rebaked and the changes pushed to SVN. So usually only a small part of the scene needed to be rebaked during normal workflow and iteration times were manageable.
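The flood fill part can be sketched as a repeated dilation: every unused texel takes the average of its already valid neighbors, growing the filled region by one ring per pass. This is a single channel CPU illustration with hypothetical names; the real pass ran in a pixel shader and its exact filter is not described here.

```cpp
#include <vector>

// One dilation pass: each invalid texel with at least one valid neighbor
// (8-connected) takes the average of those neighbors and becomes valid.
void FloodFillPass(std::vector<float>& texels, std::vector<bool>& valid, int width, int height) {
    const std::vector<float> src = texels;      // read from a snapshot so that
    const std::vector<bool> srcValid = valid;   // one pass grows exactly one ring
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int i = y * width + x;
            if (srcValid[i]) continue;
            float sum = 0.0f;
            int count = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    if (!srcValid[ny * width + nx]) continue;
                    sum += src[ny * width + nx];
                    ++count;
                }
            }
            if (count > 0) {
                texels[i] = sum / count;
                valid[i] = true;
            }
        }
    }
}
```

Running a handful of passes is usually enough to cover the borders that block compression and bilinear filtering can reach.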
Static object lighting
Static object lighting was based on baked diffuse in lightmaps. For better quality and lighting resolution normal maps were used. Due to memory constraints I couldn’t do proper normal mapping with lightmaps and had to resort to a hack. Normals were combined with the dominant light’s direction (usually the sun) and were used for perturbing lightmap values:
float diffuseMult = saturate( dot( normalTS, lightDirTS ) ) * FLDScale + FLDBias;
diffuse *= diffuseMult; // diffuse already holds the value sampled from the lightmap
This hack worked out quite nicely in practice, adding extra detail and helping to reduce compression artifacts. This was very important as we were using hardcore PVR 2bpp compression. Normal maps were also used for real time specular (calculated for a single light) and for envmaps.
Dynamic object lighting
Dynamic objects were primarily lit using diffuse stored in light probes (captured irradiance for all possible orientations at a single point in space). Specular and normal maps were added just like for static objects.
There are many ways of implementing light probes. Again because of performance I had to choose the simplest method, the “Valve ambient cube” [MMG06]. It is almost like a cubemap with 1×1 texel faces, which stores lighting values per face direction. For general usage it has a lot of issues. For example it’s not rotationally invariant, so lighting error depends on the light’s angle. For our case it was a good fit. The game almost always had a top down view and this basis allowed us to harness that fact. The top cube face was the most important one.
A single ambient cube was stored as a single chrominance value with 6 intensity values, one per face. This drove down light probe memory usage and reduced computations a bit. In our case the top face was the most important, so it had the highest weight and the bottom face had the lowest weight when computing the single chrominance value for captured lighting.
Light probes were stored as a few layers of dense 2D grids. There are much better schemes like tetrahedral tessellation [Cup12]. In our case 4 regular 2D grids (4 height layers) were enough, so memory wasn’t an issue here. Besides, a regular grid speeds up light probe lookup and simplifies debugging. With a grid I could just save results as a 2D texture for simple visualization.
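A rough sketch of such a compact encoding is shown below. The details are assumptions made for illustration: Rec. 709 luminance as the intensity measure, chrominance normalized to unit luminance, faces ordered -X, +X, -Y, +Y, -Z, +Z with +Y as the top face, and caller supplied face weights. `CompactAmbientCube`, `CompressAmbientCube` and `DecodeFace` are hypothetical names.

```cpp
struct Rgb { float r, g, b; };

// Faces are ordered -X, +X, -Y, +Y, -Z, +Z (index 3 is the top face).
struct CompactAmbientCube {
    Rgb chroma;          // shared color, normalized to unit luminance
    float intensity[6];  // per-face luminance
};

static float Luminance(Rgb c) {
    return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b;
}

// Collapses six RGB faces into one chrominance plus six intensities.
// faceWeights controls how much each face contributes to the shared color.
CompactAmbientCube CompressAmbientCube(const Rgb faces[6], const float faceWeights[6]) {
    CompactAmbientCube out{};
    Rgb sum = {0.0f, 0.0f, 0.0f};
    float lumSum = 0.0f;
    for (int i = 0; i < 6; ++i) {
        out.intensity[i] = Luminance(faces[i]);
        sum.r += faceWeights[i] * faces[i].r;
        sum.g += faceWeights[i] * faces[i].g;
        sum.b += faceWeights[i] * faces[i].b;
        lumSum += faceWeights[i] * out.intensity[i];
    }
    out.chroma = lumSum > 0.0f ? Rgb{sum.r / lumSum, sum.g / lumSum, sum.b / lumSum}
                               : Rgb{1.0f, 1.0f, 1.0f};
    return out;
}

// Reconstructs the RGB value of one face.
Rgb DecodeFace(const CompactAmbientCube& cube, int face) {
    const float i = cube.intensity[face];
    return {cube.chroma.r * i, cube.chroma.g * i, cube.chroma.b * i};
}
```

When all faces share the same hue the round trip is exact up to float precision; otherwise the error is pushed toward the faces with the lowest weights, which is why the bottom face is a natural candidate for the smallest weight in a top down game.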
At runtime a single light probe per object was calculated. It was done by selecting one cell in grid (cube consisting of 8 probes in the corners) and using trilinear interpolation to compute light probe value at object’s center.
When light probes are arranged in a regular grid, some of them are placed inside of geometry. The result is that dynamic objects are too dark near obstacles. To solve this issue, probes for which most of gathered environment consists of back faces were marked as invalid. At runtime those invalid probes were discarded and were excluded from interpolation by setting their weights to zero and renormalizing other weights.
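The interpolation with invalid probes excluded can be sketched as follows. The `Probe` layout, the corner indexing and the function name are assumptions for illustration only.

```cpp
// One grid probe: six face luminances plus a validity flag set at bake time.
struct Probe {
    float luminance[6];
    bool valid;
};

// Trilinear interpolation over the 8 corners of a grid cell.
// Corners are indexed as x + 2 * y + 4 * z; fx, fy, fz are in [0;1].
// Invalid probes get zero weight and the remaining weights are renormalized.
// Returns false when every corner is invalid.
bool InterpolateProbes(const Probe corners[8], float fx, float fy, float fz, float out[6]) {
    float weightSum = 0.0f;
    for (int k = 0; k < 6; ++k) out[k] = 0.0f;
    for (int i = 0; i < 8; ++i) {
        if (!corners[i].valid) continue;
        const float wx = (i & 1) ? fx : 1.0f - fx;
        const float wy = (i & 2) ? fy : 1.0f - fy;
        const float wz = (i & 4) ? fz : 1.0f - fz;
        const float w = wx * wy * wz;
        weightSum += w;
        for (int k = 0; k < 6; ++k) out[k] += w * corners[i].luminance[k];
    }
    if (weightSum <= 0.0f) return false;
    for (int k = 0; k < 6; ++k) out[k] /= weightSum;
    return true;
}
```

A caller still needs a fallback for the `false` case, for example reusing the object's previous probe value.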
A single light probe for an entire object is only correct at the center of that object, so no shadow transitions are possible on its surface. In order to fix this, irradiance gradients [Tat05] were used. Again I used the top down view to reduce computations, so the irradiance gradient was computed and applied only for the top face. For fast evaluation a simple linear gradient was used. Gradient computation was quite straightforward. Per axis two additional light probes were calculated (located at center -/+ 70% of the bbox half extent). Then the gradient value which minimizes error was taken. At runtime, per vertex, a position offset was calculated and multiplied by that value:
// pick the ambient cube face per axis according to the normal's sign
float3 probeL;
probeL.x = LightProbeLuminance[ ( normalWS.x >= 0.0 ? 1 : 0 ) + 0 ];
probeL.y = LightProbeLuminance[ ( normalWS.y >= 0.0 ? 1 : 0 ) + 2 ];
probeL.z = LightProbeLuminance[ ( normalWS.z >= 0.0 ? 1 : 0 ) + 4 ];

// apply irradiance gradient to top face
float3 offsetWS = posWS - LightProbeCenterWS;
probeL.y += offsetWS.x * LightProbeGradientX;
probeL.y += offsetWS.y * LightProbeGradientY;
probeL.y += offsetWS.z * LightProbeGradientZ;

// ambient cube evaluation: blend faces by squared normal components
float3 sqNormalWS = normalWS * normalWS;
float3 diffuse = dot( sqNormalWS, probeL ) * LightProbeRGB;
Of course it doesn’t look nearly as good as real shadows. On the other hand it enabled some shadow transitions at the cost of a few vertex shader instructions.
I tried to render more than 6 directions per ambient cube and then compute final probe values using least squares, but it didn’t noticeably improve quality.
When baking light probes dynamic objects can’t be included, so they can’t occlude sun or other light coming from above. Resulting lighting at the bottom of light probe is too strong and objects placed near terrain tend to have unnatural bright lighting from below. To fix it ambient cube’s bottom face was darkened a bit.
There were many redundant light probes, so for storage a simple dictionary coder was used. Per grid layer a dictionary of light probes was maintained. Light probes were stored in grid as indices to specific entry in the dictionary. For extra compression a kd-tree could be used for storing those indices [Ani09], but in our case dictionary coder was enough to reach storage requirements.
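A dictionary coder of this kind fits in a few lines. The sketch below assumes a hypothetical 9 byte packed probe (3 chrominance bytes plus 6 intensities) and 16 bit indices; the actual storage format is not specified here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical packed probe layout: 3 chrominance bytes + 6 face intensities.
using PackedProbe = std::array<uint8_t, 9>;

// One grid layer: unique probes plus one dictionary index per grid cell.
struct ProbeLayer {
    std::vector<PackedProbe> dictionary;
    std::vector<uint16_t> indices;
};

// Replaces every probe in the grid with an index into a per-layer dictionary.
ProbeLayer CompressLayer(const std::vector<PackedProbe>& grid) {
    ProbeLayer layer;
    std::map<PackedProbe, uint16_t> lookup;
    for (const PackedProbe& probe : grid) {
        auto it = lookup.find(probe);
        if (it == lookup.end()) {
            assert(layer.dictionary.size() < 65536);  // must fit a 16 bit index
            it = lookup.emplace(probe, uint16_t(layer.dictionary.size())).first;
            layer.dictionary.push_back(probe);
        }
        layer.indices.push_back(it->second);
    }
    return layer;
}
```

Lookup at runtime is then a single indirection: `layer.dictionary[layer.indices[cell]]`.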
Dynamic object shadows
Dynamic shadows are always a big challenge when using baked lighting. Moreover in our case the rendering budget was very tight. In multiplayer matches players could place a really large number of dynamic objects, so there was no way we could use shadow maps.
For Anomaly 1 mobile Bartosz Brzostek built an interesting system of dynamic shadows for units. Every dynamic object had prebaked projective shadows. In other words shadow sprites were projected onto the terrain. Those sprites were attached to selected attachment points (bones). For example a tank has one sprite for the chassis and one sprite for the turret. I extended this system with new lightmap support and reused it for Anomaly 2 mobile.
Shadows were gathered in screen space using a small offscreen buffer (downscaled 16 times). First this buffer was cleared to white. Then shadow sprites were projected onto a 2D ground plane and rendered using min blend mode to prevent double shadowing artifacts. Finally this offscreen target was combined with the lightmap during the normal rendering pass. This constrained shadow receivers to flat terrain only. Additionally, combining a lightmap, which accumulates lighting from multiple shadowing light sources, with dynamic shadows is wrong on many levels. On the other hand it allowed cheap multiple dynamic shadows.
It enabled one additional cool trick – artist could prebake shadow penumbra. Parts near the ground had dark and sharp shadows and parts placed high were brighter and more blurred.
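To show why min blending avoids double shadowing, here is a small software stand-in for the offscreen shadow buffer. It is only an illustration of the blend rule: `ShadowBuffer` and `BlendSpriteMin` are hypothetical names, sprites are axis aligned and unprojected, and values are single channel with 1.0 meaning fully lit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Offscreen shadow buffer; 1.0 = fully lit, 0.0 = fully shadowed.
struct ShadowBuffer {
    int width, height;
    std::vector<float> texels;
    ShadowBuffer(int w, int h)
        : width(w), height(h), texels(static_cast<std::size_t>(w) * h, 1.0f) {}  // cleared to white
};

// Draws a sprite with "min" blending: overlapping sprites keep the darkest
// value instead of multiplying, so shadows never darken each other.
void BlendSpriteMin(ShadowBuffer& dst, const std::vector<float>& sprite,
                    int spriteW, int spriteH, int dstX, int dstY) {
    for (int sy = 0; sy < spriteH; ++sy) {
        for (int sx = 0; sx < spriteW; ++sx) {
            const int x = dstX + sx, y = dstY + sy;
            if (x < 0 || y < 0 || x >= dst.width || y >= dst.height) continue;
            float& texel = dst.texels[y * dst.width + x];
            texel = std::min(texel, sprite[sy * spriteW + sx]);
        }
    }
}
```

Two overlapping sprites with a value of 0.5 leave 0.5 in the overlap, whereas multiplicative blending would produce 0.25 there.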
The lightmapper allowed graphics artists to move heavy content over from the PC version. It also turned out to be quite fast: 20-30 min for a full quality level rebake on a standard PC. In order to reach this level of performance I wrote a special lightweight render path just for baking. Still, baking was mostly bottlenecked by draw calls (driver time).
Finally I’d like to thank the entire team at 11 Bit Studios for making this game, Wojciech Sterna for proofreading and Michał Iwanicki for an interesting discussion about lightmaps.
[Lar10] David Larsson – “The Devil is in the Details: Nuances of Light Mapping”, Gamefest 2010
[ZSGS04] K. Zhou, J. Snyder, B. Guo, H-Y Shum – “Iso-charts: Stretch-driven Mesh Parameterization using Spectral Analysis”, Eurographics 2004
[SGSH02] P.V. Sander, S.J. Gortler, J. Snyder, H. Hoppe – “Signal-Specialized Parametrization”, Eurographics 2002
[RNLL10] N. Ray, V. Nivoliers, S. Lefebvre, B. Lévy – “Invisible Seams”, Eurographics 2010
[BZK09] D. Bommes, H. Zimmer, L. Kobbelt – “Mixed-Integer Quadrangulation”, Siggraph 2009
[ZHLB10] M. Zhang, J. Huang, X. Liu, H. Bao – “A Wave-based Anisotropic Quadrangulation Method”, Siggraph 2010
[Sco03] Jim Scott – “Packing Lightmaps”, 2003
[Dri12] Rory Driscoll – “Cubemap Texel Solid Angle”, 2012
[Yan06] Yann L – “Radiosity on curved surfaces?”, GameDev.net forum post 2006
[Iwa13] Michał Iwanicki – “Lighting Technology Of ‘The Last Of Us’”, Siggraph 2013
[NS11] T. Nöll, D. Stricker – “Efficient Packing of Arbitrary Shaped Charts for Automatic Texture Atlas Generation”, Eurographics 2011
[Eli00] Hugo Elias – “Radiosity”, 2000
[Cas10] Ignacio Castaño – “Hemicube Rendering and Integration”, 2010
[Cas10] Ignacio Castaño – “Lightmap Parameterization”, 2010
[Cas11] Ignacio Castaño – “Irradiance Caching – Part 1”, 2011
[Cas14] Ignacio Castaño – “Irradiance Caching – Continued”, 2014
[Dri09] Rory Driscoll – “Irradiance Caching: Part 1”, 2009
[Gau08] Pascal Gautron – “Practical Global Illumination With Irradiance Caching”, Siggraph 2008 class notes
[Cup12] Robert Cupisz – “Light probe interpolation using tetrahedral tessellations”, GDC 2012
[MMG06] J. Mitchell, G. McTaggart, C. Green – “Shading in Valve’s Source Engine”, Siggraph 2006
[Tat05] Natalya Tatarchuk – “Irradiance Volumes for Games”, GDC 2005
[Ani09] S. Anichini – “A Production Irradiance Volume Implementation Described”, 2009