Particle data structure

Particles are usually simple structs stored in an array, and every frame the vertex buffer is filled from that array. You really don't want to sort them (at least not all of them, and sometimes sorted particles don't look good), but you do need to retain insertion order to prevent particle popping. That's why the common trick of using a plain array and replacing a deleted element with the last one isn't a good idea: it breaks insertion order.
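A minimal sketch of that layout (all names here are illustrative, not from any particular engine): a plain struct per particle, stored contiguously, and copied into the vertex buffer each frame in insertion order.

```cpp
#include <cstddef>
#include <vector>

struct Particle
{
    float position[3];
    float velocity[3];
    float life; // remaining lifetime in seconds; <= 0 means dead
};

struct Vertex
{
    float position[3];
};

// Fill the vertex buffer from the particle array, preserving insertion order.
static void FillVertexBuffer( const std::vector<Particle>& particles,
                              std::vector<Vertex>& vertices )
{
    vertices.clear();
    for ( const Particle& p : particles )
    {
        if ( p.life <= 0.0f )
            continue; // skip dead particles
        Vertex v = { { p.position[0], p.position[1], p.position[2] } };
        vertices.push_back( v );
    }
}
```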

A straightforward solution is to use a circular buffer. This has some issues: if particles don't have a uniform lifetime, you need extra checks to detect dead particles when filling the vertex buffer, and there will be holes in the circular buffer, so memory won't be used effectively.
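A minimal sketch of such a ring (names are mine; it assumes a `Particle` struct with a `life` field). Dead particles can only be retired from the front, so particles that die in the middle leave holes until they age out:

```cpp
#include <cstddef>
#include <vector>

struct Particle { float life; }; // life <= 0 means dead (illustrative payload)

// Fixed-capacity ring: oldest particle at 'head', newest at (head + count - 1).
struct ParticleRing
{
    std::vector<Particle> data;
    size_t head = 0;
    size_t count = 0;

    explicit ParticleRing( size_t capacity ) : data( capacity ) {}

    bool Push( const Particle& p )
    {
        if ( count == data.size() )
            return false;
        data[( head + count ) % data.size()] = p;
        ++count;
        return true;
    }

    // Retire dead particles from the front only. Particles that die in the
    // middle stay as holes and must be skipped when filling the vertex buffer.
    void PopDeadFront()
    {
        while ( count > 0 && data[head].life <= 0.0f )
        {
            head = ( head + 1 ) % data.size();
            --count;
        }
    }
};
```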

Another solution is to use an array and defragment it every frame. This means no conditional statements when filling the vertex buffer, but it can lead to a lot of memory copying (depending on particle lifetimes).
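The per-frame defragmentation pass could look like this (a sketch under my own naming; the branch lives here, so the subsequent vertex-buffer fill can run without any dead-particle checks):

```cpp
#include <cstddef>

struct Particle { float life; }; // life <= 0 means dead (illustrative payload)

// Stable in-place compaction: copy every live particle toward the front,
// preserving insertion order. Returns the new live count.
static size_t Compact( Particle* particles, size_t count )
{
    size_t live = 0;
    for ( size_t i = 0; i < count; ++i )
    {
        if ( particles[i].life > 0.0f )
            particles[live++] = particles[i];
    }
    return live;
}
```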

The interesting thing is that the above methods can be combined. The third method relies on the observation that in a defragmented array you usually remove particles from the beginning and add them at the end. So why not keep one large array and two moving pointers marking the begin and end of the particle data? When you add a particle, increase the end pointer. When you remove one from the front, increase the begin pointer. When removing from the middle, defragment by shifting data from the left or the right (whichever copies less). When the end pointer can't be increased any further, defragment by copying the data back to the beginning of the array. This doesn't change the worst case (removal from the middle of the array) but should greatly speed up the average case.
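A sketch of this combined scheme (all names are mine, and the payload is trimmed down to one field):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Particle { float life; }; // illustrative payload

// Large fixed-capacity array with two moving pointers marking the live range.
struct ParticleBuffer
{
    std::vector<Particle> data;
    size_t begin = 0; // first live particle
    size_t end = 0;   // one past the last live particle

    explicit ParticleBuffer( size_t capacity ) : data( capacity ) {}

    bool Add( const Particle& p )
    {
        if ( end == data.size() )
        {
            // End pointer can't grow: defragment by sliding the live
            // range back to the start of the array.
            std::move( data.begin() + begin, data.begin() + end, data.begin() );
            end -= begin;
            begin = 0;
            if ( end == data.size() )
                return false; // genuinely full
        }
        data[end++] = p;
        return true;
    }

    void RemoveFront() { if ( begin < end ) ++begin; }

    // Remove from the middle: shift from whichever side copies fewer
    // elements, keeping insertion order intact.
    void RemoveAt( size_t i ) // i must be in [begin, end)
    {
        if ( i - begin < end - 1 - i )
        {
            std::move_backward( data.begin() + begin, data.begin() + i,
                                data.begin() + i + 1 );
            ++begin;
        }
        else
        {
            std::move( data.begin() + i + 1, data.begin() + end,
                       data.begin() + i );
            --end;
        }
    }

    size_t Count() const { return end - begin; }
};
```

Front removals and appends are O(1); only middle removals and the occasional wrap-around compaction pay the copying cost.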

Posted in Graphics | 3 Comments

Premultiplied alpha

If you don’t know what premultiplied alpha is and why it solves all the world’s problems, just read Tom Forsyth on premultiplied alpha or Shawn Hargreaves on premultiplied alpha.

Pros:

  1. Better quality for textures with sharp alpha cutouts.
  2. Removes some blending state changes (which can be important if you are coding for a PC).
  3. Can mix additive blending with alpha blending without a state/shader/texture change.

What are the hidden cons that aren’t mentioned by Tom or Shawn?

  1. Fixed pipeline fog doesn’t work with it.
  2. Worse quality for smooth alpha with DXT5 compression.

Why doesn’t premultiplied alpha work very well with DXT5 for textures with smooth alpha gradients? In DXT5 the texture is divided into 4×4 pixel blocks. For every block, colors (and alphas) are approximated with equidistant points on a line between two endpoints (selected per pixel with an index table). Color endpoints are quantized to 5:6:5 bits, while alpha endpoints are stored with 8-bit precision. This means that for most textures we get better precision in the alpha channel than in the color channels.

Furthermore, compressing values that span a broader range preserves more relative precision. For example, if the alpha channel is filled with 1/255 and the RGB values cover the range [0; 1], then the RGB channels of the premultiplied texture will contain only two distinct values: 0 or 1/255. In this case standard alpha blending keeps more precision, which matters when we want to multiply RGB by a factor greater than 1 or tonemap the final result.

Standard alpha blending (RGB and alpha are compressed separately and multiplied only at blend time):

src.rgb * src.a ≈ Compress_color( src.rgb ) * Compress_alpha( src.a )

Premultiplied alpha blending (the product itself is compressed):

src.rgb * src.a ≈ Compress_color( src.rgb * src.a )

Let’s see how it looks in practice. I created a sample texture with a “lighting” gradient in the RGB channels and a smoke puff in alpha (from left to right: RGB, alpha and RGB premultiplied by alpha):

[image: srcTile]

Now let’s alpha blend two compressed textures (DXT5), zoom in and compare the results (left image – standard alpha blending, right – premultiplied alpha blending):

[image: cmp2]

This looks like a small difference, but it can be quite visible in a real game – especially when smoke particles are big, when their color is multiplied by a factor greater than 1, or when some kind of tone mapping is used.

BTW, there is an interesting feature/bug in the NVIDIA Photoshop texture tools: you can’t save a DXT5 with a fully black alpha channel (it just creates a DDS without an alpha channel).

Posted in Graphics | 1 Comment

SPU programming on a retail “fat” PS3 with Linux

Larrabee is approaching, so it’s a good time to learn more about coding a modern multi-core software renderer. Thanks to Sony, everyone with a retail “fat” PS3 can install a Linux distro and have fun with the SPUs. Not as much fun as with a real devkit – no Visual Studio with ProDG, no access to the RSX and no close-to-the-metal SPU libs. You can’t even manually assign SPU tasks to physical SPUs.

It took me some time to install and configure Linux (YDL 6.1). A small hint: if you have to use a window manager, use Fluxbox. It’s much faster on the retail PS3 than GNOME (slow), KDE (very slow) or Enlightenment (very slow). You can also work remotely using ssh/putty/Eclipse IDE (Linux only).
For people without a retail PS3 there is a Cell simulator on the IBM site (again Linux only). Currently only the toolchain is ported to Windows (the Windows Cell SDK), so you can compile SPU code on Windows, but you can’t run it there.
Be prepared for some posts about software rendering and low-level SPU coding :).
Posted in CELL | Leave a comment