Particles are usually some kind of simple structs stored in an array. Then every frame vertex buffer is filled using that array. You really don’t want to sort them (at least not all particles and sometimes sorted particles don’t look good). You need also to retain insertion order to prevent particle popping. Using simple array and replacing deleted element with the last one isn’t a good idea.
Straightforward solution is to use circular buffer. This has some issues – if particles have no uniform lifetime it needs some checks in order to detect dead particles when filling the vertex buffer. There will be also some holes in the circular buffer, so memory won’t be used effectively.
Another solution is to use array and defragment it every frame. This means no conditional statements, but could lead to a lot of memory copying (depending on the lifetime of particles).
The interesting thing is that above methods can be combined. Third method uses an assumption that in case of the defragmented array You usually remove particles from the beginning and add to the end. So why not just have large array and two floating pointers for marking begin and end of particle data. Now when You add something – increase end pointer. When You remove an element – increase begin pointer. In case of removal from the middle of data – defragment by shifting data from left or right. When end pointer can’t be increased just defragment by copying data to the beginning of the array. This doesn’t change the worst case – removal from the array middle, but should greatly speed up the average case.