It seems that 2008 will be the year of performance for Equalizer. 🙂
I’ve just committed the first version of CPU-based depth assembly to the Equalizer code base. The algorithm first assembles all images into a memory buffer on the CPU and then transfers the result to the GPU using the default algorithm (GLSL in this case).
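To illustrate the per-pixel work done on the CPU, here is a minimal C++ sketch of depth assembly. It is not Equalizer's implementation. The `Image` struct, the `assembleDepth` function and the packed 32-bit color layout are hypothetical. The sketch assumes all images have the same size and that a smaller depth value is closer to the viewer, as with OpenGL's default `GL_LESS` depth test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type for illustration only: one packed RGBA color and
// one depth value per pixel. Equalizer's real image classes differ.
struct Image
{
    std::vector< uint32_t > color;
    std::vector< float >    depth; // smaller value = closer to the viewer
};

// Assembles all input images into 'result' on the CPU. For each pixel, the
// color of the fragment with the smallest depth value wins.
void assembleDepth( const std::vector< Image >& inputs, Image& result )
{
    assert( !inputs.empty( ));
    result = inputs.front(); // start from a copy of the first image
    assert( result.color.size() == result.depth.size( ));

    for( std::size_t i = 1; i < inputs.size(); ++i )
    {
        const Image& in = inputs[i];
        assert( in.color.size() == result.color.size( ));
        assert( in.depth.size() == result.depth.size( ));

        for( std::size_t p = 0; p < in.depth.size(); ++p )
        {
            if( in.depth[p] < result.depth[p] ) // input fragment is closer
            {
                result.depth[p] = in.depth[p];
                result.color[p] = in.color[p];
            }
        }
    }
    // 'result' would then be uploaded to the GPU in a single transfer.
}
```

With this strict comparison, ties keep the fragment from the earlier image. That is one possible convention, not necessarily the one Equalizer uses.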
As you can see on the right, this algorithm is vastly faster than assembling on the GPU. In this configuration, each input image adds about 4 ms to the total time, compared to 18 ms with GLSL.
Since there is a static cost for setup and for transferring the result (about 22 ms in the benchmark), the CPU-based algorithm only makes sense for more than one image, but it becomes substantially faster as the number of input images grows. With four images, the new code is 4.5 times faster than Equalizer 0.4; with eight images, it is 6.4 times faster.
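The break-even point can be checked with a simple linear cost model built from the numbers above: CPU time is about 22 ms + 4 ms per image, GLSL time about 18 ms per image. The model and the names below are assumptions for illustration. In particular, it treats the GLSL path as having no fixed overhead, which the benchmark does not state.

```cpp
#include <cassert>
#include <cmath>

// Rough linear cost model in milliseconds, using the figures quoted above.
// Assumption: the GLSL path is modelled as a pure per-image cost.
constexpr double kCpuStaticMs   = 22.0; // setup + transfer of the result
constexpr double kCpuPerImageMs = 4.0;
constexpr double kGpuPerImageMs = 18.0;

double cpuCostMs( int images ) { return kCpuStaticMs + kCpuPerImageMs * images; }
double gpuCostMs( int images ) { return kGpuPerImageMs * images; }

// Smallest image count n with cpuCostMs(n) < gpuCostMs(n), i.e.
// n > kCpuStaticMs / (kGpuPerImageMs - kCpuPerImageMs).
int breakEvenImages()
{
    assert( kCpuPerImageMs < kGpuPerImageMs ); // otherwise the CPU never wins
    const double n = kCpuStaticMs / ( kGpuPerImageMs - kCpuPerImageMs );
    return static_cast< int >( std::floor( n )) + 1;
}
```

Under this model, one image costs 26 ms on the CPU versus 18 ms with GLSL, while two images cost 30 ms versus 36 ms, so the CPU path wins from two images on (22 / 14 ≈ 1.6). The 4.5x and 6.4x figures are measured against Equalizer 0.4, whose timings are not part of this model.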
The benchmark used the same machine as in the previous post. It has two 2.2 GHz Opteron CPUs, but since OpenMP was not enabled, only one processor was used. OpenMP should accelerate this code path even further.
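Depth assembly suits OpenMP well because every pixel is merged independently. Below is a minimal sketch of how the per-pixel loop could be split across both CPUs. The `mergeDepthParallel` function and its separate color and depth vectors are a hypothetical layout, not Equalizer's code. Without OpenMP enabled (e.g. `-fopenmp` on GCC), the pragma is ignored and the loop runs serially, which matches the single-CPU benchmark above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Merges one input image (color + depth) into the destination buffers in
// place, keeping the closer fragment for each pixel. Pixels are independent,
// so OpenMP can distribute loop iterations across threads.
void mergeDepthParallel( std::vector< uint32_t >& dstColor,
                         std::vector< float >& dstDepth,
                         const std::vector< uint32_t >& srcColor,
                         const std::vector< float >& srcDepth )
{
    assert( dstColor.size() == dstDepth.size( ));
    assert( srcColor.size() == dstColor.size( ));
    assert( srcDepth.size() == dstDepth.size( ));

    // Signed loop index for compatibility with older (OpenMP 2.x) compilers.
    const long n = static_cast< long >( dstDepth.size( ));
#pragma omp parallel for
    for( long p = 0; p < n; ++p )
    {
        if( srcDepth[p] < dstDepth[p] ) // source fragment is closer
        {
            dstDepth[p] = srcDepth[p];
            dstColor[p] = srcColor[p];
        }
    }
}
```

Calling this once per input image reproduces the serial assembly; only the work within each merge is parallelized.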