More on Depth-Based Assembly

Performance of various DB assemble algorithms

It seems that 2008 will be the year of performance for Equalizer. 🙂

I’ve just committed the first version of CPU-based depth assembly to the Equalizer code base. The algorithm first assembles all images into a memory buffer on the CPU, and then transfers the result to the GPU using the default algorithm (GLSL in this case).
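To make the idea concrete, here is a minimal sketch of a per-pixel depth merge on the CPU. It is not the code that went into Equalizer: the `Image` struct and the `depthMerge` and `assembleOnCPU` functions are hypothetical names made up for illustration, and the sketch assumes packed RGBA colors, float depth values where smaller means closer, and inputs of identical size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type: one packed RGBA color and one depth value per pixel.
struct Image
{
    std::vector<std::uint32_t> color; // packed RGBA
    std::vector<float>         depth; // assumption: smaller value = closer
};

// Merge 'src' into 'dst': for each pixel, keep whichever fragment is closer.
void depthMerge(Image& dst, const Image& src)
{
    assert(dst.color.size() == dst.depth.size());
    assert(src.color.size() == src.depth.size());
    assert(dst.depth.size() == src.depth.size());

    for (std::size_t i = 0; i < dst.depth.size(); ++i)
    {
        if (src.depth[i] < dst.depth[i])
        {
            dst.depth[i] = src.depth[i];
            dst.color[i] = src.color[i];
        }
    }
}

// Assemble all input images into a single CPU-side buffer.
// Requires at least one input image.
Image assembleOnCPU(const std::vector<Image>& inputs)
{
    assert(!inputs.empty());

    Image result = inputs.front();
    for (std::size_t n = 1; n < inputs.size(); ++n)
        depthMerge(result, inputs[n]);

    return result; // the caller would transfer this one image to the GPU
}
```

The point of the design is that however many inputs there are, only one color and depth image has to go to the GPU at the end.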

As you can see on the right, this algorithm is vastly faster than using the GPU. Each input image in this configuration adds about 4 ms to the total time, as compared to 18 ms for GLSL.

Since there is a static cost for setup and transferring the result (about 22 ms in the benchmark), the CPU-based algorithm only makes sense for more than one image, but it becomes substantially faster as the number of input images grows. With four images, the new code is 4.5 times faster than Equalizer 0.4; with eight images, it is 6.4 times faster.
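The break-even point follows from the numbers above if you treat both paths as simple linear costs. The sketch below does that arithmetic. It assumes the GLSL path has no fixed cost of its own, which the benchmark does not tell us, and the function names are made up for this example. It compares against the GLSL path only, not against Equalizer 0.4.

```cpp
#include <cassert>

// Approximate figures quoted above, in milliseconds.
const double kCpuFixedMs     = 22.0; // setup plus transferring the result
const double kCpuPerImageMs  = 4.0;  // added per input image, CPU path
const double kGlslPerImageMs = 18.0; // added per input image, GLSL path

// Estimated assembly time of the CPU path for n input images.
double cpuAssemblyMs(int n)
{
    assert(n >= 0);
    return kCpuFixedMs + kCpuPerImageMs * n;
}

// Estimated assembly time of the GLSL path for n input images.
// Assumption: no fixed cost, only the per-image cost.
double glslAssemblyMs(int n)
{
    assert(n >= 0);
    return kGlslPerImageMs * n;
}

// Smallest image count at which the CPU path is estimated to be faster.
int breakEvenImages()
{
    int n = 1;
    while (cpuAssemblyMs(n) >= glslAssemblyMs(n))
        ++n;
    return n;
}
```

Under this model one image costs 26 ms on the CPU path against 18 ms with GLSL, while two images cost 30 ms against 36 ms, so the CPU path wins from two images on.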

The benchmark used the same machine as in the previous posting. It has two 2.2 GHz Opteron CPUs, but since OpenMP was not enabled, only one processor was used. OpenMP should accelerate this code path even more.
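As a rough illustration of why OpenMP is a natural fit: every pixel in a depth merge is independent of the others, so the loop over pixels can be split across processors with a single pragma. The function below is a hypothetical sketch, not Equalizer code, and assumes smaller depth values are closer. Built without OpenMP support the pragma is simply ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Depth-merge 'src' into 'dst', one pixel per loop iteration.
// Iterations touch disjoint elements, so they can safely run in parallel.
void depthMergeParallel(std::vector<std::uint32_t>&       dstColor,
                        std::vector<float>&               dstDepth,
                        const std::vector<std::uint32_t>& srcColor,
                        const std::vector<float>&         srcDepth)
{
    assert(dstColor.size() == dstDepth.size());
    assert(srcColor.size() == srcDepth.size());
    assert(dstDepth.size() == srcDepth.size());

    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long count = static_cast<long>(dstDepth.size());

    #pragma omp parallel for
    for (long i = 0; i < count; ++i)
    {
        if (srcDepth[i] < dstDepth[i])
        {
            dstDepth[i] = srcDepth[i];
            dstColor[i] = srcColor[i];
        }
    }
}
```

With GCC or Clang, the pragma takes effect when compiling with `-fopenmp`.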

