Archive for May, 2008

Release early, Release often

16. May 2008

In this spirit, we’ve just started doing more regular developer releases. Version 0.5.1 is the first one, with the following new features:

– Statistics Overlay to understand and eliminate bottlenecks in the rendering pipeline
– Support for using Paracomp as a compositing backend, see README.paracomp
– Network-based instead of file-based model distribution in eqPly
– Support for the window swapsync hint on WGL

Full release notes are available at:
www.equalizergraphics.com/documents/RelNotes/RelNotes_0.5.1.html

Enjoy!

Parallel Rendering Timeline

16. May 2008

On the right there’s a simple timeline of the most important toolkits for parallel rendering, naturally with more details for Equalizer.
I plan to extend this over time, and maybe even create one with the major hardware milestones.
Any input is welcome – Skywriter VGXT, anyone? 😉

ICC, GCC and OpenMP

15. May 2008

Since a colleague finished the CPU-based alpha-compositing in Equalizer, it was time for another compiler benchmark round.

Performance of gcc, icc and OpenMP
This time I used my MacBook Pro with an Intel Core 2 Duo 2.16 GHz, running Mac OS X 10.5.2. The compilers available were gcc 4.0.1, gcc 4.2.1 and icc 10.1.014. I tested the latter two with OpenMP both disabled and enabled.
The results can be seen on the left (click on the picture for a large version). The upper graph shows the absolute throughput in MB/s for the performance-critical algorithms in Equalizer, and the lower the relative performance compared to the gcc 4.0.1 baseline.

Depth compositing assembles multiple color input images into a destination image based on the depth values. This is used for recombining the results of database decompositions of polygonal data.
Alpha compositing blends the results of volume rendering based on the alpha values of the images.
Image compression is an RLE-like algorithm used to compress the images during network transfer.
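To make the three operations concrete, here is a minimal C++ sketch of each. It is not Equalizer's actual code: the `DepthImage` struct, the function names, the float RGBA layout and the premultiplied-alpha "over" blend are all assumptions picked for illustration, and the run-length encoder is only a plain stand-in for the RLE-like scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical pixel buffers; not Equalizer's actual image classes.
struct DepthImage {
    std::vector<std::uint32_t> color; // packed RGBA, one value per pixel
    std::vector<float> depth;         // assumed: smaller value = closer
};

// Depth compositing: keep the source pixel wherever it is closer.
void compositeDepth(DepthImage& dst, const DepthImage& src) {
    assert(dst.color.size() == src.color.size());
    assert(dst.depth.size() == dst.color.size());
    assert(src.depth.size() == src.color.size());
    for (std::size_t i = 0; i < dst.color.size(); ++i) {
        if (src.depth[i] < dst.depth[i]) {
            dst.color[i] = src.color[i];
            dst.depth[i] = src.depth[i];
        }
    }
}

// Alpha compositing: blend a premultiplied-alpha source "over" the
// destination. Channels are floats in [0, 1], stored as R, G, B, A.
void compositeAlpha(std::vector<float>& dst, const std::vector<float>& src) {
    assert(dst.size() == src.size() && dst.size() % 4 == 0);
    for (std::size_t i = 0; i < dst.size(); i += 4) {
        const float keep = 1.0f - src[i + 3]; // how much of dst shows through
        for (std::size_t c = 0; c < 4; ++c)
            dst[i + c] = src[i + c] + keep * dst[i + c];
    }
}

// Run-length encoding: collapse runs of identical pixels into
// (count, value) pairs.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
compressRLE(const std::vector<std::uint32_t>& pixels) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> runs;
    for (std::uint32_t p : pixels) {
        if (!runs.empty() && runs.back().second == p)
            ++runs.back().first;      // extend the current run
        else
            runs.emplace_back(1u, p); // start a new run
    }
    return runs;
}
```

All three are simple per-pixel loops over large buffers, which is why their throughput depends so much on the code the compiler generates.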

For all tests only the basic optimization flag ‘-O2’ was used. I am sure that by tweaking the compiler flags and code, more performance can be squeezed out of the algorithms.
Nevertheless the results are interesting and representative, since I don’t have the time to investigate and maintain more complicated optimizations.
I think most programmers are under similar time constraints, and getting a 50-100% speed bump by just changing the compiler, and another couple of percent by adding a simple OpenMP pragma, is quite valuable.
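For readers who haven't used OpenMP, this is roughly what "adding a simple pragma" looks like on a per-pixel loop. The function below is a hypothetical depth-test kernel written for illustration, not the code that was benchmarked; it assumes smaller depth values are closer and that all four buffers have the same length.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical kernel: every iteration touches only pixel i, so the
// iterations are independent and can be split across threads safely.
void compositeDepthParallel(std::vector<std::uint32_t>& dstColor,
                            std::vector<float>& dstDepth,
                            const std::vector<std::uint32_t>& srcColor,
                            const std::vector<float>& srcDepth) {
    assert(dstColor.size() == srcColor.size());
    assert(dstDepth.size() == dstColor.size());
    assert(srcDepth.size() == srcColor.size());

    // Signed loop index: older OpenMP versions require it.
    const long n = static_cast<long>(dstColor.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        if (srcDepth[i] < dstDepth[i]) {
            dstColor[i] = srcColor[i];
            dstDepth[i] = srcDepth[i];
        }
    }
}
```

With gcc 4.2 or later the pragma is enabled with ‘-fopenmp’; icc has its own switch (‘-openmp’ in the versions of that era, as far as I know). Without the flag the pragma is simply ignored and the loop runs serially, so the same source builds either way.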

Good work Intel and the GCC-OpenMP team!

PS: Has anybody seen this bug with gcc and OpenMP?