ICC, GCC and OpenMP


Since a colleague finished the CPU-based alpha-compositing in Equalizer, it was time for another compiler benchmark round.

Performance of gcc, icc and OpenMP
This time I used my MacBook Pro with an Intel Core 2 Duo 2.16 GHz, running Mac OS X 10.5.2. The compilers available were gcc 4.0.1, gcc 4.2.1 and icc 10.1.014. I tested the latter two with OpenMP both disabled and enabled.
The results can be seen on the left. The upper graph shows the absolute throughput in MB/s for the performance-critical algorithms in Equalizer, and the lower graph shows the performance relative to the gcc 4.0.1 baseline.

Depth compositing assembles multiple color input images into a destination image based on the depth values. This is used for recombining the results of database decompositions of polygonal data.
Alpha compositing blends the results of volume rendering based on the alpha value of the images.
Image compression is an RLE-like algorithm used to compress the images during network transfer.
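
To make the first of these concrete, here is a minimal sketch of a depth-compositing loop. It is not the Equalizer code: the `Image` struct, the 32-bit color and depth layout, and the function name `depthComposite` are assumptions made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image layout for illustration only: one 32-bit color value
// and one 32-bit depth value per pixel. The real structures may differ.
struct Image
{
    std::vector<std::uint32_t> color;
    std::vector<std::uint32_t> depth;
};

// Merge 'src' into 'dst': wherever the source fragment is closer to the
// viewer (smaller depth value), it replaces the destination color and depth.
// Both images are assumed to have the same number of pixels.
void depthComposite(Image& dst, const Image& src)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(dst.color.size());

    // Every pixel is independent of the others, so the loop can be split
    // across threads. Without OpenMP support the pragma is simply ignored.
#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i)
    {
        if (src.depth[i] < dst.depth[i])
        {
            dst.color[i] = src.color[i];
            dst.depth[i] = src.depth[i];
        }
    }
}
```

Calling `depthComposite(dst, src)` once per input image leaves the nearest fragment of all inputs in `dst`.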

For all tests only the basic optimization flag ‘-O2’ was used. I am sure that by tweaking the compiler flags and code, more performance can be squeezed out of the algorithms.
Nevertheless, the results are interesting and representative of everyday use, since I don’t have the time to investigate and maintain more complicated optimizations.
I think most programmers are under similar time constraints, and getting a 50-100% speed bump by just changing the compiler, and another few percent from adding a simple OpenMP pragma, is quite valuable.
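
For readers who have not used OpenMP, this is roughly what "adding a simple pragma" looks like on an alpha-blending loop. Again, this is only a sketch under my own assumptions, not the Equalizer implementation: the `Pixel` struct, the premultiplied float RGBA format and the name `alphaCompositeOver` are all invented for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pixel format for illustration: float RGBA with the color
// channels already multiplied by alpha (premultiplied alpha).
struct Pixel
{
    float r, g, b, a;
};

// Blend 'front' over 'back' with the standard "over" operator and store the
// result in 'back'. Both vectors are assumed to have the same length.
void alphaCompositeOver(std::vector<Pixel>& back,
                        const std::vector<Pixel>& front)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(back.size());

    // The single line that enables OpenMP for this loop. Each iteration
    // touches only its own pixel, so no synchronization is needed.
#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i)
    {
        const Pixel& f = front[i];
        Pixel& b = back[i];
        const float k = 1.0f - f.a; // how much of the back pixel shows through

        b.r = f.r + k * b.r;
        b.g = f.g + k * b.g;
        b.b = f.b + k * b.b;
        b.a = f.a + k * b.a;
    }
}
```

With gcc 4.2 or later the pragma takes effect when compiling with ‘-fopenmp’; without that flag the same source builds as a plain serial loop, which makes it easy to compare both variants.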

Good work Intel and the GCC-OpenMP team!

PS: Has anybody seen this bug with gcc and OpenMP?


2 Responses to “ICC, GCC and OpenMP”

  1. Shree Kumar Says:

    Interesting results, Stefan. For GCC, I’ve seen -funroll-loops giving better results.

    How many threads/processors did you use with OpenMP ?

    A side observation from me : do you know if ICC has any flags to generate SSE instructions for alpha blending ? Alpha blending really must run faster than depth compositing (it does in paracomp on x86_64)

  2. eile Says:

    > How many threads/processors did you use with OpenMP ?

    The default, I assume two on my Core 2 Duo.

    > do you know if ICC has any flags to generate SSE instructions for alpha blending ? Alpha blending really must run faster than depth compositing (it does in paracomp on x86_64)

    I guess there are such flags. I haven’t looked into the code closely, and it’s really up to Max to take care of this part. I’ll tell him.
