Archive for the ‘Parallel Rendering Frameworks’ Category

IEEE VisWeek: Equalizer poster

16. October 2012

Equalizer Poster

Tomorrow night is the poster session for the second poster, recent advances in Equalizer: Region of Interest, Focus Distance, Optimizations for Multi-GPU Clusters (Thread Affinity, Asynchronous Readback) and new applications.

The poster is already up in Ballroom A, or on the right side.

The DASH poster reception went very well: everybody I talked to was convinced by the concept and saw immediate applicability to their problems. The feedback was much more positive than I was hoping for.


Equalizer 1.4 released

7. September 2012

The last two weeks have been quiet, since I was biking through Switzerland. Meanwhile, poor Daniel back at work churned through most of the Collage changes outlined in the last post. You can see the changes in the Collage endian branch on github, which will be merged back into master in the next couple of weeks.

Now back to the news: After finally figuring out how to build Equalizer and dependencies using MacPorts portfiles on Mac OS X, I released the long-standing 1.4 version of Equalizer, GPU-SD, Lunchbox and vmmlib. Below is the release announcement – enjoy!

Neuchatel, Switzerland – September 7, 2012 – Eyescale is pleased to announce the release of Equalizer 1.4.

Equalizer is the standard framework to create and deploy parallel, scalable 3D applications. This modular release includes Collage 0.6, a cross-platform C++ library for building heterogeneous, distributed applications, GPU-SD 1.4, a C++ library and daemon for the discovery and announcement of graphics processing units using zeroconf networking, and Lunchbox 1.4, a C++ library for multi-threaded programming. All software packages are available for free for commercial and non-commercial use under the LGPL open source license.

Equalizer 1.4 is a feature release extending the 1.0 API, introducing major new features, most notably asynchronous readbacks, region of interest and thread affinity for increased performance during scalable rendering. It culminates over seven years of development and decades of experience into a feature-rich, high-performance and mature parallel rendering framework and related high-performance C++ libraries.

Equalizer enables software developers to easily build interactive and scalable visualization applications, which optimally combine multiple graphics cards, processors and computers to scale the rendering performance, visual quality and display size.

Equalizer Applications

Eyescale provides software consulting and development services for parallel 3D visualization software and GPU computing applications, based on the Eyescale software products or other open and closed source solutions.

Please check the release notes on the Equalizer website for a comprehensive list of new features, enhancements, optimizations and bug fixes. A paperback book of the Programming and User Guide is available.

We would like to thank all individuals and parties who have contributed to the development of Equalizer 1.4.

Left image courtesy of Cajal Blue Brain / Blue Brain Project. Second from left copyright Realtime Technology AG, 2008. Right image courtesy University of Siegen, 2008.

Introducing Collage: Barrier

15. July 2012

While I personally think that barriers are an anti-pattern, they have exactly one valid use case in my line of work — as swap barriers synchronizing the display of a new frame across multiple segments of a display wall or immersive installation.

In an ideal world, swap synchronization would be done using hardware support. Equalizer supports this for nVidia G-Sync, but I haven’t seen many installations using hardware swap synchronization. First, it’s expensive, since you need a professional-grade card with a special synchronization board, so lower-cost installations such as display walls typically don’t even have the hardware. Second, installations which need frame synchronization, such as active stereo setups, oftentimes only use the hardware for frame (retrace) synchronization and use a software barrier for swap synchronization. The reason is that getting hardware swap synchronization running is such a mess due to driver issues that most people don’t bother. For these reasons, Equalizer supports both hardware and software swap synchronization. The software synchronization uses a co::Barrier.

Back to the Collage barrier: Once it is set up, the only call needed is enter, which will block until the height has been reached.

Barriers are versioned, distributed objects. Any process can set up a barrier, register it and communicate the barrier identifier and version to all users of the barrier. The users map, sync and enter the barrier. Since it’s versioned, the master instance can be committed any time, and enter requests are versioned, that is, a barrier operation for an old version will be finished before the enter requests for the new version are processed.

The protocol right now is very simple: a master instance tracks enter requests until the height is reached and then unlocks all users. While there are other algorithms which use optimized tree structures or broadcast to reduce the latency, we haven’t seen the need to implement a more complex algorithm.

Programming and User Guide for Equalizer 1.4

29. June 2012

Equalizer Programming and User Guide

I’ve just uploaded the review version of the Programming and User Guide for the upcoming 1.4 release of Equalizer, and to a certain extent, Collage.

This one packs 111 pages of content (118 total) and 63 figures. In a couple of weeks I’ll create the final hardcopy version on Amazon/CreateSpace.

What’s New?

Besides the customary full review pass, this edition has a lot of new content: a full new chapter on Sequel and the associated polygonal rendering example, new sections on application-specific scaling factors in immersive environments, region of interest for compositing and load-balancing, and zeroconf discovery in Collage, and finally a substantial rewrite of the section on distributed objects in Collage.


The book is structured in two parts: The User Guide, laying the foundation on parallel rendering and scalability algorithms, and then explaining the configuration of visualization systems for Equalizer applications. The appendix contains a full reference on the file format.

The second part, the Programming Guide, gradually introduces programming parallel rendering applications. Starting with the basics in eqHello, the complexity increases with chapters on Sequel and Equalizer, each using the respective example application. After this, an advanced features section introduces and demonstrates specific features in isolation. It finishes off with a chapter on the Collage network library.


The Programming and User Guide is the ‘OpenGL red book’ of Equalizer. It consolidates all the documentation available in various places (Equalizer website, mailing list, github feature issues, my head) into a single document. Apart from gathering this information, the book format lets a bigger picture emerge, putting design decisions in context.

1.4 beta release of the Eyescale open source packages

20. June 2012

We are pleased to announce the 1.4 beta release of the Eyescale open source packages. This release is a preview for testing the upcoming 1.4 stable release. It is the first modular release, and contains the following libraries and new features:

  • Equalizer: parallel rendering framework
    • Various scalable rendering performance features: asynchronous readbacks, region of interest and thread affinity.
  • Collage: C++ library for building heterogeneous, distributed applications
    • Zeroconf support and node discovery
    • Blocking object commits
    • Increased InfiniBand RDMA performance
  • GPU-SD: discovery and announcement of GPUs using zeroconf
    • VirtualGL detection
    • Hostname command line parameter for gpu_sd daemon
  • Lunchbox: C++ library for multi-threaded programming
    • Servus, C++ interface to announce, discover and iterate over key-value pairs stored in a zeroconf service description
    • LFVector, a thread-safe, lock-free vector
  • Buildyard: A CMake-based superbuilder to download, configure and build the packages and dependencies for this release
    • Generates Unix Makefiles and solution files for Visual Studio 2008/10
    • Simple CMake project configuration scripts
    • Support for local overrides and user forks
    • Extensible with custom in-house or open source projects
  • A website for API documentation of all the aforementioned packages

Please test this release extensively and report any bugs on the respective project page. The release notes are part of the API documentation.

We would like to thank all contributors who made this release possible.

EGPGV 2012

15. May 2012

If you were wondering what’s up with last week’s ‘Introducing Lunchbox’ post: There wasn’t one, since I was at the Eurographics Symposium on Parallel Graphics and Visualization to present our paper “Parallel Rendering on Hybrid Multi-GPU Clusters”. This week I’m attending Eurographics, but I’ll try to post the fourth article in the Lunchbox series by Friday.

Our paper presented a collection and evaluation of optimizations for medium-sized GPU clusters which use multi-GPU NUMA nodes. This type of architecture is quite important: it provides a cost-effective configuration for parallel rendering, since the host and network infrastructure cost is amortized over multiple GPUs. While working on this paper we gained a few surprising insights (<cough>glFinish</cough>) into which optimizations are actually important.

Enough talk: The most important parts are summarized in our presentation. Enjoy!

Region of interest during scalable rendering

6. January 2012

ROI optimizes compositing by limiting the readback region to the area which has been updated, thus reducing the number of pixels to be transferred, compressed and sent over the network. I finally got around to implementing ROI compositing in eqPly, the Equalizer example application.

ROI during streaming sort-last compositing

How does it work?

First, each resource tracks the region it updated during its draw operation by projecting each bounding box of each rendered chunk of geometry into screen space. All screen-space bounding boxes are merged to calculate the updated region.

Secondly, during a readback this region is intersected with the requested readback area. This intersected area is then read back.

Third, and this is the important part for parallel compositing, during assembly the resource region is updated to the union of the existing (draw) region and the regions of all input frames.
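The three steps above can be sketched with a handful of rectangle operations. The following is an illustration only, not the code from the eqPly commit: the Region type and all function names are hypothetical, a region is assumed to be a single axis-aligned pixel rectangle, and the projection of the 3D bounding boxes into screen space is assumed to have happened already.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical types and names for illustration; not the eqPly/Equalizer API.
// A region is the half-open pixel rectangle [x1,x2) x [y1,y2).
struct Region
{
    int x1, y1, x2, y2;
};

inline bool isEmpty(const Region& r) { return r.x2 <= r.x1 || r.y2 <= r.y1; }

// Smallest rectangle containing both regions.
inline Region merge(const Region& a, const Region& b)
{
    if (isEmpty(a)) return b;
    if (isEmpty(b)) return a;
    return Region{ std::min(a.x1, b.x1), std::min(a.y1, b.y1),
                   std::max(a.x2, b.x2), std::max(a.y2, b.y2) };
}

// Overlap of both regions, or an empty region if they do not overlap.
inline Region intersect(const Region& a, const Region& b)
{
    const Region r = { std::max(a.x1, b.x1), std::max(a.y1, b.y1),
                       std::min(a.x2, b.x2), std::min(a.y2, b.y2) };
    return isEmpty(r) ? Region{ 0, 0, 0, 0 } : r;
}

// Step 1: merge the screen-space bounding boxes of all rendered chunks.
inline Region drawRegion(const std::vector<Region>& chunkBoxes)
{
    Region roi = { 0, 0, 0, 0 };
    for (const Region& box : chunkBoxes)
        roi = merge(roi, box);
    return roi;
}

// Step 2: only read back the part of the requested area that was updated.
inline Region readbackRegion(const Region& roi, const Region& requested)
{
    return intersect(roi, requested);
}

// Step 3: after assembly, the resource region also covers all input frames.
inline Region assembleRegion(const Region& drawn,
                             const std::vector<Region>& inputFrames)
{
    Region roi = drawn;
    for (const Region& frame : inputFrames)
        roi = merge(roi, frame);
    return roi;
}
```

Using a single bounding rectangle keeps the bookkeeping trivial, at the price of reading back some untouched pixels when the updated areas are far apart. Whether a finer-grained representation pays off depends on the data set.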

The corresponding commit is 448af149cd, in case you want to implement something similar. Eventually we will move this code to Equalizer, where we will also use the ROI information for load_equalizer optimization!

On the right you can see a screenshot of a four-way sort-last decomposition with streaming compositing. The ROI is rendered to demonstrate the feature. As a reminder, below is a video on streaming compositing for database decomposition:

C++ Library Symbol Exports – The Good, the Bad and the Ugly

28. June 2011

The Good

Selectively exporting symbols for a library is a good thing. It keeps program startup times down and enforces the public API of a library. The Good is that it’s possible and even the default mode for Visual C++.

The Bad

For whatever reason, the Visual C++ (or DLL?) designers decided that you have to declare your public functions with __declspec( dllexport ) when building the library and __declspec( dllimport ) when using it. What’s wrong with gcc’s visibility("default")?
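For readers who haven’t run into this yet, the usual way to paper over the difference is a per-library export macro. The sketch below shows the common pattern; the names MYLIB_API, MYLIB_BUILDING_SHARED and the mylib functions are placeholders I made up, not anything from Equalizer. Normally the build system defines MYLIB_BUILDING_SHARED only while compiling the library itself; it is defined inline here so that the snippet compiles as a single file.

```cpp
// Hypothetical names throughout: MYLIB_API, MYLIB_BUILDING_SHARED, mylib::*.
// Normally set by the build system when compiling the library itself:
#define MYLIB_BUILDING_SHARED

#if defined(_MSC_VER)
#  ifdef MYLIB_BUILDING_SHARED
#    define MYLIB_API __declspec(dllexport) // building the DLL
#  else
#    define MYLIB_API __declspec(dllimport) // using the DLL
#  endif
#elif defined(__GNUC__)
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API
#endif

namespace mylib
{
// Internal helper: not exported by MSVC, and hidden by gcc only if the
// library is compiled with -fvisibility=hidden.
int helper(const int value) { return value * 2; }

// Public API: explicitly exported on both toolchains.
MYLIB_API int twice(const int value) { return helper(value); }
}
```

Note the asymmetry: on the gcc side a single attribute covers both building and using the library, while the MSVC side needs the two-way switch.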

All kinds of fun ensue when you want to export explicit template instantiations with MSVC’s dual export/import construct. Even more fun appears when you use static libraries.

The MSVC approach does not really work for static libraries. As far as I can tell, there are no exports for static libraries. Say you have a piece of code which you’d like to share between different shared libraries without actually shipping it as a shared library as well. The fabric foundation in Equalizer is such an example, and right now this is ‘shared’ by compiling the relevant objects three times: once each for the client, server and admin library. Obviously this is not optimal since it increases the build time unnecessarily. Building fabric as a static library and linking it to these three shared libraries works fine with the GNU toolchain, but not with MSVC, since it would require manually specifying all the public symbols when linking the DLL.

The Ugly

The implementation chosen by gcc is utterly useless for C++. Setting the default visibility to hidden requires manually exporting the vtable of each class and all STL instantiations used, including the internal classes instantiated by the STL itself! While I can see how this implementation came into being, it is clearly designed for C code and not for C++. The vtable and STL are internals of the C++ implementation; developers should not need to care about them.

For now I have given up on using selective visibility with gcc, and only use explicit exports on Windows for Equalizer and other projects I’m working on. This makes sure that checkins will regularly break the Windows build. Hurray!

My Wishlist

One can dream:

For the VC++ developers: Please provide a __declspec(dllvisible) which works also for re-exporting symbols from static libraries. Let the toolchain figure out the details.

For the GCC developers: Please make exporting symbols from C++ classes simpler. VC++ can do it, so it can’t be that hard!


The above is obviously a rant. I am fully aware that workarounds exist for all of the issues mentioned above. Implementing them in real build environments is more time-consuming than it should be, and often more than is feasible. If you know about simple fixes, please comment below.

Equalizer Source Code Analysis

29. May 2011

Sub-Project Sizes

I was interested in the relative sub-project sizes in Equalizer, so I quickly hacked together a small Perl script extracting this data from the CMake output. Note that this is not an exact measure of the project complexity since (I believe) it only counts the number of files in each project, and not the lines of code or compile time.

Collage, the network library, is quite a big part and the one which will probably grow in relative size in the future. The Equalizer client library is relatively big, mostly since it contains a lot of code for window-system coupling. The server library is relatively small, considering that it contains all the rendering algorithms. On the examples side, eqPly is unsurprisingly the biggest one as it contains the most features.

Equalizer 1.0 released

20. May 2011

It’s been only 21 months since the last post, but both Equalizer and I are still alive. This month we finally released version 1.0, which was looong overdue.

Most notably, this release defines the stable API for all Equalizer 1.x releases. This means that all the functions marked with version 1.0 will be source-code compatible until we release an Equalizer 2.0. Parts of the API are still undefined and unstable, in particular for the also-new Collage network library. However, 99% of the functions used by the examples are stable.

Since the last major version, 0.9, there have been plenty of improvements and new features, e.g., subpixel compounds, reliable multicast for data distribution, runtime mono/stereo switch and many more. A comprehensive list is in the Release Notes.

This month I started working on a new project based on Equalizer, and hopefully I’ll update this blog more regularly. More about this in another post…