Region of interest during scalable rendering

6. January 2012

ROI optimizes compositing by limiting the readback region to the area which has been updated, thus reducing the number of pixels to be transferred, compressed and sent over the network. I finally got around to implementing ROI compositing in eqPly, the Equalizer example application.

ROI during streaming sort-last compositing

How does it work?

First, each resource tracks the region it updated during its draw operation by projecting each bounding box of each rendered chunk of geometry into screen space. All screen-space bounding boxes are merged to calculate the updated region.

Second, during a readback this region is intersected with the requested readback area. Only the intersected area is then read back.

Third, and this is the important part for parallel compositing, during assembly the resource region is updated to the union of the existing (draw) region and the regions of all input frames.
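The three steps above boil down to two rectangle operations: a merge (bounding union) and an intersection. Below is a minimal sketch of these, assuming the bounding boxes have already been projected into screen space. The `Region` struct and the `merge` and `intersect` functions are hypothetical names for illustration, not the types used in eqPly or Equalizer.

```cpp
#include <algorithm>

// Hypothetical pixel rectangle, half-open: [x0,x1) x [y0,y1).
// An illustration only, not the Equalizer viewport type.
struct Region
{
    int x0, y0, x1, y1;

    bool empty() const { return x1 <= x0 || y1 <= y0; }
};

// Smallest rectangle enclosing both inputs. Used in step one to merge the
// projected bounding boxes, and in step three to add input frame regions.
Region merge( const Region& a, const Region& b )
{
    if( a.empty( )) return b;
    if( b.empty( )) return a;
    return { std::min( a.x0, b.x0 ), std::min( a.y0, b.y0 ),
             std::max( a.x1, b.x1 ), std::max( a.y1, b.y1 ) };
}

// Overlap of both inputs. Used in step two to clip the updated region
// against the requested readback area.
Region intersect( const Region& a, const Region& b )
{
    const Region r = { std::max( a.x0, b.x0 ), std::max( a.y0, b.y0 ),
                       std::min( a.x1, b.x1 ), std::min( a.y1, b.y1 ) };
    return r.empty() ? Region{ 0, 0, 0, 0 } : r;
}
```

A resource would start each frame with an empty region, call `merge` once per rendered chunk during draw, call `intersect` against the requested area during readback, and call `merge` again with the region of each input frame during assembly.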

The corresponding commit is 448af149cd, in case you want to implement something similar. Eventually we will move this code to Equalizer, where we will also use the ROI information for load_equalizer optimization!

On the right you can see a screenshot of a four-way sort-last decomposition with streaming compositing. The ROI is rendered to demonstrate the feature. As a reminder, below is a video on streaming compositing for database decomposition:

Day Job

23. September 2011

Back in May I alluded to a new project I’ve started working on, but never got around to writing up a post with more information. Since this is unlikely to happen, I’ll let others do the job:

Year 1:


Year 2, my year 0:

Enjoy!

Lock Performance

5. July 2011

I’m currently working on a low-level library where locked data access has to be optimized. Therefore I benchmarked the performance of the three lock types in Collage on Linux and Mac OS X. The test runs a number of threads which do nothing but set and unset the lock. Click on the image below to get a full-resolution image. Be aware that the chart uses a log-log scale.

The two benchmarks cannot be directly compared since they did not run on the same hardware. There are nevertheless a few interesting observations:

(1) Spinlocks are faster than ‘real’ locks. I’ve blogged about this before. Since they consume CPU time while spinning, they should only be held for a very short time, e.g., to read a value. The Collage implementation immediately backs off when encountering a set lock by yielding the thread. This avoids priority inversion, which can be observed with some pthread spin lock implementations.

(2) pthread locks are dead slow on Mac OS X. Be aware that the graph uses log scale – a spin lock is up to three orders of magnitude faster than a pthread lock!

(3) Timed locks are slower than un-timed. This meets my intuitive expectation, since the timed implementation is more complex. The timed lock in Collage is implemented using pthread_cond_timedwait.

(4) The Spinlock is faster on OS X on slower hardware than on Linux. Not sure why that is the case. The Collage spin lock uses an atomic variable and compare_and_set. Either these operations are faster on the Core i5, or the thread yield behaves ‘better’ on OS X.

(5) Single-threaded lock access in pthread libraries seems to be optimized.

(6) pthread conditions on Linux show a steep performance drop once you have more threads than cores. Could be a scheduling issue again.
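To illustrate the kind of spin lock discussed in observations (1) and (4), here is a minimal sketch built on an atomic flag, compare-and-set, and a thread yield as back-off. It uses C++11 `std::atomic` and is not the Collage implementation; the `SpinLock` class and its method names are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal spin lock, for illustration only.
class SpinLock
{
public:
    // Blocks until the lock is acquired.
    void set()
    {
        bool expected = false;
        // Try to flip false -> true. On failure, back off by yielding the
        // thread instead of burning CPU time in a tight loop.
        while( !_locked.compare_exchange_weak( expected, true,
                                               std::memory_order_acquire ))
        {
            expected = false; // the failed exchange stored the current value
            std::this_thread::yield();
        }
    }

    // Single attempt; returns true if the lock was acquired.
    bool trySet()
    {
        bool expected = false;
        return _locked.compare_exchange_strong( expected, true,
                                                std::memory_order_acquire );
    }

    void unset()
    {
        assert( isSet( )); // unsetting a free lock indicates a caller bug
        _locked.store( false, std::memory_order_release );
    }

    bool isSet() const { return _locked.load( std::memory_order_relaxed ); }

private:
    std::atomic< bool > _locked{ false };
};
```

The benchmark loop described above would then simply call `set()` followed by `unset()` from each thread. Whether the yield or the atomic operation dominates the cost likely depends on the platform, which may be part of the difference between the two systems.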

Next I’ll work on benchmarking and optimizing read/write locking in the Collage Spinlock. Stay tuned for updates!

EDIT: I discovered a bug in my micro-benchmark which wrongly multiplied the results by the number of threads – doh! The figure is fixed now with a new test run.

C++ Library Symbol Exports – The Good, the Bad and the Ugly

28. June 2011

The Good

Selectively exporting symbols for a library is a good thing. It keeps program startup times down and enforces the public API of a library. The Good is that it’s possible and even the default mode for Visual C++.

The Bad

For whatever reason, the Visual C++ (or DLL?) designers decided that you have to declare your public functions with __declspec( dllexport ) when building the library and __declspec( dllimport ) when using it. What’s wrong with gcc’s visibility("default")?
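For readers who have not run into this, the usual workaround is a per-library macro which switches between the two declarations. The sketch below shows the common pattern in a single file; the names `MYLIB_API`, `MYLIB_BUILDING` and `add` are made up for this example and are not taken from Equalizer.

```cpp
// Normally defined by the build system, and only while compiling the
// library itself. Defined here so the sketch is self-contained.
#define MYLIB_BUILDING

// --- what would live in the public header -------------------------------
#if defined(_MSC_VER)
#  ifdef MYLIB_BUILDING
#    define MYLIB_API __declspec( dllexport ) // building the DLL
#  else
#    define MYLIB_API __declspec( dllimport ) // using the DLL
#  endif
#elif defined(__GNUC__)
#  define MYLIB_API __attribute__(( visibility( "default" )))
#else
#  define MYLIB_API
#endif

MYLIB_API int add( int a, int b );

// --- what would live in the library source file --------------------------
int add( int a, int b )
{
    return a + b;
}
```

With gcc, the attribute only has an effect once the default visibility is set to hidden, for example using -fvisibility=hidden. Every library needs its own pair of macros, since a DLL both exports its own symbols and imports the ones of the DLLs it depends on.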

All kinds of fun ensue when you want to export explicit template instantiations with MSVC’s dual dllexport/dllimport construct. Even more fun appears when you use static libraries.

The MSVC approach does not really work for static libraries. As far as I can tell, there are no exports for static libraries. Say you have a piece of code which you’d like to share between different shared libraries without actually shipping it as a shared library as well. The fabric foundation in Equalizer is such an example, and right now this is ‘shared’ by compiling the relevant objects three times – once each for the client, server and admin library. Obviously this is not optimal since it increases the build time unnecessarily. Building fabric as a static library and linking it to these three shared libraries works fine with the GNU toolchain, but not with MSVC since it would require manually specifying all the public symbols when linking the DLL.

The Ugly

The implementation chosen by gcc is utterly useless for C++. Setting the default visibility to hidden requires manually exporting the vtable of each class and all STL instantiations used – including the internal classes instantiated by the STL itself! While I can see how this implementation came into being, it is clearly designed for C code and not for C++. The vtable and STL are internals of the C++ implementation; developers should not need to care about them.

For now I have given up on using selective visibility with gcc, and only use explicit exports on Windows for Equalizer and other projects I’m working on. This makes sure that checkins will regularly break the Windows build. Hurray!

My Wishlist

One can dream:

For the VC++ developers: Please provide a __declspec(dllvisible) which works also for re-exporting symbols from static libraries. Let the toolchain figure out the details.

For the GCC developers: Please make exporting symbols from C++ classes simpler. VC++ can do it, so it can’t be that hard!

Disclaimer

The above is obviously a rant. I am fully aware that workarounds exist for all of the issues mentioned above. Implementing them in real build environments is more time-consuming than it should be, and often more than is feasible. If you know about simple fixes, please comment below.

Equalizer Source Code Analysis

29. May 2011

Sub-Project Sizes


I was interested in the relative sub-project sizes in Equalizer, so I quickly hacked together a small Perl script extracting this data from the CMake output. Note that this is not an exact measure of the project complexity since (I believe) it only counts the number of files in each project, and not the lines of code or compile time.

Collage, the network library, is quite a big part and the one which will probably grow in relative size in the future. The Equalizer client library is relatively big, mostly since it contains a lot of code for window-system coupling. The server library is relatively small, considering that it contains all the rendering algorithms. On the examples side, eqPly is unsurprisingly the biggest one as it contains the most features.

Equalizer 1.0 released

20. May 2011

It’s been only 21 months since the last post, but both Equalizer and I are still alive. This month we finally released version 1.0, which was looong overdue.

Most notably, this release defines the stable API for all Equalizer 1.x releases. This means that all the functions marked with version 1.0 will be source-code compatible until we release Equalizer 2.0. Parts of the API are still undefined and unstable, in particular for the also-new Collage network library. However, 99% of the functions used by the examples are stable.

Since the last major version, 0.9, there have been plenty of improvements and new features, e.g., subpixel compounds, reliable multicast for data distribution, runtime mono/stereo switch and many more. A comprehensive list is in the Release Notes.

Since this month I’ve started working on a new project based on Equalizer, and hopefully I’ll update this blog more regularly. More about this in another post…

Equalizer 0.9 Released!

11. August 2009

Cross-Segment Load-Balancing
We are pleased to announce the release of Equalizer 0.9, the standard framework to create and deploy parallel, scalable OpenGL applications. The most notable new features in this release are:

Please check the release notes on the Equalizer website for a comprehensive list of new features, enhancements, optimizations and bug fixes. A paperback book of the Programming and User Guide is available from Lulu.com.

We would like to thank all individuals and parties who have contributed to the development of Equalizer 0.9.

Cross-Segment Load-Balancing

30. July 2009

The upcoming Equalizer release will have another advanced scalability feature: Load-balancing across all resources used for the multi-display system.

This video should explain it all; if not, ask in the comments below:

I hope to have benchmarks of this feature soon.

Gallery: VR Lab University of Siegen

19. June 2009

Click on the image to see a gallery of the various Virtual Reality applications in use at the University of Siegen.

All applications are based on Equalizer, and most of them use head tracking and a flight stick for interaction. The architectural walk-through is using OpenSceneGraph.

Crazy Equalizer Configuration

7. June 2009

Below is a screenshot of an Equalizer configuration showing all basic decomposition modes in one window. The configuration file is in the source repository at examples/configs/1-window.mixed.eqc.

Top-left: Database
Top-right: DPlex
Bottom-left: 2D, load-balanced
Bottom-right (upper): Stereo
Bottom-right (lower): Pixel

All Equalizer Scalability Modes
Armadillo data set courtesy Stanford University Computer Graphics Laboratory.

