Parallel Rendering

IEEE VisWeek: Equalizer poster

16. October 2012

Equalizer Poster

Tomorrow night is the poster session for our second poster, on recent advances in Equalizer: Region of Interest, Focus Distance, Optimizations for Multi-GPU Clusters (Thread Affinity, Asynchronous Readback) and new applications.

The poster is already up in Ballroom A, and you can also see it on the right.

The reception of the DASH poster went very well: everybody I talked to was convinced by the concept and saw immediate applicability to their problems. The feedback was much more positive than I had hoped for.

Tags: technology
Posted in Display Wall, Equalizer, Graphics Cluster, Linux, Multi-GPU, OpenGL, Parallel Rendering, Parallel Rendering Frameworks, Virtual Reality

IEEE VisWeek: DASH Poster

14. October 2012

DASH Poster

Today I’ll be presenting our DASH poster at IEEE VisWeek. This is the first official outing of DASH, and I’m curious to see what interest it will generate. If you’re around in Seattle, feel free to come and talk to me; otherwise, have a look at the poster on the right and leave comments.

For more information and the source code, go to github.com/BlueBrain/dash.

Thanks to Marwan for designing this poster!

Tags: technology
Posted in multicore

Equalizer 1.4 released

7. September 2012

The last two weeks have been quiet, since I was biking through Switzerland. Meanwhile, poor Daniel back at work churned through most of the Collage changes outlined in the last post. You can see them in the Collage endian branch on github, which will be merged back into master in the next couple of weeks.

Now back to the news: After finally figuring out how to build Equalizer and its dependencies using MacPorts portfiles on Mac OS X, I released the long-awaited 1.4 versions of Equalizer, GPU-SD, Lunchbox and vmmlib. Below is the release announcement – enjoy!

Neuchatel, Switzerland – September 7, 2012 – Eyescale is pleased to announce the release of Equalizer 1.4.

Equalizer is the standard framework to create and deploy parallel, scalable 3D applications. This modular release includes Collage 0.6, a cross-platform C++ library for building heterogeneous, distributed applications; GPU-SD 1.4, a C++ library and daemon for the discovery and announcement of graphics processing units using zeroconf networking; and Lunchbox 1.4, a C++ library for multi-threaded programming. All software packages are available free of charge for commercial and non-commercial use under the LGPL open source license.

Equalizer 1.4 is a feature release extending the 1.0 API, introducing major new features, most notably asynchronous readback, region of interest and thread affinity for increased performance during scalable rendering. It distills over seven years of development and decades of experience into a feature-rich, high-performance and mature parallel rendering framework and related high-performance C++ libraries.

Equalizer enables software developers to easily build interactive and scalable visualization applications, which optimally combine multiple graphics cards, processors and computers to scale the rendering performance, visual quality and display size.

Equalizer Applications

Eyescale provides software consulting and development services for parallel 3D visualization software and GPU computing applications, based on the Eyescale software products or other open and closed source solutions.

Please check the release notes on the Equalizer website for a comprehensive list of new features, enhancements, optimizations and bug fixes. A paperback book of the Programming and User Guide is available.

We would like to thank all individuals and parties who have contributed to the development of Equalizer 1.4.

Left image courtesy of Cajal Blue Brain / Blue Brain Project. Second from left copyright Realtime Technology AG, 2008. Right image courtesy University of Siegen, 2008.

Tags: 3d visualization software, software, software-development
Posted in Collage, Darwin, Display Wall, Equalizer, Graphics Cluster, Linux, Lunchbox, Multi-GPU, multicore, OpenGL, OpenMP, OS X, Parallel Compositing, Parallel Rendering, Parallel Rendering Frameworks, Windows

Sneak Peek: The new Collage

17. August 2012

We’ve started working on making Collage endian-safe, to be able to communicate between an IBM BlueGene and x86 workstations. This requires that all data exchanges can be byte-swapped, which necessitates heavy refactoring. Long story short, this has stalled the 1.0 API definition, since the API for Node/LocalNode is not yet final. Therefore I’ll throw in a preview of what the new Collage peer-to-peer communication will look like.

All communication will be stream-based (see last week’s post). The receiver knows the endianness of the sender and will, if necessary, swap the byte order in the corresponding DataIStream. This approach has the benefit that byte swapping only occurs in mixed-endian environments, whereas the traditional big-endian network byte order requires two swaps per message (one on each side) in today’s common x86-only clusters. The packet structs exchanged between nodes today will disappear completely, since Collage can’t examine their structure to swap their individual fields.
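To make the receiver-side swap concrete, here is a minimal C++ sketch. SketchIStream, swap32 and the needSwap flag are hypothetical names for illustration, not the Collage API; the sketch assumes the byte order of the sender is known once per connection, so peers with the same endianness pay no swap cost at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical helper, not the Collage API: reverse the bytes of a 32-bit value.
inline uint32_t swap32( const uint32_t v )
{
    return (( v & 0x000000FFu ) << 24 ) | (( v & 0x0000FF00u ) << 8 ) |
           (( v & 0x00FF0000u ) >> 8 )  | (( v & 0xFF000000u ) >> 24 );
}

// A minimal input stream which swaps only if the sender's byte order differs.
class SketchIStream
{
public:
    // 'needSwap' is assumed to be derived once per connection, e.g., from a
    // byte-order flag exchanged when the two nodes connect.
    SketchIStream( std::vector< uint8_t > data, const bool needSwap )
        : _data( std::move( data )), _needSwap( needSwap ) {}

    uint32_t getUInt32()
    {
        assert( _pos + sizeof( uint32_t ) <= _data.size( ));
        uint32_t value;
        std::memcpy( &value, _data.data() + _pos, sizeof( value ));
        _pos += sizeof( value );
        return _needSwap ? swap32( value ) : value; // same endianness: no work
    }

private:
    std::vector< uint8_t > _data;
    bool _needSwap;
    std::size_t _pos = 0;
};
```

In an x86-only cluster needSwap is always false, so extraction is a plain copy, which is the benefit described above.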

You can observe this work in the corresponding issue ticket. Sending a packet in the old way looks like this:

NodeAttachObjectPacket packet;
packet.requestID = _localNode->registerRequest( object );
packet.objectID = id;
packet.objectInstanceID = instanceID;
_localNode->send( packet );

This is replaced by a DataOStream, which will send the data once it goes out of scope, making the code much nicer:

const uint32_t requestID = _localNode->registerRequest( object );
_localNode->send( CMD_NODE_ATTACH_OBJECT ) << id << instanceID << requestID;
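The send-on-scope-exit behavior relies on C++ temporaries being destroyed at the end of the full expression. The following sketch shows the idea with hypothetical ScopedOStream and SketchNode types (not Collage’s classes); a vector of buffers stands in for the network connection, and only trivially copyable values are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sketch of "send on scope exit"; not Collage's classes.
using Buffer = std::vector< uint8_t >;
using Sink = std::function< void( Buffer&& ) >;

class ScopedOStream
{
public:
    ScopedOStream( Sink sink, const uint32_t command ) : _sink( std::move( sink ))
    {
        *this << command; // the command identifier leads every message
    }

    ScopedOStream( ScopedOStream&& from ) noexcept
        : _sink( std::move( from._sink )), _buffer( std::move( from._buffer )),
          _active( from._active )
    {
        from._active = false; // only one instance may send the message
    }
    ScopedOStream( const ScopedOStream& ) = delete;
    ScopedOStream& operator = ( const ScopedOStream& ) = delete;

    // Runs when the temporary dies at the end of the full expression.
    ~ScopedOStream()
    {
        if( _active )
            _sink( std::move( _buffer ));
    }

    template< typename T > ScopedOStream& operator << ( const T& value )
    {
        static_assert( std::is_trivially_copyable< T >::value,
                       "this sketch only handles trivially copyable types" );
        const auto* bytes = reinterpret_cast< const uint8_t* >( &value );
        _buffer.insert( _buffer.end(), bytes, bytes + sizeof( T ));
        return *this;
    }

private:
    Sink _sink;
    Buffer _buffer;
    bool _active = true;
};

// Stand-in for a local node; 'sent' plays the role of the connection.
struct SketchNode
{
    std::vector< Buffer > sent;

    ScopedOStream send( const uint32_t command )
    {
        return ScopedOStream(
            [this]( Buffer&& buffer ) { sent.push_back( std::move( buffer )); },
            command );
    }
};
```

For example, node.send( 1u ) << id << instanceID << requestID; with three uint32_t values appends exactly one 16-byte message to node.sent once the statement completes, so an explicit send call can never be forgotten.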

The receiving side doesn’t improve as much, since we still need to extract all data into local variables. The old code accesses the raw data by casting it to the appropriate packet:

const NodeAttachObjectPacket* packet = command.get< NodeAttachObjectPacket >();
[...]
_attachObject( object, packet->objectID, packet->objectInstanceID );

The new code extracts the data into local variables, which will be endian-converted if necessary:

const UUID& objectID = stream.get< UUID >();
const uint32_t instanceID = stream.get< uint32_t >();
[...]
_attachObject( object, objectID, instanceID );

The good news is that Equalizer application code is not affected at all by this. Besides byte swapping, routing all data through the DataI/OStream will enable other features, for example automatically compressing any packet over a certain size.

This work will continue over the following weeks, and once it’s merged into the master branch I’ll continue my introduction to the then new-and-shiny Collage.

Posted in Collage, Intel, Linux, Windows

Introducing Collage: DataI/OStream

10. August 2012

The co::DataOStream and co::DataIStream form the core of the co::Object data distribution. They will gain even more importance in the next couple of weeks, when they replace the current packet-based messaging (see 145) and become the core of all communication between Collage nodes.

The data iostreams provide a std::iostream-like interface to send data over the network. They hide all the network connection details, allow overlapping of data serialization and sending through bucketization, do configurable compression and allow application-provided serializers for custom data types. We’re currently working on extending them to also do automatic endian conversion.

First of all, they allow object serialization without the application needing to know whether the data has to be saved for later use (buffered objects), who will receive the data, whether or not to use multicast, or how to compress it. The serialization in the application code is as simple as possible, as shown here for eqPly::FrameData:

void FrameData::serialize( co::DataOStream& os, const uint64_t dirtyBits )
{
    co::Serializable::serialize( os, dirtyBits );
    if( dirtyBits & DIRTY_CAMERA )
        os << _position << _rotation << _modelRotation;
    if( dirtyBits & DIRTY_FLAGS )
        os << _modelID << _renderMode << _colorMode << _quality << _ortho
           << _statistics << _help << _wireframe << _pilotMode << _idle
           << _compression;
    if( dirtyBits & DIRTY_VIEW )
        os << _currentViewID;
    if( dirtyBits & DIRTY_MESSAGE )
        os << _message;
}

The deserialize method looks exactly the same, except using a DataIStream and the >> operator instead of <<. Applications can write free-standing serialization functions, similar to free-standing std::iostream operators.
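A rough sketch of the serialize/deserialize symmetry and of free-standing operators for a custom type follows. ByteOStream, ByteIStream, Label and FrameState are hypothetical stand-ins for co::DataOStream, co::DataIStream and eqPly::FrameData; the byte layout (a 64-bit length prefix for strings) is an assumption of this sketch, not Collage’s wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for co::DataOStream / co::DataIStream.
struct ByteOStream
{
    std::vector< uint8_t > bytes;

    template< typename T > ByteOStream& operator << ( const T& value )
    {
        static_assert( std::is_trivially_copyable< T >::value, "POD only" );
        const auto* p = reinterpret_cast< const uint8_t* >( &value );
        bytes.insert( bytes.end(), p, p + sizeof( T ));
        return *this;
    }
};

struct ByteIStream
{
    explicit ByteIStream( const std::vector< uint8_t >& data ) : bytes( data ) {}

    template< typename T > ByteIStream& operator >> ( T& value )
    {
        static_assert( std::is_trivially_copyable< T >::value, "POD only" );
        std::memcpy( &value, bytes.data() + pos, sizeof( T ));
        pos += sizeof( T );
        return *this;
    }

    const std::vector< uint8_t >& bytes;
    std::size_t pos = 0;
};

// A custom type with free-standing operators, like std::iostream operators.
struct Label { std::string text; };

inline ByteOStream& operator << ( ByteOStream& os, const Label& label )
{
    os << uint64_t( label.text.size( )); // length prefix (sketch format)
    os.bytes.insert( os.bytes.end(), label.text.begin(), label.text.end( ));
    return os;
}

inline ByteIStream& operator >> ( ByteIStream& is, Label& label )
{
    uint64_t size = 0;
    is >> size;
    label.text.assign( reinterpret_cast< const char* >( is.bytes.data() + is.pos ),
                       size );
    is.pos += size;
    return is;
}

// Loosely modeled on eqPly::FrameData: only dirty parts are (de)serialized.
struct FrameState
{
    enum : uint64_t { DIRTY_CAMERA = 1u << 0, DIRTY_MESSAGE = 1u << 1 };

    float position[ 3 ] = { 0.f, 0.f, 0.f };
    Label message;

    void serialize( ByteOStream& os, const uint64_t dirtyBits ) const
    {
        if( dirtyBits & DIRTY_CAMERA )
            os << position;
        if( dirtyBits & DIRTY_MESSAGE )
            os << message;
    }

    void deserialize( ByteIStream& is, const uint64_t dirtyBits )
    {
        if( dirtyBits & DIRTY_CAMERA )
            is >> position;
        if( dirtyBits & DIRTY_MESSAGE )
            is >> message;
    }
};
```

As in the original, deserialize mirrors serialize line by line, with >> in place of <<, and the non-template Label operators take precedence over the generic member templates during overload resolution.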

Behind the scenes, Collage sets up the output stream with connections to all nodes which have a slave instance of the FrameData, preferring multicast connections. The output is bucketized, that is, whenever the accumulated data reaches a certain threshold (default ~64 KB), the current block is sent. This allows the OS to send data while the application prepares the next buffer.
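Bucketization can be sketched in a few lines. BucketOStream is a hypothetical class, not Collage’s implementation: the send callback stands in for the network write, the 64 KB default mirrors the threshold mentioned above, and splitting writes exactly at bucket boundaries is a simplification of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of bucketized output, not Collage's implementation.
class BucketOStream
{
public:
    using SendFunc = std::function< void( const std::vector< uint8_t >& ) >;

    explicit BucketOStream( SendFunc send, const std::size_t threshold = 64 * 1024 )
        : _send( std::move( send )), _threshold( threshold )
    {
        assert( threshold > 0 );
    }

    ~BucketOStream() { flush(); } // send the last, partially filled bucket

    void write( const void* data, std::size_t size )
    {
        const auto* bytes = static_cast< const uint8_t* >( data );
        while( size > 0 )
        {
            // Fill the current bucket up to the threshold...
            const std::size_t room = _threshold - _buffer.size();
            const std::size_t n = size < room ? size : room;
            _buffer.insert( _buffer.end(), bytes, bytes + n );
            bytes += n;
            size -= n;

            // ...and hand it off as soon as it is full.
            if( _buffer.size() >= _threshold )
                flush();
        }
    }

    void flush()
    {
        if( _buffer.empty( ))
            return;
        _send( _buffer );
        _buffer.clear();
    }

private:
    SendFunc _send;
    const std::size_t _threshold;
    std::vector< uint8_t > _buffer;
};
```

With a 4-byte threshold, writing 10 bytes hands off two 4-byte buckets immediately and a final 2-byte bucket when the stream is destroyed. In a real network path the handoff would be an asynchronous send, which is where the overlap with serialization comes from.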

Each DataOStream has a configurable compressor. The default algorithm uses heuristics to choose the best tradeoff between speed and compression ratio. Each outgoing packet is compressed before transmission. For buffered objects, the compressed data is retained to optimize memory usage.

Currently we are working on automatic endian conversion to form Collage networks between little- and big-endian hosts (see 146). Since most data types are known at deserialization time, a templated swap function will provide the endianness conversion. For untyped (void) data we simply have to assume that it is already endian-safe, or that the application doesn’t need endian safety. In Collage and Equalizer we will make sure everything is endian-safe, e.g., by using portable boost serialization archives for the co::DataOStreamArchive.
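A templated swap function of the kind described above might look like the following sketch. The name byteswap and the Vec3 type are hypothetical, not Collage code; the template is restricted to arithmetic types because a struct has to be swapped field by field, and untyped data cannot be swapped at all since its element size is unknown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper, not the Collage function: reverse the byte order of a
// single arithmetic value.
template< typename T > T byteswap( T value )
{
    static_assert( std::is_arithmetic< T >::value,
                   "swap scalars individually, not whole structs" );
    unsigned char bytes[ sizeof( T ) ];
    std::memcpy( bytes, &value, sizeof( T ));
    std::reverse( bytes, bytes + sizeof( T ));
    std::memcpy( &value, bytes, sizeof( T ));
    return value;
}

// A hypothetical composite type is swapped field by field.
struct Vec3 { float x, y, z; };

inline Vec3 byteswap( const Vec3& v )
{
    return { byteswap( v.x ), byteswap( v.y ), byteswap( v.z ) };
}
```

Because the deserializer knows it is extracting, say, a uint32_t, it can call byteswap< uint32_t > on demand; a void buffer gives it no such information.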

Tags: connection details, data distribution, object serialization, serialize
Posted in Benchmarks, Collage, Darwin, Linux

Introducing Collage: github

3. August 2012

Collage Dependencies

Good news, everybody: Since this week, Collage is a separate project on github.

First, this means that it’s much more lightweight to use, since it has a small source code and repository size (<10 MB) and fewer dependencies than the full Equalizer project. Say hello to fast compilation times (less than a minute on my slowish laptop), easy setup and a simpler directory layout.

Second this means that the next version will have a well-defined, stable API, similar to what Equalizer already has. I’m steadily working toward this with good progress and lots of cleanup. This means that building and maintaining distributed applications based on Collage will be painless.

Third and last my hope is that this gives the project more visibility and credibility, and therefore more outside contributions. We’ve already had quite a few awesome ones in the past, such as InfiniBand RDMA and UDT support.

Edit: API documentation can be found on eyescale.github.com.

Tags: rdma, software-development, udt
Posted in Collage, Darwin, Equalizer

Introducing Collage: Barrier

15. July 2012

While I personally think that barriers are an anti-pattern, they have exactly one valid use case in my line of work — as swap barriers synchronizing the display of a new frame across multiple segments of a display wall or immersive installation.

In an ideal world, swap synchronization would be done using hardware support. Equalizer supports this for nVidia G-Sync, but I haven’t seen many installations using hardware swap synchronization. First, it’s expensive, since you need a professional-grade card with a special synchronization board, so lower-cost installations such as display walls typically don’t even have the hardware. Second, installations which need frame synchronization, such as active stereo setups, often only use the frame (retrace) synchronization and use a software barrier for swap synchronization, because getting hardware swap synchronization running is such a mess due to driver issues that most people don’t bother. For these reasons, Equalizer supports both hardware and software swap synchronization. The software synchronization uses a co::Barrier.

Back to the Collage barrier: once it is set up, the only call needed is enter, which blocks until the height of the barrier, that is, the number of participants, has been reached.

Barriers are versioned, distributed objects. Any process can set up a barrier, register it and communicate the barrier identifier and version to all users of the barrier. The users map, sync and enter the barrier. Since it’s versioned, the master instance can be committed any time, and enter requests are versioned, that is, a barrier operation for an old version will be finished before the enter requests for the new version are processed.

The protocol right now is very simple: a master instance tracks enter requests until the height is reached and then unlocks all users. While there are other algorithms which use optimized tree structures or broadcast to reduce latency, we haven’t seen the need to implement a more complex algorithm.
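The counting protocol can be illustrated with a single-process sketch. SketchBarrier is hypothetical and is not co::Barrier: a mutex and condition variable stand in for the enter and unlock messages between users and the master instance, and neither networking nor the versioning described above is modeled.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical single-process sketch of the simple master protocol: count
// enter requests and release all waiters once 'height' of them have arrived.
class SketchBarrier
{
public:
    explicit SketchBarrier( const uint32_t height ) : _height( height )
    {
        assert( height > 0 );
    }

    void enter()
    {
        std::unique_lock< std::mutex > lock( _mutex );
        const uint64_t round = _round; // the barrier operation we joined
        if( ++_entered == _height )
        {
            // Height reached: reset the count and unlock all users.
            _entered = 0;
            ++_round;
            _cond.notify_all();
            return;
        }
        // The predicate guards against spurious wake-ups and lets the
        // barrier be reused immediately for the next round.
        _cond.wait( lock, [&]{ return _round != round; } );
    }

    uint64_t completedRounds() const
    {
        std::lock_guard< std::mutex > lock( _mutex );
        return _round;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cond;
    const uint32_t _height;
    uint32_t _entered = 0;
    uint64_t _round = 0;
};
```

The round counter is what makes the barrier reusable: a waiter from one round is released even if enter requests for the next round arrive before it wakes up.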

Posted in Collage, Display Wall, Equalizer, Graphics Cluster, Linux, Multi-GPU, Parallel Rendering, Parallel Rendering Frameworks, Windows

Introducing Collage

6. July 2012

Collage evolved from the Equalizer network library. Now it’s used by a few other projects as well, and I’ve been mentioning it quite a few times on this blog already.

Looking back over the last seven years of Equalizer, Collage has received by far the largest investment in manpower of all components. It seems innocent enough, but trust me, getting a distributed network library right and bug-free is no small task. On the one hand, the ‘fun’ implementation of the Windows IP stack (WSASYSCALLFAILURE and stack corruptions) took a lot of debugging to get going, and on the other hand, advanced features such as InfiniBand support (thanks Dardo!) and fast, reliable multicast are hard to implement.

What’s wrong with boost::asio?

Nothing. It provides about the same functionality as co::Connection and co::ConnectionSet. We use it as the UDP backend for the RSP multicast implementation. It wasn’t around when I started eq::net, which became Collage. We’d love to replace co::Connection and co::ConnectionSet with it, but the effort of porting the RDMA and RSP connections to asio has prevented this so far. Ultimately, Collage aims to provide higher-level abstractions on top of such a transport layer.

What’s wrong with 0MQ?

Again: Nothing. It provides higher-level abstractions than asio. It’s less likely that we’ll use it as the backend for Collage, since some of its design decisions differ from what Collage is doing.

So, what’s next?

Collage, similar to Lunchbox, aims to provide high-level abstractions. As in the Introducing Lunchbox series, I’ll present them over the next few weeks. In the same timeframe, we are planning to separate the Equalizer and Collage projects as well as to define and document the ‘1.0’ Collage API.

For now, you can find a technical overview presentation on the Collage website. Enjoy!

Posted in Collage, Equalizer, Linux, OS X, Windows

Programming and User Guide for Equalizer 1.4

29. June 2012

Equalizer Programming and User Guide

I’ve just uploaded the review version of the Programming and User Guide for the upcoming 1.4 release of Equalizer, and to a certain extent, Collage.

This one packs 111 pages of content (118 total) and 63 figures. In a couple of weeks I’ll create the final hardcopy version on Amazon/CreateSpace.

What’s New?

This edition has, besides the customary full review pass, a lot of new content: a new chapter on Sequel and the associated polygonal rendering example; new sections on application-specific scaling factors in immersive environments, region of interest for compositing and load-balancing, and zeroconf discovery in Collage; and finally a substantial rewrite of the section on distributed objects in Collage.

What?

The book is structured in two parts: The User Guide, laying the foundation on parallel rendering and scalability algorithms, and then explaining the configuration of visualization systems for Equalizer applications. The appendix contains a full reference on the file format.

The second part, the Programming Guide, gradually introduces programming parallel rendering applications. Starting with the basics in eqHello, the complexity is gradually increased with chapters on Sequel and Equalizer, each using the respective example application. After this, an advanced features section introduces and demonstrates specific features in isolation. It finishes off with a chapter on the Collage network library.

Why?

The Programming and User Guide is the ‘OpenGL red book’ of Equalizer. It consolidates all the documentation available in various places (Equalizer website, mailing list, github feature issues, my head) into a single document. Beyond gathering this information, the format lets a bigger picture emerge, putting design decisions in context.

Posted in Collage, Darwin, Display Wall, Equalizer, Graphics Cluster, Linux, Lunchbox, Multi-GPU, OpenGL, OS X, Parallel Compositing, Parallel Rendering, Parallel Rendering Frameworks

Introducing lunchbox::Servus

22. June 2012

Yes, we’re back to Lunchbox. During the 1.4 beta release (see previous post) we decided to merge a separate library into Lunchbox, since keeping it separate would have been overkill. In the end, it’s just a single class: Servus.

For the 999‰ of you who don’t get the reference: it’s a play on Bonjour. This class implements a simple C++ interface to announce and discover key-value pairs over zeroconf networking (aka Bonjour).

The usage is really simple: You instantiate the class with the service name you’re interested in, e.g., “_collage._tcp”. Then you can register key-value pairs and start announcing them, and you can discover existing key-value pairs announced by other processes, typically within the subnet. It’s also legal to update key-value pairs on an announced service. Sounds simple enough, but if you’ve ever used the callback-driven dns_sd C API, you’ll appreciate how much easier this is.

We use this class for two things right now: Collage node discovery and GPU-SD.

In Collage, each co::LocalNode announces its node identifier and connection descriptions. This information is used as a fallback path when performing a LocalNode::connect( NodeID ) to identify and to connect to a previously unknown node. The Servus handle is exposed to applications through co::Zeroconf, which is a wrapper to ensure thread-safety. Applications can then use additional key-value pairs specific to their implementation.

In GPU-SD, Servus is used to announce and discover graphics cards in a visualization cluster, which in turn is used by Equalizer for auto-configuration.

Posted in Collage, Darwin, Linux, Lunchbox, Windows