Archive for August, 2012

Sneak Peek: The new Collage

17. August 2012

We’ve started working on making Collage endian-safe so that an IBM BlueGene can communicate with x86 workstations. This requires that all data exchanges can be byte-swapped, which necessitates heavy refactoring. Long story short, this stalled the 1.0 API definition since the API for Node/LocalNode is not yet final. Therefore I’ll throw in a preview of what the new Collage peer-to-peer communication will look like.

Every communication will be stream-based (see last week’s post). The receiver knows the endianness of the sender and will swap, if necessary, the byte order in the corresponding DataIStream. This approach has the benefit that byte swapping only occurs in mixed-endian environments, whereas the traditional big-endian network byte order requires two swaps in an x86-only cluster. The packets exchanged between nodes today will disappear completely, since Collage can’t examine their structure.
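To illustrate the "receiver swaps only if necessary" idea, here is a minimal sketch. It is not Collage code: Wire32, encodeAs, decode and hostIsBigEndian are hypothetical names, and the flag stored with each value stands in for endianness information which the receiver would in practice learn once per sender.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if this host stores the most significant byte first
inline bool hostIsBigEndian()
{
    const uint16_t probe = 1;
    uint8_t first;
    std::memcpy( &first, &probe, 1 );
    return first == 0;
}

// Hypothetical wire data: the value in the sender's native byte order, plus
// the sender's byte order (assumption: known to the receiver)
struct Wire32
{
    bool senderBigEndian;
    uint8_t bytes[4];
};

// Lays out the bytes the way a host of the given byte order would write them
inline Wire32 encodeAs( const bool bigEndian, const uint32_t value )
{
    Wire32 wire;
    wire.senderBigEndian = bigEndian;
    for( int i = 0; i < 4; ++i )
    {
        const int shift = bigEndian ? 8 * ( 3 - i ) : 8 * i;
        wire.bytes[i] = uint8_t( value >> shift );
    }
    return wire;
}

// Receiver side: swap only if sender and receiver disagree
inline uint32_t decode( const Wire32& wire )
{
    uint8_t bytes[4];
    std::memcpy( bytes, wire.bytes, 4 );
    if( wire.senderBigEndian != hostIsBigEndian( ))
        std::reverse( bytes, bytes + 4 );

    uint32_t value;
    std::memcpy( &value, bytes, 4 );
    return value;
}
```

Between two hosts of the same byte order, decode() never touches the bytes. With a fixed big-endian wire order, two little-endian hosts would swap once when sending and once when receiving.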

You can observe this work in the corresponding issue ticket. Sending a packet in the old way looks like this:

NodeAttachObjectPacket packet;
packet.requestID = _localNode->registerRequest( object );
packet.objectID = id;
packet.objectInstanceID = instanceID;
_localNode->send( packet );

This is replaced by a DataOStream, which will send the data once it goes out of scope, making the code much nicer:

const uint32_t requestID = _localNode->registerRequest( object );
_localNode->send( CMD_NODE_ATTACH_OBJECT ) << id << instanceID << requestID;
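How a stream can send its data when it goes out of scope is sketched below. This is an illustration under stated assumptions, not the Collage implementation: FakeConnection, OStream and the free function send() are hypothetical stand-ins, and "sending" only records the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

typedef std::vector< uint8_t > Bytes;

// Hypothetical stand-in for a connection: it only records what was "sent"
struct FakeConnection
{
    std::vector< Bytes > sent;
};

// Hypothetical stand-in for an output stream which sends on destruction
class OStream
{
public:
    OStream( FakeConnection& connection, const uint32_t command )
        : _connection( &connection )
    {
        *this << command; // the command identifier leads the message
    }

    // Moving transfers the obligation to send to the new instance
    OStream( OStream&& from )
        : _connection( from._connection ), _buffer( std::move( from._buffer ))
    {
        from._connection = nullptr;
    }

    OStream( const OStream& ) = delete;
    OStream& operator = ( const OStream& ) = delete;

    // The accumulated data goes out when the stream is destroyed
    ~OStream()
    {
        if( _connection )
            _connection->sent.push_back( _buffer );
    }

    template< class T > OStream& operator << ( const T& value )
    {
        static_assert( std::is_trivially_copyable< T >::value,
                       "this sketch only copies plain data" );
        const uint8_t* ptr = reinterpret_cast< const uint8_t* >( &value );
        _buffer.insert( _buffer.end(), ptr, ptr + sizeof( T ));
        return *this;
    }

private:
    FakeConnection* _connection;
    Bytes _buffer;
};

// Returns a temporary which lives until the end of the calling statement
inline OStream send( FakeConnection& connection, const uint32_t command )
{
    return OStream( connection, command );
}
```

A statement such as send( connection, 42u ) << id << instanceID; then produces exactly one message, since the temporary is destroyed at the end of the full expression.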

The receiving side doesn’t improve as much, since we still need to extract all data into local variables. The old code accesses the raw data by casting it to the appropriate packet:

const NodeAttachObjectPacket* packet = command.get< NodeAttachObjectPacket >();
_attachObject( object, packet->objectID, packet->objectInstanceID );

The new code extracts the data into local variables, which will be endian-converted if necessary:

const UUID& objectID = stream.get< UUID >();
const uint32_t instanceID = stream.get< uint32_t >();
_attachObject( object, objectID, instanceID );

The good news is that Equalizer application code is not affected at all by this. Besides the byte swapping, this will enable other features, since all data passes through the DataI/OStream, for example automatically compressing any packet over a certain size.

This work will continue over the following weeks, and once it’s merged into the master branch I’ll continue with my introduction to the then new-and-shiny Collage.


Introducing Collage: DataI/OStream

10. August 2012

The co::DataOStream and co::DataIStream form the core of the co::Object data distribution. They will gain even more importance in the next couple of weeks, when they will replace the current packet-based messaging (see 145). They will become the core of any communication between Collage nodes.

The data iostreams provide a std::iostream-like interface to send data over the network. They hide all the network connection details, allow overlapping of data serialization and sending through bucketization, do configurable compression and allow application-provided serializers for custom data types. We’re currently working on extending them to also do automatic endian conversion.

First of all, they allow object serialization without the application needing to know whether the data has to be saved for later use (buffered objects), who will receive the data, whether or not to use multicast, or how to compress it. The serialization in the application code is as simple as possible, here for example the eqPly::FrameData:

void FrameData::serialize( co::DataOStream& os, const uint64_t dirtyBits )
{
    co::Serializable::serialize( os, dirtyBits );
    if( dirtyBits & DIRTY_CAMERA )
        os << _position << _rotation << _modelRotation;
    if( dirtyBits & DIRTY_FLAGS )
        os << _modelID << _renderMode << _colorMode << _quality << _ortho
           << _statistics << _help << _wireframe << _pilotMode << _idle
           << _compression;
    if( dirtyBits & DIRTY_VIEW )
        os << _currentViewID;
    if( dirtyBits & DIRTY_MESSAGE )
        os << _message;
}

The deserialize method looks exactly the same, except using a DataIStream and the >> operator instead of <<. Applications can write free-standing serialization functions, similar to free-standing std::iostream operators.
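A free-standing serialization function pair could look roughly like the sketch below. To keep it self-contained, it uses hypothetical in-memory OStream and IStream stand-ins instead of the co:: streams, and Camera is an invented application type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory stand-ins for the output and input stream
struct OStream
{
    std::vector< uint8_t > data;

    void write( const void* ptr, const size_t size )
    {
        const uint8_t* bytes = static_cast< const uint8_t* >( ptr );
        data.insert( data.end(), bytes, bytes + size );
    }
};

struct IStream
{
    std::vector< uint8_t > data;
    size_t pos;

    void read( void* ptr, const size_t size )
    {
        assert( pos + size <= data.size( ));
        std::memcpy( ptr, &data[ pos ], size );
        pos += size;
    }
};

// Operators for the basic types used by the application type below
inline OStream& operator << ( OStream& os, const float value )
    { os.write( &value, sizeof( value )); return os; }
inline OStream& operator << ( OStream& os, const uint32_t value )
    { os.write( &value, sizeof( value )); return os; }
inline IStream& operator >> ( IStream& is, float& value )
    { is.read( &value, sizeof( value )); return is; }
inline IStream& operator >> ( IStream& is, uint32_t& value )
    { is.read( &value, sizeof( value )); return is; }

// Invented application type with free-standing operators. Both sides list
// the members in the same order.
struct Camera
{
    float x, y, z;
    uint32_t flags;
};

inline OStream& operator << ( OStream& os, const Camera& camera )
    { return os << camera.x << camera.y << camera.z << camera.flags; }
inline IStream& operator >> ( IStream& is, Camera& camera )
    { return is >> camera.x >> camera.y >> camera.z >> camera.flags; }
```

With these operators in place, a Camera member can be written with a plain os << _camera inside a serialize method, just like the built-in types.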

Behind the scenes, Collage sets up the ostream with all connections to the nodes which have a slave instance of the FrameData, preferring multicast connections. The output is bucketized, that is, whenever the accumulated data reaches a certain threshold (default ~64k), the current block is sent. This allows the OS to send data while the application prepares the next buffer.
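The bucketization can be pictured with the following sketch. BucketStream is a hypothetical class, not part of Collage, and instead of writing to a connection it only records the size of each block it would have sent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bucketizing stream: accumulates data and emits a block
// whenever the threshold is reached
class BucketStream
{
public:
    explicit BucketStream( const size_t threshold )
        : _threshold( threshold ) {}

    void write( const void* ptr, const size_t size )
    {
        const uint8_t* bytes = static_cast< const uint8_t* >( ptr );
        _buffer.insert( _buffer.end(), bytes, bytes + size );
        if( _buffer.size() >= _threshold )
            flush();
    }

    // Emits the remaining data, to be called once serialization is done
    void flush()
    {
        if( _buffer.empty( ))
            return;
        _sentSizes.push_back( _buffer.size( )); // stand-in for the send
        _buffer.clear();
    }

    const std::vector< size_t >& getSentSizes() const { return _sentSizes; }

private:
    const size_t _threshold;
    std::vector< uint8_t > _buffer;
    std::vector< size_t > _sentSizes;
};
```

In this sketch a block can exceed the threshold by up to one write, since the check happens after appending. A real implementation would hand each block to the connection asynchronously, which is where the overlap of serialization and sending comes from.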

Each DataOStream has a configurable compressor. The default algorithm uses heuristics to choose the best tradeoff between speed and compression ratio. Each outgoing packet is compressed before transmission. For buffered objects, the compressed data is retained to optimize memory usage.

Currently we are working on automatic endian conversion to form Collage networks between little- and big-endian hosts (see 146). Since most of the data types are known at deserialization time, a templated swap function will provide the endianness conversion. For void data we simply have to assume that it is already endian-safe, or that the application doesn’t need endian safety. In Collage and Equalizer we will make sure everything is endian-safe, e.g., by using portable boost serialization archives for the co::DataOStreamArchive.
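A templated swap function of this kind might look like the sketch below. The name byteswap and the Vector3f type are assumptions made for illustration. The final Collage function may look different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Generic swap for arithmetic types: reverses the bytes of the value.
// Single-byte types are left unchanged by the reversal.
template< class T > void byteswap( T& value )
{
    static_assert( std::is_arithmetic< T >::value,
                   "provide an overload for compound types" );
    uint8_t bytes[ sizeof( T ) ];
    std::memcpy( bytes, &value, sizeof( T ));
    std::reverse( bytes, bytes + sizeof( T ));
    std::memcpy( &value, bytes, sizeof( T ));
}

// Invented compound type: swapped member by member. Reversing the whole
// struct as one blob would also reverse the order of the members.
struct Vector3f
{
    float x, y, z;
};

inline void byteswap( Vector3f& vector )
{
    byteswap( vector.x );
    byteswap( vector.y );
    byteswap( vector.z );
}
```

This also shows why void data cannot be converted automatically: without the type, the stream knows neither the element size nor where one element ends and the next begins.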

Introducing Collage: github

3. August 2012


Collage Dependencies

Good news, everybody: Since this week, Collage is a separate project on github.

First, this means that it’s much more lightweight to use, since it has a small source code and repository size (<10MB) and fewer dependencies compared to the full Equalizer project. Say hello to fast compilation times (less than a minute on my slowish laptop), easy setup and a simpler directory layout.

Second this means that the next version will have a well-defined, stable API, similar to what Equalizer already has. I’m steadily working toward this with good progress and lots of cleanup. This means that building and maintaining distributed applications based on Collage will be painless.

Third and last my hope is that this gives the project more visibility and credibility, and therefore more outside contributions. We’ve already had quite a few awesome ones in the past, such as InfiniBand RDMA and UDT support.

Edit: API documentation can be found on