Distributed Manta


Manta has preliminary support for running on a cluster using MPI. It is based on the code used in the following paper for handling replicated data:

T. Ize, C. Brownlee, C.D. Hansen. Real-Time Ray Tracer for Visualizing Massive Models on a Cluster. In Proceedings of the 2011 Eurographics Symposium on Parallel Graphics and Visualization, 2011.

As in the paper, on an InfiniBand-based cluster we can scale up to about 60 fps at a resolution of 4M pixels.

However, unlike the paper, rendering massive out-of-core datasets by splitting data across the nodes is not yet supported. Instead, data is always replicated on every node, so each node must have enough memory to hold the entire scene and its acceleration structures. On the other hand, this means that all acceleration structures and primitive types can be used.

Build Instructions

To use it, set the CXX and CC environment variables to mpicxx and mpicc, then run cmake and set ENABLE_MPI to ON. You might need to do this from an empty build directory so that cmake picks up these compilers. I find that MVAPICH2 1.6 performs best on our InfiniBand cluster.
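From a shell, the steps above might look like the following. This is only a sketch: it assumes the commands are run from the top of the Manta source tree, that mpicc and mpicxx are on the PATH, and that cmake generates Makefiles. The directory name build-mpi is arbitrary.

```shell
# Sketch only: build-mpi is an arbitrary name for a fresh build directory.
mkdir build-mpi && cd build-mpi

# cmake reads CC and CXX only on the first configure of a build directory,
# which is why an empty directory is used here.
CC=mpicc CXX=mpicxx cmake -DENABLE_MPI=ON ..

make
```

Passing -DENABLE_MPI=ON on the command line has the same effect as switching the option on interactively in ccmake or cmake-gui.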

Running Distributed Manta

After compiling, you can run it through MPI (using mpirun, mpirun_rsh, etc.) without needing to specify any special arguments to manta. Here we render the default scene with 64 MPI processes (of which 64 - 2 = 62 are render nodes), each using 16 threads:

mpirun_rsh -np 64 -hostfile myNodes MV2_USE_LAZY_MEM_UNREGISTER=0 MV2_NUM_RDMA_BUFFER=256 MV2_USE_SRQ=0 bin/manta -res 1024x1024 -np 16 -pixelsampler "\"jittersample(-numberOfSamples 8)\""

Note that the MVAPICH2 settings MV2_USE_LAZY_MEM_UNREGISTER=0 MV2_NUM_RDMA_BUFFER=256 MV2_USE_SRQ=0 make a very significant improvement in render time when using MVAPICH2 1.6! These might not work for MVAPICH2 1.7, and other MPI implementations might require different tuning parameters. Scalability is often dramatically affected by the MPI implementation and its tuning parameters.

In the hostfile (myNodes in our example), you can usually place the first two processes on the same node, since these are the master processes (not render processes) and don't need as many cores. The exception is if you have fewer than 4 physical cores (hyper-threading doesn't count), in which case you might get better results with every process on its own node. If unsure, you can always try it both ways.

Also, you need at least 3 MPI processes for this to work: 2 master processes (which can share a node) and 1 render process. This matters when you are benchmarking: with mpirun_rsh -np 3 ... there is only 1 render node, so performance should (ideally) equal that of standard non-distributed Manta. -np 4 should ideally be twice as fast as standard Manta, and so on.
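The hostfile layout described above can be generated with a short script. This is a minimal sketch, assuming the common hostfile format of one host name per line, where listing a host twice places two processes on it. The node names are hypothetical, and you should check your MPI launcher's documentation for the exact format it accepts.

```shell
# Hypothetical host names; replace with the nodes of your cluster.
nodes="node01 node02 node03 node04"
hostfile=myNodes

# Split the list into positional parameters.
set -- $nodes
{
    echo "$1"                  # first master process
    echo "$1"                  # second master process, on the same node
    shift
    for n in "$@"; do
        echo "$n"              # one render process per remaining node
    done
} > "$hostfile"

# One MPI process per hostfile line; two of them are masters.
np=$(( $(wc -l < "$hostfile") ))
render=$(( np - 2 ))
echo "MPI processes: $np, render processes: $render"
```

With the four nodes above, the hostfile has five lines, so manta would be launched with -np 5 and have three render processes, for an ideal speedup of about 3x over standard Manta.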

Setting up MPI

You will need an MPI-2 implementation that supports MPI_THREAD_MULTIPLE. Often this requires setting a special argument or environment variable, or compiling the MPI implementation with multi-threading support enabled.

Currently (summer 2011), MVAPICH2 supports both InfiniBand and MPI_THREAD_MULTIPLE and is the recommended MPI implementation for InfiniBand-based clusters. Clusters without InfiniBand might perform better with other MPI implementations.

When building MVAPICH2, you will need to enable threads and sharedlibs. The configuration string I used to compile MVAPICH2 1.6 is:

--with-device=ch3:nemesis:ib --enable-sharedlibs=gcc --enable-threads --enable-languages=c,c++ --enable-error-checking=no --enable-fast=all

Later versions of MVAPICH2 might require slightly different arguments. For instance, the following might be required for MVAPICH2 1.7:

--enable-sharedlibs=gcc --enable-threads --enable-error-checking=no --enable-fast=all --enable-shared --enable-cxx

Limitations

Passing of certain information to the render nodes has not yet been implemented. For instance, resizing a window or pausing an animation will not be propagated to the render nodes. This code can of course be added if someone wants it. Right now only basic camera state (so moving the camera works) and a quit signal are propagated.