My machine has only four CPUs. If I run manta with -np 4 (four rendering threads), it still creates one or two more threads for purposes I don't know, even though I use -imagedisplay null. My measurements show almost linear scaling from one to three threads, but at four threads the scaling drops significantly. So on my machine, the best results come from using at most three threads.
If you run "bin/manta -np 4" (or bin/afr), the program will create five new threads in addition to the initial "main" thread. Four of these threads render and one runs the user interface (XWindowUI by default). The initial thread suspends until all of the rendering threads terminate, so that the process can exit cleanly and we don't lose anything that happened to be allocated on its stack. If you specify additional -ui options on the command line (for example, -ui "camerapath(...)"), a separate thread is created for each user interface.
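The thread layout described above can be modeled in a few lines. This is a minimal Python sketch, not Manta's code (Manta itself is C++): `render_loop`, `ui_loop` and `launch` are hypothetical names, and Python threads don't ray trace in parallel here. The point is only the count of threads created and the fact that the main thread blocks until the renderers finish.

```python
import threading

def render_loop(thread_id, frames, counts):
    # Hypothetical stand-in for one rendering thread: each iteration
    # represents rendering (its share of) one frame.
    for _ in range(frames):
        counts[thread_id] += 1  # each thread writes only its own slot

def ui_loop(stop):
    # Hypothetical stand-in for a user interface thread (XWindowUI,
    # camerapath(...), ...); it just idles until told to stop.
    stop.wait()

def launch(nthreads, uis, frames=10):
    """Start `nthreads` rendering threads plus one thread per user interface,
    then block the calling ("main") thread until the renderers finish."""
    counts = [0] * nthreads
    stop = threading.Event()
    renderers = [threading.Thread(target=render_loop, args=(i, frames, counts))
                 for i in range(nthreads)]
    # daemon=True: a UI thread on its own never keeps the process alive.
    ui_threads = [threading.Thread(target=ui_loop, args=(stop,), daemon=True)
                  for _ in uis]
    for t in renderers + ui_threads:
        t.start()
    for t in renderers:
        t.join()  # the main thread suspends here until rendering ends
    stop.set()    # let the UI threads finish as well
    return len(renderers) + len(ui_threads), counts

created, counts = launch(4, ["XWindowUI"])
print(created, counts)  # 5 [10, 10, 10, 10]
```

Passing a second UI, as in `launch(4, ["XWindowUI", "camerapath"])`, reports six created threads, mirroring one extra thread per additional -ui option.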
Setting the image display to null doesn't change the number of threads, because display is performed by one or more of the rendering threads. In Manta, OpenGLDisplay by default runs in one thread, which may be specified with the -thread argument. When you use a null image display, you eliminate any load imbalance caused by only one thread performing display functions. For example, if you are running Mesa OpenGL on a machine without any graphics pipes, a software DrawPixels coupled with GLX activity over a slow network might take more time than actually ray tracing a frame, and become the bottleneck.
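To see why a single display thread can become the bottleneck, here is a deliberately idealized model, not a description of Manta's actual scheduler. It assumes the ray tracing work for a frame (`render_ms` of total CPU time) can be spread perfectly over all threads, while the display step (`display_ms`) is one indivisible task run by one of those same threads. Both function names are hypothetical.

```python
def frame_time(render_ms, nthreads, display_ms=0.0):
    """Idealized lower bound on time per frame (hypothetical model).

    Rendering is assumed to balance perfectly across nthreads threads;
    the display step is a single task that only one thread can run.
    """
    balanced = (render_ms + display_ms) / nthreads
    # The frame can't finish before the one display task does.
    return max(display_ms, balanced)

def display_bound(render_ms, nthreads, display_ms):
    # True once the single display task takes longer than the evenly
    # shared total, i.e. the display thread sets the frame time.
    return display_ms > (render_ms + display_ms) / nthreads
```

With 120 ms of rendering on four threads, a null display gives `frame_time(120, 4) == 30.0`, a 10 ms display gives 32.5, and a 50 ms display (say, software DrawPixels over a slow link) gives 50, so the frame rate is now set by the display rather than by ray tracing. Under this model the crossover is at `display_ms == render_ms / (nthreads - 1)`, which is why benchmarks are cleaner with a null display.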
Usually for a benchmark you would specify "-ui null -imagedisplay null".
In the itanium2 branch, bin/dm_demo is a little different: the FOX user interface runs in the main thread, so there isn't an extra UI thread. If you run a camera path using the GUI dialog boxes, a thread separate from the UI is created to march along the path. Likewise, if you Ctrl + double left-click to "zoom to point", an extra thread is created to march along the camera path. bin/sc_demo starts an additional MFE command receiving/collaboration layer in its own thread.
It's not too surprising that you can't use as many rendering threads as there are processors on your machine. The operating system still has to run sometimes, along with the X server, etc. Usually we leave a brick or so worth of CPUs spare when running on the larger systems, and specify exactly which CPU each thread should run on. This eliminates much of the system-induced variation between runs. On an SGI you would use the runon or dplace command to do this; I'm not sure whether these are available on other Linux distributions.
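On Linux, per-thread pinning is done with the sched_setaffinity system call (the taskset utility from util-linux does the same for a whole process). The sketch below is a hedged Python illustration, Linux only: `plan_cpus`, `pin_current_thread` and `run_pinned` are hypothetical helpers, and the "leave the lowest-numbered CPUs spare" policy is an arbitrary choice for the example, not Manta's behavior.

```python
import os
import threading

def plan_cpus(available, nthreads, spare=1):
    """Pick one CPU per rendering thread, leaving the `spare` lowest-numbered
    available CPUs free for the OS, X and the UI thread (hypothetical policy)."""
    usable = sorted(available)[spare:]
    if nthreads > len(usable):
        raise ValueError(f"need {nthreads} CPUs but only {len(usable)} are usable")
    return usable[:nthreads]

def pin_current_thread(cpu):
    # Linux only: sched_setaffinity accepts a thread id, so passing the
    # native id of the calling thread pins just this thread.
    os.sched_setaffinity(threading.get_native_id(), {cpu})

def run_pinned(cpus):
    """Start one thread per CPU in `cpus`; each pins itself and records the
    affinity mask it ends up with."""
    seen = {}
    lock = threading.Lock()

    def worker(cpu):
        pin_current_thread(cpu)
        # ... a real rendering loop would run here ...
        with lock:
            seen[cpu] = os.sched_getaffinity(threading.get_native_id())

    threads = [threading.Thread(target=worker, args=(c,)) for c in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

For the four-CPU machine in the question, `plan_cpus(range(4), 3)` returns `[1, 2, 3]`: three pinned rendering threads with one CPU left over for everything else. Pinning inside each thread, rather than for the whole process, keeps the main and UI threads free to run wherever the kernel puts them.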
--Abe 16:11, 6 Nov 2005 (MST)