As you might have heard, the Cray XMT implements a multithreaded processor architecture (called Threadstorm); these processors are compatible with Socket F, which means they fit the same sockets used by AMD Opteron CPUs. The interesting part, however, is that these Threadstorm CPUs only execute user code and avoid memory dependency stalls, i.e. the stalls that occur when memory dependence prediction goes wrong and a specific load has to be held back to ensure there is no violation.
The Cray XMT does this by switching among 128 concurrent threads per CPU. As the XMT supports more than 8,000 CPUs, a developer who wants to maximize throughput must provide at least 128 threads per CPU. With 8K CPUs you are looking at over 1,024,000 threads!
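To make the arithmetic concrete, here is a minimal sketch of the thread count, assuming the only inputs are the CPU count and the 128 threads per CPU mentioned above. The function name `threads_needed` is made up for illustration and is not part of any Cray API.

```python
# Hardware threads each Threadstorm CPU switches among (figure from the post).
THREADS_PER_CPU = 128


def threads_needed(cpu_count, threads_per_cpu=THREADS_PER_CPU):
    """Minimum number of software threads needed to keep every CPU fully fed."""
    if cpu_count < 0 or threads_per_cpu < 0:
        raise ValueError("counts must be non-negative")
    return cpu_count * threads_per_cpu


if __name__ == "__main__":
    print(threads_needed(8000))  # 1024000
    print(threads_needed(8192))  # 1048576, if "8K" is read as 8 * 1024
```

Either way you read "8K", the application has to expose roughly a million independent units of work before the machine is fully busy.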
Needless to say, with such a large number of threads, it is extremely important to get thread management right; without that, the system won’t be able to scale and may even deadlock. Another factor is the application design, specifically the parallel programming model (including recursive threaded models) and the resource management needed to handle resource exhaustion gracefully.
If this is an area of interest, then you should check out the likes of OpenMP, Parallel Extensions in .NET 4 (which include PLINQ and the TPL), the CCR, etc.
Underpinning all of this, of course, is Amdahl’s law, which one should be comfortable with, including its relation to the law of diminishing returns.
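For anyone who wants to play with the numbers, here is a rough sketch of Amdahl’s law in its usual form, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function name and the 95% figure below are illustrative assumptions, not measurements from an XMT.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0.0 to 1.0).
    workers: number of threads or processors applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical program that is 95% parallelizable.
    for n in (1, 128, 8000, 1024000):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

With 5% of the work stuck in serial code, the speedup can never exceed 1 / 0.05, so about 20x, no matter how many of those million threads you throw at it. That is the diminishing-returns part.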
I wonder, where can I get some time on a Cray XMT? I could also settle for a Cray CX1. Anyone willing to donate some money to a poor geek to help with this?