A couple of days ago someone pointed out to me a company named Penguin Computing. Penguin Computing is one of the leading companies in HPC, High Performance Computing, and it is a leading company which has dedicated itself to Linux for the full 100%. Penguin computing is, according to its mission statement a leader in Cluster Virtualization. One of the executives of Penguin Computing is Donald Becker who we know from creating beowulf clustering while he was working at NASA.
"Penguin Computing is the leader in Cluster Virtualization, the most practical and cost-effective methodology for reducing the complexity and administrative burden of clustered computing. Our cluster solutions are driven by Scyld ClusterWare, whose unique architecture makes large pools of Linux servers appear and act like a single, consistent, virtual system, with a single point of command/control. By combining the economics of Open Source with the simplicity and manageability of Cluster Virtualization, we help you drive productivity up and cost out, making Linux clustering as powerful and easy to use as expensive SMP environments, at a Linux price point."
One of the things that Penguin Computing is providing is a ready to use 'out of the box' cluster. You get it all, the software, the hardware and all you need to get your cluster up and running. Even do this is not as much fun as building your own cluster with a team of people it is more efficient I think. The thing holding it all together by the solutions from Penguin Computing is the Scyld ClusterWare Linux clustering software. The Scyld ClusterWare is a set of tools to manage your cluster, or as they would like to call it "Scyld ClusterWare HPC is an HPC cluster management solution".
The Scyld ClusterWare cluster is controlled by the master node which hosts the scheduler, process migration, parallel libraries and the cluster management tools. Both cluster administrators and the users connect to the master node. The scheduler on the master node is enforcing the cluster policies so that you can create rules that for example work from a specific department has more priority in the cluster than that from an other. So jobs with a higher priority are scheduled before work with a low priority. TORQUE is the main workload management tool used within Scyld ClusterWare as a basic scheduler, for your more advanced scheduling requests Scyld TaskMaster, Scyld TaskMaster is an adaptation of the Moab Suite from Cluster Resources.
provisioning your computing nodes is done via a network boot where the master image is loaded into the RAM memory of the computing nodes. No need for local disks running a operating system. Specific libraries and drivers needed by the computing nodes are provided by the master node on request. The computing nodes are running a very very lean operating system which is stripped from all unneeded options. Only services which are needed to communicate with the master node are provided which makes more room for the real work to be done. If we look at other cluster setups we have in most cases a 'stripped' linux operating system with a lot of services running which are not really needed but who are hard to remove from the system.
On the website of Penguin Computing they state that they can provision a new node with a computing node operating system within a minute, this makes my provisioning of Oracle Enterprise Linux look very very slow. However, they are provisioning a very small operating system into RAM where I was provisioning a complete distribution to disk so that makes some difference. By using a single image and the quick provisioning they can make sure that all nodes in the cluster are running the same OS which is a good thing and makes it even more stable.
One more thing from the website I would like to share with you:
"Scyld ClusterWare is fully compatible with RedHat Enterprise Linux, supporting a huge variety of applications from all HPC disciplines such as Mechanical Computer-Aided Engineering (MCAE), Life Sciences, Computational Fluid Dynamics, Financial Services, Energy Services and Electronic Design Automation (EDA). Application Notes for applications such as ANSYS®, FLUENT®, LS-Dyna®, Blast, Matlab® and Schrodinger® (Prime/Glide/Jaguar) are available for customers through Penguin Computing's Support Portal".
From what I read at the website Penguin Computing is doing a great thing and they have a great solution, however, I would love to play with the system for a couple of days to get to know more of how it all is working. You can download a fully working Scyld ClusterWare for a test period of 45 days. Great, however, I do not have enough spare computers to build a test cluster. If Penguin Computing is having a road show somewhere in Europe I might take a flight to talk to some of the people and have a look at the system when it is in operation.