
"Designing real distributed systems requires consideration of networking topology."
Let say you are designing a Hadoop cluster for distributed computing for a company who will be processing lots and lots of information during the night to be able to use this information the next morning for daily business. The last thing you want is that the work is not completed during the night due to a networking problem. A failing switch can, in certain network setups, make that you loose a large portion of your computing power. Thinking about the network setup and making a good network blueprint for your system is a vital part of creating a successful solution.
We take for example the previously mentioned company. This company is working on chip research and during the day people are making new designs and algorithms which needs to be tested in your Hadoop cluster during the night. The next morning when people come in they expect to have the results of the jobs they have placed in the Hadoop queue the previous day. The number of engineers in this company is enormous and the cluster has a use of around 98% during 'non working hours' and 80% during working hours. As you can see a failure of the entire system or a reduction of the computing power will have a enormous impact on the daily work of all the engineers.
A closer look at this theoretical cluster.
- The cluster consists out of 960 computing nodes.
- A node is a 1u server with 2 quad core processors and 6 gigabyte of ram.
- The nodes are placed in 19' racks every rack houses 24 computing nodes.
- There are 40 racks with computing nodes.
As you can see, if we would lose a entire rack due to a network failure we would lose 2.5% of the computing power. As the cluster is used 98% during the night we would not have enough computing power to do all the work during the night. Losing a single node will not be a problem, however losing a stack of nodes will result in a major problem during the next day. For this we will have to create a network blueprint in which we can ensure that we will not lose a entire computing stack. If we talk about a computing stack we talk about a rack of 24 servers in this example.
First we will have to take a look at how we will connect the racks, if you look at the Cisco website you will be able to find the Cisco Catalyst 3560 product page. The Cisco Catalyst 3560 has 4 Fiber ports and we will be using those switches to connect the computing stacks. However, to ensure the network redundancy we will use 2 switches for very stack instead of one. As you can see in the diagram below we crosslink the switches. SW-B0 and SW-B1 will both be handling computing stack B, switches SW-C0 and SW-C1 will be handling the network load for computing stack C etc etc. We will be connecting SW-B0 with fiber to SW-C0 and SW-C1. We will also connect SW-B1 with fiber to SW-C0 and SW-C1. In case SW-B0 or SW-B1 will fail the network can still route traffic to the the switches in the B computing stack and also the a computing stack. By creating the network in this way the network will not fail to route traffic to the other stacks, the only thing that will happen is that the surviving switch will have to handle more load.


To have the NIC's to switch to the surviving network switch and to make sure that operation continue as normal you will have to make sure that the network keeps looking at the servers in the same way as before the moment one of the switches failed. To do this you will have to make sure that the new NIC has the same IP address and MAC address. To do so you can make use of IPAT, IP Address Takeover.
"IP address takeover feature is available on many commercial clusters. This feature protects an installation against failures of the Network Interface Cards (NICs). In order to make this mechanism work, installations must have two NICs for each IP address assigned to a server. Both the NICs must be connected to the same physical network. One NIC is always active while the other is in a standby mode. The moment the system detects a problem with the main adapter, it immediately fails over to the standby NIC. Ongoing TCP/IP connections are not disturbed and as a result clients do not notice any downtime on the server. "

Even do this is a theoretical blueprint, developing your network in such a way in combination with writing your own code to controle network flows, scripts to control the IPAT and the switching back of IPAT, thinking about reporting and alerting mechanisms will make a very solid network. If there are any questions about this networking blueprint for cluster computing please do send me a e-mail or post a comment. I will reply with a answer (good or bad) or explain things in more detail in a new post.
No comments:
Post a Comment