Wednesday, September 12, 2018

Oracle Cloud - Elastic Search set number of replicas for shards

We see more and more that customers leverage Elastic Search within modern application deployments. Recently we have been experimenting with Elastic Search in the Oracle Cloud. The initial setup was a relative simple cluster setup of a number of nodes. When doing the first resilience test we found out that we where finding that we degraded the cluster state and lost data when turning off a virtual machine in the Oracle Compute Cloud. The reason for this was that we did not set the replication for shards in the correct manner. Ensuring you have set sharding correctly is vital to ensure your Elastic Search Cluster in the Oracle Cloud is resilient against node failure.

Building a cluster in the cloud
When building a Cluster in the Oracle Compute Cloud based upon Oracle Linux this is in effect nothing different from building an Elastic Cluster in any other cloud or in your own datacenter. This holds that the below is applicable to any installation you do.

The general idea when building an Elastic Search Cluster is that you ensure that you have multiple nodes with multiple roles working as one while ensuring that the cluster is capable of loosing a number of its member nodes and will still continue functioning. The main roles within the cluster configuration of Elastic Search are data nodes, master nodes and client nodes.

when configuring your cluster you have to ensure that you are capable of loosing one node and still be able to provide the services to your customers. One of the things we overlooked while building the initial cluster was the level of replica shards.

Use replica shards
When you store an index in Elastic Search this can be broken down into multiple shards. The shards can be distributed over multiple machines (nodes) in the cluster. The idea behind it is two folded. Firstly it helps you to distribute load and secondly it helps you to ensure data is on more than one node at the same time to ensure that the data is available when a node is removed from the cluster.

To ensure the optimal use for both distributing operations as well as ensuring you have a replica of a shard to mitigate against failure you will have to ensure you have a replicated shard. The below image shows this on a high level:

As you can see in the above example we have two shards for a single index, P0 and P1. However, we also have a replica shard for both R0 and R1. In case node 1 is failing for some reason the data is still available for the cluster in the form of shard R0. In this example we have a replication of 1, you can however set the replication much higher. The level of replication will depend on the level of risk you want to take combined with the level of compute distribution you want to achieve against the costs you are willing to have.

As storage is relative cheap in the Oracle Compute Cloud it is advisable to set the replication higher than 1 to ensure you have 2 or more replication shards.

Setting the replication level
You can set the replication level per index in Elastic Search. This will help you to ensure you have more shards for data which is accessed frequently or where you need a higher level of availability. Setting the replication level is done with the parameter number_of_replicas as shown below:

PUT /my_index/_settings
  "number_of_replicas": 1

using shard replication will help you protect against failure of a datanode in your elastic search cluster and it will improve the performance. As storage is relative cheap in the Oracle Cloud it is advisable to ensure you have set number_of_replicas to 2 or higher.