Elasticsearch Configuration and Performance Tuning

Elasticsearch Configuration

In this post, we will talk about how to make Elasticsearch (1.7.x) more stable and performant.

Before we start, you can see the difference between the test results before and after tuning:

                     Before Tuning    After Tuning
Successful calls     5000             5000
Total time           10.94 s          4.73 s
Average              1.92 s           0.76 s
Fastest              0.17 s           0.09 s
Slowest              4.95 s           2.74 s
RPS                  450-500          1000-1100
Status codes
  Code 200           4676             5000
  Code 429           16               0
  Code 503           307              0

Now we can start by tuning the OS-level settings mentioned in the ES documentation:

Configuring OS

Brief

First things first, let’s get the OS (Ubuntu 14.04) ready. Elasticsearch requires only Java (1.7 or later); newer ES versions may require a higher Java version.

vm.swappiness

ES recommends setting this value to 1. According to Red Hat, a low swappiness value is also recommended for database workloads; for Oracle databases, for example, Red Hat recommends a swappiness of 10. For further reading, see Tuning Virtual Memory.

Why do we set this value to 1 instead of 0?

– Setting swappiness to 0 avoids swapping even more aggressively, which increases the risk of the OOM killer terminating processes under heavy memory and I/O pressure; 1 keeps swap available as a last resort.
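If you want to try the value before making it persistent in sysctl.conf (shown in the Implementation section below), a minimal sketch:

cat /proc/sys/vm/swappiness        # check the current value
sudo sysctl -w vm.swappiness=1     # apply immediately; lost after a reboot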

net.core.somaxconn

The maximum number of pending connections that can be queued on a listening socket (the backlog limit for listen()).

vm.max_map_count

This property restricts the number of VMAs (Virtual Memory Areas) that a single process can own. When the limit is reached, the process gets out-of-memory errors.
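A quick sketch for checking and raising the limit at runtime (the persistent value goes into sysctl.conf later; 262144 is the value recommended in the ES documentation):

cat /proc/sys/vm/max_map_count          # current limit
sudo sysctl -w vm.max_map_count=262144  # apply immediately; lost after a reboot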

fs.file-max

Sets the maximum number of file-handles that the Linux kernel will allocate.

session required pam_limits.so

The pam_limits PAM module sets limits on the system resources that can be obtained in a user-session.

Configuring Elastic

Brief

bootstrap.mlockall: true

Tries to lock the process address space into RAM, preventing any Elasticsearch memory from being swapped out. In other words, it lets the JVM lock its memory block and protects it from being swapped by the OS. This is a performance optimization.
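Once the node is up, you can verify whether the lock actually succeeded; a minimal check, assuming the node listens on localhost:9200:

curl 'http://localhost:9200/_nodes/process?pretty'
# "mlockall" : true means the memory is locked; false usually means the
# memlock limit (see the limits.conf part below) is too low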

indices.fielddata.cache.size: 30%

By default the field data cache is unbounded, which can make your JVM heap explode. To avoid nasty surprises we cap it at 30% of the heap (this affects search performance).

indices.cache.filter.size: 30%

Even though individual filters are relatively small, they can take up large portions of the JVM heap if you have a lot of data and many different filters, so we also cap this at 30%.
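To see whether these caps fit your workload, you can watch how much heap both caches actually use; a minimal sketch, assuming the node listens on localhost:9200:

curl 'http://localhost:9200/_nodes/stats/indices?pretty'
# look at the "fielddata" and "filter_cache" sections of the output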

ES_HEAP_SIZE

The default installation of Elasticsearch is configured with a 1 GB heap. Based on our research, this value should be half of the total RAM and must not exceed 30.5 GB.

http.compression: true

Enables compression of HTTP responses when the client asks for it (via Accept-Encoding). Defaults to false. We use it with gzip; it made a huge impact on performance.
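A quick way to confirm compression is active; --compressed makes curl send Accept-Encoding: gzip, and "myindex" is just a placeholder index name:

curl -v --compressed 'http://localhost:9200/myindex/_search?pretty' 2>&1 | grep -i content-encoding
# expect: Content-Encoding: gzip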

discovery.zen.ping.multicast.enabled: false

discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

A very important pair of settings: disabling multicast and listing the unicast hosts explicitly keeps nodes from joining the wrong cluster. Newer versions use unicast by default.

discovery.zen.minimum_master_nodes: 2

Set this according to the formula (number of master-eligible nodes / 2) + 1; it protects the cluster against split-brain.
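With 3 master-eligible nodes, (3 / 2) + 1 = 2. A minimal sketch of changing it at runtime through the cluster settings API, in case you add master-eligible nodes later (assumes the cluster is reachable on localhost:9200):

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 2 }
}'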

action.destructive_requires_name: true

This setting prevents deleting indices with wildcards (*); the full index name is required.
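A minimal illustration ("myindex" is a placeholder):

curl -XDELETE 'http://localhost:9200/*'          # rejected: wildcard deletes are not allowed
curl -XDELETE 'http://localhost:9200/myindex'    # allowed: full index name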

action.auto_create_index: false

You can prevent the automatic creation of indices by adding this setting to the config/elasticsearch.yml file on each node.
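With auto-creation disabled you have to create indices explicitly before indexing into them; a small sketch ("myindex" and "mytype" are placeholders):

curl -XPUT 'http://localhost:9200/myindex'
curl -XPUT 'http://localhost:9200/myindex/mytype/1' -d '{ "field": "value" }'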

Zen settings

  • zen.ping.timeout: 10s
  • zen.fd.ping_retries: 3
  • zen.fd.ping_interval: 3s
  • zen.fd.ping_timeout: 30s

These settings make the cluster more tolerant of slow or lossy networks and prevent nodes from being dropped because of transient connection problems.

Implementation

Open sysctl.conf:

vi /etc/sysctl.conf

Add these properties:

vm.swappiness=1                          # minimize swapping without disabling it entirely
net.core.somaxconn=65535                 # raise the listen backlog limit
vm.max_map_count=262144                  # recommended in the ES docs; see http://www.redhat.com/magazine/001nov04/features/vm
fs.file-max=518144                       # http://www.tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap6sec72.html
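If you do not want to wait for a reboot, the new values can be loaded right away:

sudo sysctl -p                                                          # re-read /etc/sysctl.conf
sysctl vm.swappiness net.core.somaxconn vm.max_map_count fs.file-max   # verify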

After that, open limits.conf:

vi /etc/security/limits.conf

The important thing here is which user the limits are defined for: they must apply to our ES user. It is recommended to run big applications like this under a dedicated user (we did the same for Redis). The elasticsearch user is created by default when you install ES from the package.

elasticsearch    soft    nofile          65535
elasticsearch    hard    nofile          65535
elasticsearch    -       memlock         unlimited

Before making this change we kept getting the error "Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit)." Thanks to mrzard, setting memlock to unlimited got rid of this problem.

Just in case: a soft limit can be temporarily exceeded by the user, but the system will never allow a user to exceed the hard limit. We go strict here and set both to the same value.

To make these limits apply to user sessions, you also have to modify the PAM session configuration:

vi /etc/pam.d/common-session-noninteractive
vi /etc/pam.d/common-session

Add this line:

session required pam_limits.so

You may need to reboot the machine for these changes to be applied.

Configuring Elasticsearch

Now everything is ready for Elasticsearch to be installed. You can use this bash script. Here you can get the script.

After that, execute:

sh es.sh

You can choose whichever installation method you want, but we strongly recommend installing Elasticsearch this way. If you download the tar.gz and go that route, you have to create your own init scripts and also create the elasticsearch user, which is very important for keeping the configuration easy. Anyway, let’s assume you installed it with the script. Now you have the elasticsearch.yml and logging.yml files under:

/etc/elasticsearch

In this part, let’s open elasticsearch.yml. We only show the settings we changed; everything else is left at its default.

vi /etc/elasticsearch/elasticsearch.yml

bootstrap.mlockall: true

action.auto_create_index: false 
action.destructive_requires_name: true
indices.fielddata.cache.size: 30%
indices.cache.filter.size: 30%

http.compression: true

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false 
discovery.zen.ping.unicast.hosts: ["ip-of-machine-1", "ip-of-machine-2", "ip-of-machine-3"]
discovery.zen.ping.timeout: 10s 
discovery.zen.fd.ping_retries: 3 
discovery.zen.fd.ping_interval: 3s 
discovery.zen.fd.ping_timeout: 30s

After that, let’s go to the Elasticsearch defaults file that the init script reads:

vi /etc/default/elasticsearch

One of the most important things in ES is the heap size. Based on our research, the heap should be half of the total RAM and should not exceed 30.5 GB.

# Heap size defaults to 256m min, 1g max
# Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g
ES_HEAP_SIZE=4g
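Once the node is restarted, you can confirm the heap the JVM actually got; a quick check, assuming the node listens on localhost:9200:

curl 'http://localhost:9200/_nodes/jvm?pretty'
# check mem.heap_max_in_bytes against your ES_HEAP_SIZE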

Finally, you can check the properties as seen by our ES user:

su elasticsearch --shell /bin/bash --command "cat /proc/sys/vm/swappiness "
su elasticsearch --shell /bin/bash --command "cat /proc/sys/net/core/somaxconn"
su elasticsearch --shell /bin/bash --command "cat /proc/sys/vm/max_map_count "
su elasticsearch --shell /bin/bash --command "cat /proc/sys/fs/file-max "

su elasticsearch --shell /bin/bash --command "ulimit -n"
su elasticsearch --shell /bin/bash --command "ulimit -Sn"
su elasticsearch --shell /bin/bash --command "ulimit -Hn"

You can reboot the machines and check your cluster status from Sense:

GET /_nodes/process?pretty

or check each node from the console:

curl 'http://localhost:9200/?pretty'
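You can also get a quick overview of the cluster state and the number of nodes that joined:

curl 'http://localhost:9200/_cluster/health?pretty'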

If your nodes don’t start on boot, your init scripts were probably not installed properly. Run this command and reboot:

sudo update-rc.d elasticsearch defaults 95 10

If you get an org.elasticsearch.transport.RemoteTransportException, check which version of Java is installed on each of your nodes.
