Short Introduction to elastic (search) - Part 1

Quick start with elastic 2.0

After a long time with Apache Solr, I have also done a few projects with Elasticsearch. The elastic homepage offers a good introduction and, for the most part, good documentation of Elasticsearch.


But when you want to use elastic efficiently, there is also a range of community tools and some tricks that cannot be found in the official documentation.


Install elastic and start a few nodes

Figure: Elasticsearch node, index, primary shards and replica shards

Elastic tried to make getting started very simple, and in many respects this has been achieved. Before we fire up our first cluster, there are some theoretical basics that we should understand:


Cluster & Nodes

A cluster consists of one or multiple nodes. A node is one elastic process. This node (or process) can run once or multiple times on one physical machine. With Zen discovery (the discovery.zen settings), nodes discover other nodes of the same cluster automatically. This is handy for getting started, but it sometimes makes sense to turn off this feature in production; more about this later.


Indexes & Shards

On a cluster you can create multiple indexes. An index can be compared with a database table: it is a bucket in which documents are stored. Elastic takes care of distributing these documents across the individual cluster nodes. An index is split logically into so-called "shards". These shards are used to distribute the data. Each document is stored in exactly one of these shards.

For high availability, a shard can be replicated to one or more other nodes. The shard that a document is assigned to is the "primary shard". A copy of this shard is called a "replica shard".


When a cluster node goes down, for each primary shard on this node a replica shard on another node is promoted to take over the primary role. With this fault tolerance mechanism elastic is very robust when it is configured correctly.
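
The number of primary shards and replicas is set per index when the index is created. The following is a minimal sketch, assuming a node listening on localhost:9200; the index name "articles", the ES_URL variable and the helper function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: create an index with an explicit number of primary shards and replicas.
# ES_URL and all function names are assumptions for illustration.

ES_URL="${ES_URL:-http://localhost:9200}"

# Build the settings body: $1 = primary shards, $2 = replicas per primary.
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}' "$1" "$2"
}

# Create index $1 with $2 primary shards and $3 replicas per primary.
create_index() {
  curl -s -XPUT "$ES_URL/$1" \
    -H 'Content-Type: application/json' \
    -d "$(index_settings "$2" "$3")"
}

# Total number of shard copies the cluster has to place:
# primaries * (1 + replicas per primary).
total_shards() {
  echo $(( $1 * (1 + $2) ))
}
```

Calling `create_index articles 3 1` would ask for three primary shards with one replica each, so `total_shards 3 1` reports six shard copies to be spread over the nodes.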


Enough theory for now, let's start our first cluster with three nodes. To do this, we use the same local physical machine.


Starting Elasticsearch nodes is very simple and can be done like this:


# download elastic (elasticsearch-2.0.0.tar.gz), then unpack it
tar xvfz elasticsearch-2.0.0.tar.gz
cd elasticsearch-2.0.0

# start elastic cluster node 1 (in the background)
./bin/elasticsearch &

# start elastic cluster node 2
./bin/elasticsearch &

# start elastic cluster node 3
./bin/elasticsearch &

The Cluster State

We started three elastic nodes, but how can we see if they are really running? The simplest way is to use the cluster health API:


curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

This request handler tells you how healthy your cluster is. The status is represented by one of the colors "green", "yellow" or "red".

The colors have the following meaning:

  • Green: The cluster is healthy and operating. All shards are writable and have the configured number of replicas.

  • Yellow: For every shard there is a node that contains its data, but the configured number of replicas is not available. This status is reached, for example, when a node goes down and a replica shard is promoted to primary. The shard is then replicated to another node so that the configured number of replicas is available again. While the new replica is being allocated, the cluster state stays yellow.

  • Red: The cluster is not fully writable. There are shards for which neither a primary nor a replica shard is available.


Important: The worst state of any shard defines the cluster state. One unwritable shard is enough to get a red cluster state.
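
The "worst state wins" rule can be sketched as a small shell function. It reads one health color per line, for example the health column of the `_cat/indices` endpoint, and prints the worst one. The function name worst_status and the ranking table are assumptions for illustration, not something elastic ships.

```shell
#!/bin/sh
# Sketch: reduce a list of health colors (one per line on stdin) to the worst one.
# Ranking: green < yellow < red. Unknown values are ignored.
# With no input at all the result is "green", like a cluster without shards.

worst_status() {
  awk '
    BEGIN { rank["green"] = 0; rank["yellow"] = 1; rank["red"] = 2; worst = "green" }
    ($1 in rank) && rank[$1] > rank[worst] { worst = $1 }
    END { print worst }
  '
}

# Example against a running cluster (assumes a node on localhost:9200):
#   curl -s 'http://localhost:9200/_cat/indices?h=health' | worst_status
```

A single red entry in the input is enough for the function to print "red", no matter how many green entries surround it.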


As you may have noticed, the REST API of elastic is very powerful. Every parameter of elastic can be read and controlled through the REST API, which is useful for automation. But writing queries only with curl and remembering every API path is hardly practical. A good plugin for controlling elastic from the browser is kopf. It can be used to browse and maintain indices, view and configure cluster nodes, and also to play with elastic queries.
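
As a small illustration of using the REST API for automation, the cluster health response can be turned into an exit code that a cron job or monitoring check can act on. This is only a sketch: ES_URL, the function names and the exit code mapping are assumptions, and the JSON is parsed with sed rather than a real JSON parser.

```shell
#!/bin/sh
# Sketch: map the cluster health status to an exit code for scripts.
# ES_URL and all function names are assumptions for illustration.

ES_URL="${ES_URL:-http://localhost:9200}"

# Read a cluster health JSON document on stdin and print its "status" value.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Map a status color to an exit code: 0 = green, 1 = yellow, 2 = red, 3 = unknown.
status_code() {
  case "$1" in
    green)  return 0 ;;
    yellow) return 1 ;;
    red)    return 2 ;;
    *)      return 3 ;;
  esac
}

# Query the cluster and return the exit code for its current status.
# If the node cannot be reached the status is empty and the result is 3.
check_cluster() {
  status_code "$(curl -s "$ES_URL/_cluster/health" | health_status)"
}
```

A script could then run `check_cluster || echo "cluster needs attention"` to react to anything other than a green cluster.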

Get an overview of the cluster with the elastic kopf plugin

When you want to get an overview of the cluster in the browser, or test queries directly within the browser, you can use the kopf plugin.

It can be installed with the following command:


cd elasticdir
./bin/plugin install

You can open the plugin in the browser with the following url scheme:


Figure: Elasticsearch cluster managed with kopf