After a long time with Apache Solr, I have also done a few projects with Elasticsearch. The Elastic homepage offers a good introduction and, for the most part, good documentation of Elasticsearch.
But if you want to use Elasticsearch efficiently, there is also a range of community tools and a few tricks that cannot be found in the official documentation.
Elastic tried to make getting started very simple, and in many respects they have succeeded. Before we fire up our first cluster, there are some theoretical basics that we should understand:
Cluster & Nodes
A cluster consists of one or more nodes. A node is one Elasticsearch process. This node (or process) can run once or multiple times on one physical machine. With Zen discovery, nodes find other nodes of the same cluster automatically. This is handy for getting started, but it sometimes makes sense to turn this feature off in production; more about this later.
Indexes & Shards
On a cluster you can create multiple indexes. An index can be compared to a database table. It is a bucket in which documents are stored. Elasticsearch takes care of distributing these documents across the individual cluster nodes. An index is logically split into so-called "shards". These shards are used to distribute the data. Each document is stored in exactly one of these shards.
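To make the idea of "one document, one shard" more concrete, here is a minimal sketch in Python. It assumes the usual scheme of hashing a routing value (by default the document ID) and taking the result modulo the number of primary shards. The function names and the use of crc32 are my own choices for illustration; Elasticsearch uses its own hash function internally.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number (toy stand-in for the real routing hash)."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 is only a deterministic example hash, not the one Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard they would be stored in."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards
```

Calling `distribute(["doc-1", "doc-2", "doc-3"], 5)` returns a dictionary with five shard numbers as keys, and every document ID shows up under exactly one of them. The same ID always lands in the same shard, which is what allows a document to be found again later.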
For high availability, a shard can be replicated to 1 to N nodes. The shard that a document is assigned to is the "primary shard". A copy of this shard is called a "replica shard".
When a cluster node goes down, then for each primary shard on this node a replica shard on another node is promoted to take over the primary role. With this fault-tolerance mechanism, Elasticsearch is very robust when it is configured correctly.
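The failover described above can be played through with a small toy model. This is only a sketch under simplifying assumptions: the shard table layout, the node names and the rule "promote the first surviving replica" are made up for illustration and say nothing about how Elasticsearch picks a replica internally.

```python
def fail_node(shards, failed_node):
    """Return a new shard table after `failed_node` has left the cluster.

    `shards` maps a shard number to {"primary": node, "replicas": [nodes]}.
    """
    result = {}
    for shard_id, copies in shards.items():
        primary = copies["primary"]
        # Copies that lived on the failed node are gone.
        replicas = [node for node in copies["replicas"] if node != failed_node]
        if primary == failed_node:
            # Promote a surviving replica; None means the shard has no primary left.
            primary = replicas.pop(0) if replicas else None
        result[shard_id] = {"primary": primary, "replicas": replicas}
    return result


cluster = {
    0: {"primary": "node-1", "replicas": ["node-2"]},
    1: {"primary": "node-2", "replicas": ["node-3"]},
    2: {"primary": "node-3", "replicas": ["node-1"]},
}
after_failure = fail_node(cluster, "node-1")
```

In this example shard 0 loses its primary on `node-1`, so the replica on `node-2` takes over. Shard 2 keeps its primary but loses its only replica. A shard without any replica would end up with no primary at all, which is the situation replication is meant to prevent.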
Enough theory for now, let's start our first cluster with three nodes. All three nodes will run on the same local machine.
Starting Elasticsearch nodes is very simple and can be done like this:
We started three Elasticsearch nodes, but how can we see whether they are really running? The simplest way is to use the cluster health API:
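If you would rather call the health API from a script than from the command line, a minimal Python sketch could look like the one below. It assumes a node answering HTTP on localhost and the default port 9200; the helper names are my own, and only a few fields of the response are picked out.

```python
import json
from urllib.request import urlopen


def health_url(host="localhost", port=9200):
    """Build the URL of the cluster health endpoint (host and port are assumptions)."""
    return f"http://{host}:{port}/_cluster/health"


def summarize_health(payload):
    """Pick the most interesting fields out of a cluster health response."""
    return {
        "cluster": payload.get("cluster_name"),
        "status": payload.get("status"),
        "nodes": payload.get("number_of_nodes"),
        "unassigned_shards": payload.get("unassigned_shards"),
    }


def fetch_health(host="localhost", port=9200, timeout=5.0):
    """Ask a running node for the cluster health and return the summary."""
    with urlopen(health_url(host, port), timeout=timeout) as response:
        return summarize_health(json.load(response))
```

With the three local nodes running, `fetch_health()` should report three nodes. If the call fails with a connection error, no node is listening on that port.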
This endpoint tells you how healthy your cluster is. The status is reported as one of the colors "green", "yellow" or "red".
The colors have the following meaning:
Important: The worst state of any shard defines the cluster state. One unwritable shard is enough to make the cluster state red.
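The "worst shard wins" rule can be written down in a few lines of Python. The sketch assumes the usual reading of the colors: green means the primary and all replicas of a shard are allocated, yellow means the primary is allocated but at least one replica is missing, and red means the primary itself is not allocated. The function names and the severity table are made up for this example.

```python
# Higher number means worse state.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def shard_status(primary_assigned, replicas_assigned, replicas_expected):
    """Derive the color of a single shard from its allocated copies."""
    if not primary_assigned:
        return "red"
    if replicas_assigned < replicas_expected:
        return "yellow"
    return "green"


def cluster_status(shard_statuses):
    """The cluster is only as healthy as its worst shard."""
    return max(shard_statuses, key=SEVERITY.__getitem__, default="green")
```

For example, `cluster_status(["green", "green", "red"])` returns `"red"`, even though two of the three shards are perfectly fine.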
As you may have noticed, the REST API of Elasticsearch is very powerful. Every Elasticsearch parameter can be read and changed through the REST API, which is useful for automation. But writing every query with curl and remembering every API path is hardly practical. A good plugin for controlling Elasticsearch from the browser is kopf. It can be used to browse and maintain indices, to inspect and configure cluster nodes, and to play with queries, so it is handy both for a quick overview of the cluster and for testing queries directly in the browser.
It can be installed with the following command:
You can open the plugin in the browser with the following URL scheme: