ElasticSearch single node cluster (1)

Abstract

Do you want to learn about the Elastic Stack? We will dive into it while setting up a single-node Elasticsearch (ES) cluster using Docker.

I won’t go too deep into any topic; instead I will scratch the surface and let you dig deeper into the items of your choice.

Prerequisites

It helps if you know Docker and the command line.

It is also a good idea to read the terminology used across the stack.

Download images

docker pull docker.elastic.co/elasticsearch/elasticsearch:6.5.4
docker pull docker.elastic.co/kibana/kibana:6.5.4

Preparing the field

To let Kibana reach ES directly, we will create a Docker network:

docker network create -d bridge elastic

Run images

This is the basic setup, so you can get a feel for what Elasticsearch is. Later we will set more options.

docker run -d --network elastic -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name es1 docker.elastic.co/elasticsearch/elasticsearch:6.5.4

To check the startup process, follow the logs:

docker logs -f es1

We published ES on port 9200 (-p 9200:9200), so you can check it like this:

curl -XGET http://your_ip:9200

If all went well, you should get something like this:

{
  "name" : "teMJMh-",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "2hgWcIiURJqUghZi-YewHw",
  "version" : {
    "number" : "6.5.4",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "d2ef93d",
    "build_date" : "2018-12-17T21:17:40.758843Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

We can work with ES through this API, but let’s try Kibana.

docker run -d --network elastic -p 5601:5601 -e ELASTICSEARCH_URL=http://es1:9200 --name K1 docker.elastic.co/kibana/kibana:6.5.4

Let’s check the logs:

docker logs -f K1

We use es1 as the host in ELASTICSEARCH_URL because we created the ES container with that name and attached both containers to the same Docker network, so Docker resolves the name for us.

Once this is done we can access Kibana on this URL:

http://your_ip:5601

Voilà, Kibana is working. (hope so)

But we can’t find any data… because there is no data at all. So let’s load some data to try it out.

Insert some data to play with

We can insert data in various ways, but all of them rely on the ES API. So to play a little with data here we will use the API from the command line.

Index creation

ES stores JSON documents in its NoSQL, document-oriented database, organized into indices. When a document is indexed, its fields (not necessarily all of them) are added to the index. For a field to be searchable, it must be indexed. There are a few ways to index a field, which will be described later.

So first we will create an index.

curl -XPUT "http://your_ip:9200/my_index?pretty"

This creates the index my_index and responds with:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}

If ?pretty is not set, the JSON response comes back compact, on a single line.

Note that we use the PUT verb to create the index. In the same way, the DELETE verb deletes it.

curl -XDELETE "http://your_ip:9200/my_index?pretty"

This deletes the index called my_index. If you ran it, create the index again with the PUT command above before continuing.

In general the API has this form:

curl -Xverb "your_url:your_port/the_index_you_want_to_work_on?pretty"
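
The general form above can be wrapped in a small shell helper so that the host and port are written only once. This is a minimal sketch: es_url, ES_HOST and ES_PORT are hypothetical names chosen for illustration, not part of Elasticsearch or curl, and the helper only builds the URL without sending any request.

```shell
# Hypothetical helper, not part of Elasticsearch or curl.
# ES_HOST and ES_PORT are assumed variable names; change them to match your setup.
ES_HOST="${ES_HOST:-localhost}"
ES_PORT="${ES_PORT:-9200}"

# Compose "http://host:port/path?pretty" for a given path.
# $1 is everything after the port, e.g. "my_index" or "my_index/_search".
es_url() {
  printf 'http://%s:%s/%s?pretty\n' "$ES_HOST" "$ES_PORT" "$1"
}

# Print two example URLs (no request is sent here).
es_url "my_index"
es_url "my_index/_search"
```

With the helper defined, a call such as curl -XGET "$(es_url my_index)" should issue the same kind of request as the commands in this post, with your_ip replaced by the value of ES_HOST.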

So, if you run this command:

curl -XGET "http://your_ip:9200/my_index?pretty"

You will get some interesting data about your newly created index:

{
  "my_index" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1545316922565",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "hoAGq0tSTg66yWeu-tu8og",
        "version" : {
          "created" : "6050499"
        },
        "provided_name" : "my_index"
      }
    }
  }
}

Later in this workshop we will revisit this data.

By now you can see some info about this index in Kibana:

http://your_ip:5601/app/kibana#/management/elasticsearch/index_management/home?_g=()

Data insertion

Now let’s insert some data. Here we will insert a single document in our index:

curl -XPOST -H "Content-Type: application/json" "http://your_ip:9200/my_index/doc?pretty" -d '{"type":"human","name":"john","moto":"johnny b good"}'

If all went well this is your output:

{
  "_index" : "my_index",
  "_type" : "doc",
  "_id" : "C9kYzGcB8jw2faw1k-JG",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

We are using the POST verb, setting the content type to JSON, and calling this API URL:

your_url:your_port/your_index/your_doc_type?pretty

…and sending our document as JSON. The doc type is doc (in ES 6.x an index supports only one doc type).

Query for your document:

curl -XGET -H "Content-Type: application/json" "http://your_ip:9200/my_index/_search?pretty"

And this is the output:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "doc",
        "_id" : "C9kYzGcB8jw2faw1k-JG",
        "_score" : 1.0,
        "_source" : {
          "type" : "human",
          "name" : "john",
          "moto" : "johnny b good"
        }
      }
    ]
  }
}

The response has a hits property with total = 1 (our only document) and a nested hits property, an array holding the matching documents.
Each document carries some interesting data: the _index in which it was indexed, the _source (our actual data), and the _id (generated automatically for us).
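
To make the difference between hits.total and the hits array concrete, here is a rough sketch that pulls both numbers out of a saved response using only awk and grep. It assumes the ?pretty layout shown above and the 6.x response shape, where hits.total is a plain number; response.json is a hypothetical file name, and its content is copied from the output above.

```shell
# Sample search response saved to a file; the values are copied from the output above.
# response.json is a hypothetical file name.
cat > response.json <<'EOF'
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "doc",
        "_id" : "C9kYzGcB8jw2faw1k-JG",
        "_score" : 1.0,
        "_source" : {
          "type" : "human",
          "name" : "john",
          "moto" : "johnny b good"
        }
      }
    ]
  }
}
EOF

# hits.total: the first "total" that appears after the outer "hits" object opens.
# (_shards.total comes earlier in the response and is skipped.)
hits_total=$(awk '/"hits" : [{]/ { in_hits = 1 }
                  in_hits && /"total"/ { gsub(/[^0-9]/, ""); print; exit }' response.json)

# Number of documents actually returned in the hits.hits array
# (counts "_id" lines, so it relies on the pretty-printed layout).
returned=$(grep -c '"_id"' response.json)

echo "hits.total=$hits_total returned=$returned"
```

The two numbers match here, but they can differ: by default a search returns at most 10 documents in the hits array, while hits.total counts every match. A proper JSON tool is a sturdier choice than text matching for anything beyond a quick check.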

Bulk action

Usually, though, we won’t insert documents one at a time. Instead we will send a payload with more than one document. This is how to work with a batch of documents:

curl -XPOST -H "Content-Type: application/x-ndjson" "http://your_ip:9200/my_index/_bulk?pretty" -d '
{ "index" : { "_index" : "my_index", "_type" : "doc" } }
{"type":"human","name":"lita de lasari","moto":"walk ma am"}
{ "index" : { "_index" : "my_index", "_type" : "doc" } }
{"type":"human","name":"macri cat","moto":"things happens"}
{ "index" : { "_index" : "my_index", "_type" : "doc" } }
{"type":"human","name":"judas","moto":"give me a kiss"}
'

Here we are inserting three documents into the index my_index; we will search for them in a moment.

Note this:

  • content type is x-ndjson
  • the URL is the same as before but ends with _bulk
  • the doc type (doc) is not in the URL because each action line sets it (putting it in the URL is optional)
  • data is split by newline characters, with two lines per item:
    • action (index here, but it can be update, create, delete…)
    • the item itself (not needed for delete)
  • we are using the index name in the URL and also setting it in each item action *

* If the index is set in the URL, it becomes the default: when an action does not set an index, ES uses the default for that item. The same goes for the doc type. You can use “index”: {} as the action to rely on the defaults, or mix indices by sending each item to a different index. E.g.:

curl -XPOST -H "Content-Type: application/x-ndjson" "http://your_ip:9200/my_index/doc/_bulk?pretty" -d '
{ "index" : { } }
{"type":"rat","name":"splinter","moto":"I hate pizza"}
{ "index" : { "_index" : "my_index" } }
{"type":"unknown","name":"alf","moto":"I love cats"}
{ "index" : { "_index" : "new_index", "_type" : "doc" } }
{"type":"hobbit","name":"bilbo","moto":"nice ring"}
'

We added the doc type to the URL to set a default.

The first item uses both defaults.

The second one sets the index but not the doc type.

The third one sets both explicitly: a new index (new_index) and the doc type.

Search them:

curl -XGET -H "Content-Type: application/json" "http://your_ip:9200/my_index/_search?pretty"

We have inserted 7 items so far, but hits.total in my search is 6. This is because we are searching my_index. Remember that the last item was indexed in new_index, so let’s search for it.

curl -XGET -H "Content-Type: application/json" "http://your_ip:9200/new_index/_search?pretty"

And the magic happened, the document is there.
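
For larger batches it may be easier to generate the two-lines-per-item body than to type it by hand. The sketch below assumes one JSON document per input line and prefixes each with an empty index action, so the index and doc type must come from the URL; bulk_body and bulk.ndjson are hypothetical names chosen for illustration.

```shell
# Sketch: build an NDJSON bulk body, two lines per document.
# bulk_body and bulk.ndjson are hypothetical names, not part of Elasticsearch.
bulk_body() {
  # Reads one JSON document per line on stdin and writes an empty "index"
  # action line before each one, so index and type come from the URL.
  while IFS= read -r doc; do
    [ -n "$doc" ] || continue          # skip blank lines
    printf '%s\n' '{ "index" : { } }'
    printf '%s\n' "$doc"
  done
}

# Two of the documents used earlier in this post, one per line.
bulk_body > bulk.ndjson <<'EOF'
{"type":"human","name":"lita de lasari","moto":"walk ma am"}
{"type":"human","name":"macri cat","moto":"things happens"}
EOF

# Two documents produce four lines, each ending with a newline.
wc -l < bulk.ndjson
```

The file could then be sent with curl -XPOST -H "Content-Type: application/x-ndjson" "http://your_ip:9200/my_index/doc/_bulk?pretty" --data-binary @bulk.ndjson. Note --data-binary rather than -d: when curl reads the body from a file, -d strips the newlines that the bulk format depends on.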

IDs

IDs are created for us. But we can specify them.

Single item:

curl -XPOST -H "Content-Type: application/json" "http://your_ip:9200/my_index/doc/1?pretty" -d '{"type":"human","name":"hannibal","moto":"I love it when a plan comes together"}'

Note the URL: after the doc type we added the id, in this case “1”.

The output is:

{
  "_index" : "my_index",
  "_type" : "doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

Note the _id field.

For _bulk actions the id can be specified in the action line of the item:

{ "index" : { "_index" : "new_index", "_type" : "doc", "_id": "1" } }

Keep in mind that if the id already exists:

- index replaces the document (creating a new version)
- create fails
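
One way to see both behaviours side by side is a bulk body with two items that target the same id. The sketch below only generates that body; bulk_action and ids.ndjson are hypothetical names, and the document is the one used in the single-item example above.

```shell
# Hypothetical helper: print one bulk action line with an explicit id.
# $1 = action ("index" or "create"), $2 = index name, $3 = document id
bulk_action() {
  printf '{ "%s" : { "_index" : "%s", "_type" : "doc", "_id" : "%s" } }\n' "$1" "$2" "$3"
}

# Two items that target the same id in new_index:
# the first uses "index", the second uses "create".
{
  bulk_action index new_index 1
  printf '%s\n' '{"type":"human","name":"hannibal","moto":"I love it when a plan comes together"}'
  bulk_action create new_index 1
  printf '%s\n' '{"type":"human","name":"hannibal","moto":"I love it when a plan comes together"}'
} > ids.ndjson

cat ids.ndjson
```

If this file is posted to the _bulk endpoint, the index item should succeed whether or not id 1 already exists, while the create item should be reported as failed because the id exists by the time it runs. The bulk response lists one result per item, so one failing item does not abort the others.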

This story will continue…
