ElasticSearch single node cluster (2)

First part of this series here: https://juanmatiasdelacamara.wordpress.com/2019/06/27/elasticsearch-single-node-cluster-1/.

In the first part we started an Elasticsearch instance and a Kibana instance using Docker. In this second part we will scratch the surface of Kibana and play a little with field mapping.

We assume you have all the containers running from the first part. Just in case you don’t, here is a quick way to get up and running:

docker network create -d bridge elastic

docker run -d --network elastic -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name es1 docker.elastic.co/elasticsearch/elasticsearch:6.5.4

docker run -d --network elastic -p 5601:5601 -e ELASTICSEARCH_URL=http://es1:9200 --name K1 docker.elastic.co/kibana/kibana:6.5.4

The user-friendly way

That is… we will use Kibana…

Access your Kibana at:

http://your_ip:5601

Skip the welcome page (explore it yourself later) and you will be redirected to the main page.

There, you must select Discover to search your data.

But then you will get this message: “In order to visualize and explore data in Kibana, you’ll need to create an index pattern to retrieve data from Elasticsearch.”

As we saw before, we created indices and then indexed documents into them. Those indices “live” in Elasticsearch. Kibana needs a kind of “map” to understand how to use an index. This map is called an “index pattern”.

The index pattern specifies fields and types among other things.

Creating an index pattern is easy. In fact, Kibana has already redirected you to the index pattern creation page. You can also reach it through the menu Management->Kibana:Index Patterns.

You have a list of indices. You will see my_index and new_index, the indices we created before. You must specify a “pattern” that matches your index. In index pattern, type in: my_index.

Kibana will show you indices matching your expression. When you are done click Next Step.

You will see the message: “The indices which match this index pattern don’t contain any time fields.”

This means that we didn’t add any timestamp field to our documents. Why is this important? A time field lets Kibana filter documents by time range and build time series, so it is very useful.

But at this point, it’s ok to have no such field in our documents. So click Create Index Pattern.

The field list will be shown. Click Discover and enjoy viewing your documents.

About patterns

Since we can have a lot of documents, we can create related indices to store them. E.g. create one index per day for a kind of document. Let’s say my_index-2018.01.01, my_index-2018.01.02 and so on.

All documents in those indices are related, meaning that they could have similar properties, e.g. date, source, server name (among other, unique properties)… so all my indices will have the same (or almost the same) field mapping. Also, I’d like to query all these indices together since the documents are related.

So, we can create a pattern that covers all those indices (and the future ones) as long as they follow a regular naming scheme: my_index-YYYY.MM.DD.

My pattern will be my_index-* using the * as a wildcard.
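The naming scheme and the wildcard can be tried out in a small shell sketch. The helper names daily_index and matches_pattern are hypothetical, and the shell's own wildcard matching is used here only as an approximation of how the * in an index pattern behaves.

```shell
# Build a per-day index name such as my_index-2018.01.01 from a prefix
# and an ISO date (YYYY-MM-DD). Hypothetical helper, not an ES command.
daily_index() {
  printf '%s-%s\n' "$1" "$(printf '%s' "$2" | sed 's/-/./g')"
}

# Return success when index name $1 matches wildcard pattern $2.
# $2 is left unquoted on purpose so the shell treats * as a wildcard.
matches_pattern() {
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

daily_index my_index 2018-01-01

for name in my_index-2018.01.01 my_index-2018.01.02 new_index; do
  if matches_pattern "$name" 'my_index-*'; then
    echo "$name matches"
  else
    echo "$name does not match"
  fi
done
```

Elasticsearch accepts the same kind of wildcard in the index part of a URL, so a search against my_index-* queries all the daily indices at once.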

Field mapping

Ok, so let’s try this now. Insert this document:

curl -XPOST -H "Content-Type: application/json" "http://your_ip:9200/fields001/doc?pretty" -d '{"quote":"But man is not made for defeat. A man can be destroyed but not defeated.","number01":3,"number02":0.3,"date": "2018-12-21T15:33:49.000Z", "tags":["Ernest","Hemingway"],"address": "8.8.4.4"}'

And this one:

curl -XPOST -H "Content-Type: application/json" "http://your_ip:9200/fields001/doc?pretty" -d '{"quote":"The supreme art of war is to subdue the enemy without fighting.","number01":3.4,"number02":4.32,"date": "2018-12-21T15:34:03.000Z", "tags":["Sun","Tzu"],"address": "8.8.4.4"}'

Let’s check what we have inserted:

curl -XGET -H "Content-Type: application/json" "http://your_ip:9200/fields001/_search?pretty"

And we get this:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "fields001",
        "_type" : "doc",
        "_id" : "OQsC4WcBDcuyrpUoyte1",
        "_score" : 1.0,
        "_source" : {
          "quote" : "But man is not made for defeat. A man can be destroyed but not defeated.",
          "number01" : 3,
          "number02" : 0.3,
          "date" : "2018-12-21T15:33:49.000Z",
          "tags" : [
            "Ernest",
            "Hemingway"
          ],
          "address" : "8.8.4.4"
        }
      },
      {
        "_index" : "fields001",
        "_type" : "doc",
        "_id" : "OgsC4WcBDcuyrpUo5dfM",
        "_score" : 1.0,
        "_source" : {
          "quote" : "The supreme art of war is to subdue the enemy without fighting.",
          "number01" : 3.4,
          "number02" : 4.32,
          "date" : "2018-12-21T15:34:03.000Z",
          "tags" : [
            "Sun",
            "Tzu"
          ],
          "address" : "8.8.4.4"
        }
      }
    ]
  }
}

One thing to note: tags was inserted as an array, so a JSON array is stored correctly as an array.

But what type is each field? Let’s check it.

curl -XGET -H "Content-Type: application/json" "http://your_ip:9200/fields001/_mapping?pretty"

Here we are getting the mapping for index fields001.

{
  "fields001" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "address" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "date" : {
            "type" : "date"
          },
          "number01" : {
            "type" : "long"
          },
          "number02" : {
            "type" : "float"
          },
          "quote" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "tags" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

The root object is named after our index. Then we have the mappings object, where all mappings live. A mapping maps a field to a type (or to more than one type, as with the keyword sub-field shown under the text fields).

Below mappings we get the doc_type. As we have said, current Elasticsearch versions only accept one doc_type per index, in this case called doc.

And finally we have properties, which you can think of as fields (the properties of an object in OO languages). Below it we have the fields and their types. Just look at the type entry of each one.

We have these types:

date: date
number01: long
number02: float
quote: text
tags: text
address: text

Ok, tags and quote are ok, they are text.

But number01 and number02 are different types despite the fact that both have numbers: 3, 0.3, 3.4 and 4.32. What happens here?

For number01 the first value sent to ES was 3, an integer, while for number02 it was 0.3, a float.

When ES received this first document, since the fields didn’t exist, it tried to guess the types and created a mapping: long for number01 and float for number02.

When the second document arrives the types are already mapped, so the 3.4 sent in number01 goes into a long field. With the default coerce setting ES indexes it as an integer, although _source still shows 3.4.

A different case is address: it should be an IP address, but it was mapped as text.
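The guesses seen above can be imitated with a short shell sketch. The function guess_type is hypothetical and only a rough approximation of dynamic mapping: it looks at one JSON scalar at a time, and its date check is much simpler than the date formats ES really tries.

```shell
# Guess the type ES dynamic mapping would pick for a JSON scalar.
# Rough approximation for illustration only, not the real ES logic.
guess_type() {
  case "$1" in
    true|false)
      echo boolean ;;
    \"[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*\")
      echo date ;;            # a string that looks like an ISO date
    \"*\")
      echo text ;;            # any other string (ES also adds a keyword sub-field)
    -[0-9]*|[0-9]*)
      case "$1" in
        *[.eE]*) echo float ;;  # has a decimal point or an exponent
        *)       echo long ;;   # whole number
      esac ;;
    *)
      echo unknown ;;
  esac
}

# First values seen by ES for each field in the article's first document.
echo "number01: $(guess_type 3)"
echo "number02: $(guess_type 0.3)"
echo "date:     $(guess_type '"2018-12-21T15:33:49.000Z"')"
echo "address:  $(guess_type '"8.8.4.4"')"
```

Nothing in the string "8.8.4.4" tells a guesser that it is an IP address, which is why that field ends up as text.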

What’s happening?

Elasticsearch, when receiving a JSON document, creates a mapping from its properties. Each JSON property is mapped to a field in the index, with its own type.

Each time a new property is sent in an object, a new field mapping is added. Since objects (logs, documents, etc., all as JSON) can be very different from one another, this approach is dangerous: a lot of fields can be added to the index. As a result you can have:

Too many fields indexed *
Too much memory used *
System slowed down *
Different data types set **

* This is known as mapping explosion.

** e.g. what we have seen with number01 and number02

So we need a solution: non-dynamic (explicit) mapping.

(OK, all this sounds like pure evil and so on… but don’t worry, our consultants are paid by the hour) So, we know enough to try setting manually what type each field has. Let’s try it.

We’d like to have these types:

date: date
number01: float
number02: float
quote: text
tags: text
address: IP address

So we will specify the mapping like this:

curl -XPUT -H "Content-Type: application/json" "http://your_ip:9200/_template/fields?pretty" -d '
{
  "index_patterns" : "fields*",
  "version" : 10001,
  "settings" : {
    "index.refresh_interval" : "5s"
  },
  "mappings" : {
    "doc" : {
      "dynamic": "false",
      "properties": {
        "date": {
          "type": "date"
        },
        "address": {
          "type": "ip"
        },
        "number01": {
          "type": "float"
        },
        "number02": {
          "type": "float"
        },
        "quote": {
          "type": "text"
        },
        "tags": {
          "type": "text"
        }
      }
    }
  }
}'

Look at the URL. We are calling the _template endpoint with method PUT, so we are creating a new template.

After _template we are setting the template name: fields.

A template maps field names to their types.

With index_patterns we specify which indices this template will be applied to. In this case, all future indices whose name matches the expression “fields*”. Remember we created an index called fields001; this template will be valid for indices called fields002 or fields-2018.01.01 as well.

Then we set a version (it’s just a reference).

Then we have mappings and doc_type (remember that currently ES only accepts one document type per index, so it will be unique), and finally dynamic and properties. The latter contains the properties (fields) we are setting for our indices. Setting dynamic to false stops ES from mapping fields automatically: only the fields set in the template are indexed, and any other field in a document is kept in _source but is not indexed or searchable. Without it, ES would still try to map fields not specified in the template.
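To experiment with other field lists, the template body can be generated instead of typed by hand. This is a minimal sketch: build_properties and build_template are hypothetical helpers, they emit only index_patterns and mappings (no version or settings), and they do no JSON escaping, so field names and types must be plain words.

```shell
# Turn name:type pairs into the "properties" object of a mapping.
build_properties() {
  sep=""
  printf '{'
  for pair in "$@"; do
    name=${pair%%:*}   # text before the first colon
    type=${pair#*:}    # text after the first colon
    printf '%s"%s":{"type":"%s"}' "$sep" "$name" "$type"
    sep=","
  done
  printf '}'
}

# Build a template body for one index pattern with dynamic mapping off.
# index_patterns is written in its list form here.
build_template() {
  pattern=$1
  shift
  printf '{"index_patterns":["%s"],"mappings":{"doc":{"dynamic":"false","properties":%s}}}\n' \
    "$pattern" "$(build_properties "$@")"
}

build_template 'fields*' date:date address:ip number01:float \
  number02:float quote:text tags:text
```

The printed JSON can be passed to the same PUT request used above through -d, in place of the hand-written body.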

Run the command. A template is only applied when an index is created, so let’s delete the index and create it again by indexing the same documents.

curl -XDELETE -H "Content-Type: application/json" "http://your_ip:9200/fields001" 

curl -XPOST -H "Content-Type: application/json" "http://your_ip:9200/fields001/doc?pretty" -d '{"quote":"But man is not made for defeat. A man can be destroyed but not defeated.","number01":3,"number02":0.3,"date": "2018-12-21T15:33:49.000Z", "tags":["Ernest","Hemingway"],"address": "8.8.4.4"}'

curl -XPOST -H "Content-Type: application/json" "http://your_ip:9200/fields001/doc?pretty" -d '{"quote":"The supreme art of war is to subdue the enemy without fighting.","number01":3.4,"number02":4.32,"date": "2018-12-21T15:34:03.000Z", "tags":["Sun","Tzu"],"address": "8.8.4.4"}'

Now take a look at the mapping for the newly created index and you should get this:

{
  "fields001" : {
    "mappings" : {
      "doc" : {
        "dynamic" : "false",
        "properties" : {
          "address" : {
            "type" : "ip"
          },
          "date" : {
            "type" : "date"
          },
          "number01" : {
            "type" : "float"
          },
          "number02" : {
            "type" : "float"
          },
          "quote" : {
            "type" : "text"
          },
          "tags" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

The types we were looking for.

Now go to Kibana, open Management->Index Patterns and create a pattern for this index.

In “index pattern” type “fields*” (without quotes).

This time, when you click Next Step, you will be prompted to choose a time filter field name, and there is our date field called date.

Create the index pattern and go to Discover to see our data. If you see no data, check the time range: the sample documents are dated 2018-12-21, so either widen the time picker to include that date or index documents with a current date.
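One way to get documents with a current date is to generate the timestamp at indexing time. The sketch below assumes the same your_ip placeholder and fields001 index as the rest of the article. The helpers now_iso, make_doc and index_doc_now are hypothetical, and make_doc does no JSON escaping, so the quote text must not contain double quotes or backslashes.

```shell
ES_URL="http://your_ip:9200"   # placeholder host, as in the article

# Current UTC time in the same format as the sample documents.
now_iso() {
  date -u +"%Y-%m-%dT%H:%M:%S.000Z"
}

# Build a JSON document: $1 quote text, $2 number01, $3 number02.
make_doc() {
  printf '{"quote":"%s","number01":%s,"number02":%s,"date":"%s","address":"8.8.4.4"}\n' \
    "$1" "$2" "$3" "$(now_iso)"
}

# Index one document dated "now". Only defined here, not called,
# because it needs the cluster from the article to be running.
index_doc_now() {
  curl -XPOST -H "Content-Type: application/json" \
    "$ES_URL/fields001/doc?pretty" -d "$(make_doc "$1" "$2" "$3")"
}

# Print a sample document without sending it anywhere.
make_doc "A fresh quote" 1.5 2.5
```

After calling index_doc_now with your own values, the new document should fall inside Kibana’s default recent time range in Discover.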

More info on this: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

There is a lot more to mapping, but for now, this is enough.
