Elasticsearch

The story so far

Jordy Moos & Frank Koornstra

600,000,000 Visitors

1,470,000,000 Clicks

2,400,000,000 Page views

Sites

"Items"

Insights

Percolate

Hardware

Final thoughts

For "dummies"

The tricky tweaky token tackler

  • The
  • tricky
  • tweaky
  • token
  • tackler

  • The → 1, 2, ...
  • tricky → 1, 5, ...
  • tweaky → 1, 42, ...
  • token → 1
  • tackler → 1, 8, ...
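The term-to-document lists above are an inverted index. A minimal sketch of how such a structure could be built, assuming plain whitespace tokenization and lowercasing; documents 2 and 5 are hypothetical and exist only to reproduce some of the ids shown:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "The tricky tweaky token tackler",
    2: "The quick brown fox",  # hypothetical document
    5: "A tricky question",    # hypothetical document
}
index = build_inverted_index(docs)
print(index["tricky"])
```

A lookup is then a dictionary access per term, which is why searching by term is cheap once the index exists.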

http://localhost:9200/the-index/the-type

Shards

Config

Index settings

curl -XPUT 'localhost:9200/the-index' -d '{
  ....
  "analyzer": {
    "default_analyzer": {
      "type": "custom",
      "tokenizer": "icu_tokenizer",
      "filter": [
        "icu_normalizer",
        "elision",
        "apostrophe",
        "stop",
        "porter_stem"
      ],
      "char_filter": []
    },
    "raw_analyzer": {
      ...
      "filter": [
        "icu_normalizer",
        "elision",
        "apostrophe"
      ]
    }
  }
  ...
}'

I thought it wasn't real.
I thought it wasn't real


βeta
βeta (standard)
β eta (ICU)

ICU normalizer: ÅlesUND → ålesund


Elision: l'avion → avion


Apostrophe: Neo's → Neo


Stop: the is and or ...


Stemmer: turn, turned, turns → turn
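The filter chain can be imitated in a few lines to see how each step changes the tokens. This is only a rough sketch, not the ICU or Lucene implementation: the regex tokenizer, the article list, and the four stop words are assumptions, and stemming is left out on purpose.

```python
import re
import unicodedata

ELISION_ARTICLES = {"l", "d", "j", "m", "t", "s", "n", "c", "qu"}  # assumed subset
STOP_WORDS = {"the", "is", "and", "or"}  # only the words shown on the slide


def normalize(token):
    # Stand-in for icu_normalizer: NFKC normalization plus case folding.
    return unicodedata.normalize("NFKC", token).casefold()


def elide(token):
    # Drop a leading article such as l' when something follows it.
    head, sep, tail = token.partition("'")
    return tail if sep and tail and head in ELISION_ARTICLES else token


def strip_apostrophe(token):
    # Keep only the part before the first apostrophe.
    return token.split("'", 1)[0]


def analyze(text):
    tokens = re.findall(r"[\w']+", text)  # crude tokenizer, not icu_tokenizer
    tokens = [normalize(t) for t in tokens]
    tokens = [elide(t) for t in tokens]
    tokens = [strip_apostrophe(t) for t in tokens]
    return [t for t in tokens if t and t not in STOP_WORDS]


print(analyze("The ÅlesUND l'avion is Neo's"))
```

The order matters: normalizing first means the later filters only ever see lowercase tokens.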


curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "default_analyzer",
  "text" : "everything is awesome!"
}'
            

Document mapping

curl -XPUT 'localhost:9200/the-index/the-type/_mapping' -d '{
  "the-type": {
    "properties": {
      "name": {
        "type": "string",
        "fields": {
          "analyzed": {
            "type": "string",
            "analyzer": "default_analyzer"
          },
          "raw": {
            "type": "string",
            "analyzer": "raw_analyzer"
          }
        }
      }
    }
  }
}'

curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "....",
  "text" : "everything is awesome!"
}'
            
{
  "tokens": [
    {
      "token": "everything",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 11,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "awesome",
      "start_offset": 14,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
{
  "tokens": [
    {
      "token": "everyth",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "awesom",
      "start_offset": 14,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}

Document

David Hasselhoff is drunk!

  • david
  • hasselhoff
  • drunk

Search

David Bowie is awesome!

  • david
  • bowi
  • awesom
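Why the Hasselhoff document turns up for a Bowie search can be seen by comparing the two token lists. A small sketch, assuming a match only needs one shared term:

```python
def shared_terms(document_terms, query_terms):
    """Return the terms that occur in both the document and the query."""
    return set(document_terms) & set(query_terms)


document = ["david", "hasselhoff", "drunk"]
query = ["david", "bowi", "awesom"]
print(shared_terms(document, query))
```

The single shared term "david" is enough for the document to be a candidate hit.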

Synonyms

"analysis": {
  "filter": {
    "artist_synonym_filter": {
      "type": "synonym",
      "synonyms": [
        "david bowie => david_bowie"
      ]
    }
  },
  "analyzer": {
    "default_analyzer": {
      "type": "custom",
      "tokenizer": "icu_tokenizer",
      "filter": [
        "icu_normalizer",
        "elision",
        "apostrophe",
        "artist_synonym_filter",
        "porter_stem",
        "stop"
      ]
    }
  }
}
curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "default_analyzer",
  "text" : "David Bowie is awesome!"
}'
{
  "tokens": [
    {
      "token": "david_bowi",
      "start_offset": 0,
      "end_offset": 11,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "awesom",
      "start_offset": 15,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
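The effect of the synonym rule is that the two tokens collapse into one, so "david" on its own no longer matches. A simplified sketch of that replacement step, assuming already lowercased tokens and ignoring the stemming that later turns david_bowie into david_bowi:

```python
def apply_synonyms(tokens, rules):
    """Replace each multi-token phrase found in rules with its single replacement token."""
    out, i = [], 0
    while i < len(tokens):
        for phrase, replacement in rules.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.append(replacement)
                i += len(phrase)
                break
        else:
            # No rule matched at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


rules = {("david", "bowie"): "david_bowie"}
print(apply_synonyms(["david", "bowie", "is", "awesome"], rules))
print(apply_synonyms(["david", "hasselhoff", "is", "drunk"], rules))
```

After this step the Bowie query and the Hasselhoff document share no artist token.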

Autocomplete

"analysis": {
  "filter": {
    "autocomplete_filter": {
      "type": "edge_ngram",
      "min_gram": 1,
      "max_gram": 10
    }
  },
  "analyzer": {
    "autocomplete_index_analyzer": {
      "type": "custom",
      "tokenizer": "keyword",
      "filter": [
        "icu_normalizer",
        "elision",
        "apostrophe",
        "autocomplete_filter"
      ]
    }
  }
}

His plethora of knowledge

"tokens": [
  {
    "token": "h",
    "start_offset": 0,
    "end_offset": 25,
    "type": "word",
    "position": 0
  },
  {
    "token": "hi",
    "start_offset": 0,
    "end_offset": 25,
    "type": "word",
    "position": 0
  },
  ...
  {
    "token": "his pl",
    "start_offset": 0,
    "end_offset": 25,
    "type": "word",
    "position": 0
  },
  ...

Document

His plethora of knowledge

  • h
  • hi
  • his
  • ...

Search

heimlich maneuver → heim

  • h
  • he
  • hei
  • heim
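The problem with using the same n-gram analyzer at index and search time can be reproduced with a small edge n-gram function. A sketch assuming the whole input is one lowercased token (as with the keyword tokenizer) and the min_gram/max_gram values from the settings above:

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Return the prefixes of text with lengths min_gram..max_gram."""
    text = text.lower()
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]


document_terms = set(edge_ngrams("His plethora of knowledge"))
query_terms = set(edge_ngrams("heim"))
print(sorted(document_terms & query_terms))
```

Both sides contain the one-letter prefix "h", so the query for "heim" would match the unrelated document.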

Separate analyzers (settings)

"analysis": {
  "analyzer": {
    "autocomplete_index_analyzer": {
      "type": "custom",
      "tokenizer": "keyword",
      "filter": [
        "icu_normalizer",
        "elision",
        "apostrophe",
        "autocomplete_filter"
      ]
    },
    "autocomplete_search_analyzer": {
      "type": "custom",
      "tokenizer": "icu_tokenizer",
      "filter": [
        "icu_normalizer",
        "elision",
        "apostrophe"
      ]
    }
  }
}

Separate analyzers (mapping)

{
  "autocomplete_index": {
    "properties": {
      "name": {
        "type": "string",
        "fields": {
          "autocomplete": {
            "type": "string",
            "analyzer": "autocomplete_index_analyzer",
            "search_analyzer": "autocomplete_search_analyzer"
          }
        }
      }
    }
  }
}

Document

His plethora of knowledge

  • h
  • hi
  • his
  • ...

Search

heimlich maneuver → heim

  • heim
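With n-grams only on the index side, the query stays a single term and has to equal one of the stored prefixes. A sketch of that asymmetry, assuming whitespace splitting as a stand-in for the search analyzer; the "Heimlich maneuver" document is a hypothetical addition:

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Return the prefixes of text with lengths min_gram..max_gram."""
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    # Index time: the whole lowercased value is expanded into edge n-grams.
    return set(edge_ngrams(text.lower()))


def search_terms(text):
    # Search time: no n-grams, just lowercased words (assumed simplification).
    return set(text.lower().split())


query = search_terms("heim")
print(query & index_terms("His plethora of knowledge"))
print(query & index_terms("Heimlich maneuver"))
```

The first intersection is empty and the second contains "heim", which is the behaviour autocomplete needs.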

Search

Percolate

Percolator query

curl -XPUT 'localhost:9200/message-index/.percolator/1' -d '{
    "query" : {
        "match" : {
            "message" : "everything is awesome"
        }
    }
}'

Percolator document


curl -XGET 'localhost:9200/message-index/messages/_percolate' -d '{
    "doc" : {
        "message" : "David bowie is awesome!"
    }
}'
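Percolation turns search around: queries are stored first and a document is then checked against them. A toy model of that idea, assuming a match query is satisfied by any shared term; the class and method names are hypothetical and not part of any Elasticsearch client:

```python
import re


def terms(text):
    """Lowercase the text and split it into word tokens."""
    return set(re.findall(r"\w+", text.lower()))


class ToyPercolator:
    """Stores queries by id and reports which ones a document would match."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, text):
        self.queries[query_id] = terms(text)

    def percolate(self, document_text):
        doc_terms = terms(document_text)
        return sorted(qid for qid, q in self.queries.items() if q & doc_terms)


percolator = ToyPercolator()
percolator.register("1", "everything is awesome")
print(percolator.percolate("David bowie is awesome!"))
```

The response lists the ids of the stored queries that match, not documents.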

curl -XGET 'localhost:9200/message-index/messages/1/_percolate'
        

curl -XGET 'localhost:9200/message-index/messages/_mpercolate' -d \
'{"percolate" : {"index" : "message-index", "type" : "messages"}}
{"doc" : {"message" : "speed this up will ya?"}}
{"percolate" : {"index" : "message-index", "type" : "messages"}}
{"doc" : {"message" : "can this thing go any faster?!?"}}
...
'
        

curl -XGET 'localhost:9200/message-index/messages/_mpercolate' -d \
'{"percolate" : {"index" : "message-index", "type" : "messages", "id": 1}}
{}
{"percolate" : {"index" : "message-index", "type" : "messages", "id": 2}}
{}
...
'
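The multi percolate body is newline-delimited JSON: one header line, then one document line, repeated. A sketch of assembling such a body for inline documents; the function name is hypothetical and the trailing newline is included as a precaution:

```python
import json


def build_mpercolate_body(index, doc_type, docs):
    """Return header/document line pairs joined by newlines, ending with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"percolate": {"index": index, "type": doc_type}}))
        lines.append(json.dumps({"doc": doc}))
    return "\n".join(lines) + "\n"


body = build_mpercolate_body(
    "message-index",
    "messages",
    [
        {"message": "speed this up will ya?"},
        {"message": "can this thing go any faster?!?"},
    ],
)
print(body)
```

Generating the body this way avoids hand-escaping quotes when the documents come from application data.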
        

vs

Disks

  • SSD

  • Noop / Deadline

  • RAID 0

  • nofile 65535

CPU

  • Indexing Searching

  • Cores speed

  • 4~8 cores

Network

  • Single data center

  • Low latency

  • High bandwidth (1~10 GbE)

Memory

  • Yes please!

  • ~64GB

  • Prevent heap resize

  • Disable swap

Heap: max 30.5 GB, ~50% of RAM
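The memory advice can be written as a small calculation. This sketch assumes the slide means: give the heap roughly half of the machine's RAM, never more than 30.5 GB, and set the minimum and maximum heap to the same value so it is never resized. The function names are hypothetical.

```python
MAX_HEAP_GB = 30.5  # ceiling taken from the slide


def recommended_heap_gb(ram_gb, max_heap_gb=MAX_HEAP_GB):
    """Half of the RAM, capped at max_heap_gb."""
    return min(ram_gb * 0.5, max_heap_gb)


def jvm_flags(heap_gb):
    """Equal -Xms and -Xmx so the JVM never resizes the heap."""
    megabytes = int(heap_gb * 1024)
    return [f"-Xms{megabytes}m", f"-Xmx{megabytes}m"]


for ram in (16, 64, 128):
    heap = recommended_heap_gb(ram)
    print(ram, heap, jvm_flags(heap))
```

The rest of the RAM is not wasted: the operating system uses it for the file system cache.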

Shards

"settings": {
  "index": {
    ...
    "number_of_shards": "3",
    "number_of_replicas": "1",
    ...
  }
}
Node 1
  • P0
  • P1
Node 2
  • P2
  •  
Node 1
  • P0
  • R1
Node 2
  • R0
  • P1
Node 3
  • R0
  • R1
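Why the number of primary shards is fixed once an index exists follows from how a document is assigned to a shard: a hash of its routing value modulo the shard count. A sketch of that rule, using crc32 as a stand-in hash (not the function Elasticsearch uses) and hypothetical function names:

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a primary shard from a stable hash of the document id."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_shards


def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus all of their replicas."""
    return number_of_shards * (1 + number_of_replicas)


def moved_documents(doc_ids, old_shards, new_shards):
    """Ids whose shard would change if the shard count changed."""
    return [d for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards)]


print(total_shard_copies(3, 1))
print(len(moved_documents(range(100), 3, 4)))
```

With a different shard count most ids map to a different shard, so existing documents could no longer be found where they were stored.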

RE⋅IN⋅DEX