Jordy Moos & Frank Koornstra
Sites
"Items"
The tricky tweaky token tackler
http://localhost:9200/the-index/the-type
Shards
Config
curl -XPUT 'localhost:9200/the-index' -d '{
....
"analyzer": {
"default_analyzer": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": [
"icu_normalizer",
"elision",
"apostrophe",
"stop",
"porter_stem"
],
"char_filter": []
},
"raw_analyzer": {
...
"filter": [
"icu_normalizer",
"elision",
"apostrophe"
]
}
}
...
}'
I thought it wasn't real.
Tokens: I | thought | it | wasn't | real
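Here is a rough Python approximation of what a word tokenizer does to the sentence above. The regex is an assumption made for illustration. It is not the Unicode UAX#29 rules that the standard and ICU tokenizers implement, but it splits this sentence the same way.

```python
import re

# Rough stand-in for a word tokenizer: runs of word characters,
# optionally joined by inner apostrophes (so "wasn't" stays one token).
# This is NOT the UAX#29 algorithm Elasticsearch's tokenizers use.
WORD = re.compile(r"\w+(?:'\w+)*")


def tokenize(text: str) -> list[str]:
    return WORD.findall(text)


print(tokenize("I thought it wasn't real."))
```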
βeta
Standard tokenizer: βeta
ICU tokenizer: β | eta
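The ICU tokenizer first splits text wherever the script changes, which is why a Greek β followed by Latin letters becomes two tokens. The Python sketch below imitates only that step. It uses a crude, hypothetical script detector (the first word of the character's Unicode name) rather than real ICU script data.

```python
import unicodedata
from itertools import groupby


def script_of(ch: str) -> str:
    # Crude approximation of a character's script: the first word of its
    # Unicode name, e.g. "GREEK SMALL LETTER BETA" -> "GREEK".
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def split_on_script_change(word: str) -> list[str]:
    # Start a new piece wherever the (approximated) script changes,
    # roughly why the ICU tokenizer turns "βeta" into "β" + "eta".
    return ["".join(run) for _, run in groupby(word, key=script_of)]


print(split_on_script_change("\u03b2eta"))  # "βeta" spelled with a Greek beta
```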
ICU normalizer: ÅlesUND → ålesund
Elision: l'avion → avion
Apostrophe: Neo's → Neo
Stop: the, is, and, or, ... → removed
Stemmer: turn, turned, turns → turn
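The filters run in order on every token, so each one sees the output of the previous one. The Python sketch below chains simplified stand-ins in the same order as default_analyzer. The article and stop-word lists are small hypothetical subsets. `icu_normalize` only approximates the ICU normalizer with the standard library. `toy_stem` is a deliberately naive suffix stripper, not the Porter algorithm.

```python
import unicodedata

# Hypothetical, abbreviated lists for illustration only.
ELISION_ARTICLES = ("l", "m", "t", "qu", "n", "s", "j", "d", "c")
STOP_WORDS = {"the", "is", "and", "or", "a", "an", "of", "to"}


def icu_normalize(token: str) -> str:
    # Approximates icu_normalizer (NFKC + case folding) with the stdlib.
    return unicodedata.normalize("NFKC", token).casefold()


def elision(token: str) -> str:
    # Strip a leading elided article such as "l'" in "l'avion".
    for article in ELISION_ARTICLES:
        prefix = article + "'"
        if token.startswith(prefix) and len(token) > len(prefix):
            return token[len(prefix):]
    return token


def apostrophe(token: str) -> str:
    # Drop the apostrophe and everything after it: "Neo's" -> "Neo".
    return token.split("'", 1)[0]


def toy_stem(token: str) -> str:
    # Naive suffix stripping (NOT Porter); keeps at least three characters.
    for suffix in ("ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(tokens: list[str]) -> list[str]:
    # Same order as default_analyzer: normalize, elision, apostrophe, stop, stem.
    result = []
    for token in tokens:
        token = apostrophe(elision(icu_normalize(token)))
        if token in STOP_WORDS:
            continue  # stop filter: the token is dropped entirely
        result.append(toy_stem(token))
    return result
```

For example, `analyze(["The", "turned", "l'avion"])` gives `["turn", "avion"]`: "the" is removed as a stop word, "turned" is stemmed, and the article is elided.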
curl -XGET 'localhost:9200/the-index/_analyze' -d '
{
"analyzer" : "default_analyzer",
"text" : "everything is awesome!"
}'
curl -XPUT 'localhost:9200/the-index/the-type/_mapping' -d '{
"the-type": {
"properties": {
"name": {
"type": "string",
"fields": {
"analyzed": {
"type": "string",
"analyzer": "default_analyzer"
},
"raw": {
"type": "string",
"analyzer": "raw_analyzer"
}
}
}
}
}
}'
curl -XGET 'localhost:9200/the-index/_analyze' -d '
{
"analyzer" : "....",
"text" : "everything is awesome!"
}'
{
"tokens": [
{
"token": "everything",
"start_offset": 0,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "is",
"start_offset": 11,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "awesome",
"start_offset": 14,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 2
}
]
}
{
"tokens": [
{
"token": "everyth",
"start_offset": 0,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "awesom",
"start_offset": 14,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 2
}
]
}
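To compare two `_analyze` responses without reading them by eye, a small stdlib-only Python helper can extract the tokens and report skipped positions. The function names are hypothetical. The second response above shows why the gaps are interesting: "is" is gone, but "awesom" keeps position 2 instead of being renumbered.

```python
import json


def token_positions(analyze_response: str) -> list[tuple[str, int]]:
    # Pull (token, position) pairs out of an _analyze response body.
    body = json.loads(analyze_response)
    return [(t["token"], t["position"]) for t in body["tokens"]]


def position_gaps(pairs: list[tuple[str, int]]) -> list[int]:
    # Positions skipped between consecutive tokens; a stop filter that
    # removes a word leaves such a gap rather than renumbering.
    gaps = []
    for (_, prev), (_, cur) in zip(pairs, pairs[1:]):
        gaps.extend(range(prev + 1, cur))
    return gaps
```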
David Hasselhoff is drunk!
David Bowie is awesome!
"analysis": {
"filter": {
"artist_synonym_filter": {
"type": "synonym",
"synonyms": [
"david bowie => david_bowie"
]
}
},
"analyzer": {
"default_analyzer": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": [
"icu_normalizer",
"elision",
"apostrophe",
"artist_synonym_filter",
"stop",
"porter_stem"
]
}
}
}
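Because the synonym filter sits after the ICU normalizer, it sees lowercase tokens, so "David Bowie" matches the rule "david bowie". Python can sketch what an explicit `a b => c` mapping does to a token stream using greedy longest-match replacement. This is a simplification: Lucene's synonym filter also tracks positions and offsets, which this sketch ignores.

```python
def parse_rule(rule: str) -> tuple[tuple[str, ...], str]:
    # "david bowie => david_bowie" -> (("david", "bowie"), "david_bowie")
    lhs, rhs = rule.split("=>")
    return tuple(lhs.split()), rhs.strip()


def apply_synonyms(tokens: list[str], rules: list[str]) -> list[str]:
    # Greedy longest-match replacement of multi-word inputs by one token.
    table = dict(parse_rule(r) for r in rules)
    longest = max((len(k) for k in table), default=0)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # No rule starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out
```

With the rule above, `["david", "bowie", "is", "awesome"]` becomes `["david_bowie", "is", "awesome"]`, while "david hasselhoff" passes through untouched.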
curl -XGET 'localhost:9200/the-index/_analyze' -d '
{
"analyzer" : "default_analyzer",
"text" : "David Bowie is awesome!"
}'
{
"tokens": [
{
"token": "david_bowi",
"start_offset": 0,
"end_offset": 11,
"type": "SYNONYM",
"position": 0
},
{
"token": "awesom",
"start_offset": 15,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 2
}
]
}
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10
}
},
"analyzer": {
"autocomplete_index_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"icu_normalizer",
"elision",
"apostrophe",
"autocomplete_filter"
]
}
}
}
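The keyword tokenizer keeps the whole value as a single token. The edge_ngram filter then emits every prefix of that token from `min_gram` to `max_gram` characters, which is why the output below contains prefixes such as "his pl" that span a space. Here is a minimal Python sketch of that behaviour. `edge_ngrams` is a hypothetical helper, and `.lower()` stands in for the ICU normalizer.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    # Prefixes of the token from min_gram up to max_gram characters,
    # mirroring the autocomplete_filter settings (1 and 10).
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# Keyword tokenizer: the whole value is one token, so prefixes cross spaces.
grams = edge_ngrams("His plethora of knowledge".lower())
```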
"tokens": [
{
"token": "h",
"start_offset": 0,
"end_offset": 25,
"type": "word",
"position": 0
},
{
"token": "hi",
"start_offset": 0,
"end_offset": 25,
"type": "word",
"position": 0
},
...
{
"token": "his pl",
"start_offset": 0,
"end_offset": 25,
"type": "word",
"position": 0
},
...
His plethora of knowledge
heimlich maneuver → heim
"analysis": {
"analyzer": {
"autocomplete_index_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"icu_normalizer",
"elision",
"apostrophe",
"autocomplete_filter"
]
},
"autocomplete_search_analyzer": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": [
"icu_normalizer",
"elision",
"apostrophe"
]
}
}
}
{
"autocomplete_index": {
"properties": {
"name": {
"type": "string",
"fields": {
"autocomplete": {
"type": "string",
"analyzer": "autocomplete_index_analyzer",
"search_analyzer": "autocomplete_search_analyzer"
}
}
}
}
}
}
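The reason for a separate search_analyzer: if queries were n-grammed too, typing "heim" would also produce "h", which matches every value starting with "h", including "His plethora of knowledge". The Python sketch below models both analyzers in a simplified form (lowercasing plus edge n-grams on the index side, lowercasing plus whitespace splitting on the search side) and uses OR semantics like a default match query. All function names are hypothetical.

```python
def index_terms(value: str, min_gram: int = 1, max_gram: int = 10) -> set[str]:
    # Roughly autocomplete_index_analyzer: keyword tokenizer + lowercase + edge n-grams.
    text = value.lower()
    return {text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)}


def search_with_index_analyzer(query: str) -> set[str]:
    # Without a separate search_analyzer the query is n-grammed as well.
    return index_terms(query)


def search_with_search_analyzer(query: str) -> list[str]:
    # Roughly autocomplete_search_analyzer: split into words + lowercase, no n-grams.
    return query.lower().split()


def matches(value: str, query: str, search_analyzer) -> bool:
    # A match query ORs its terms by default, so one shared term is enough.
    terms = index_terms(value)
    return any(t in terms for t in search_analyzer(query))
```

With the index analyzer at search time, "heim" wrongly matches "His plethora of knowledge" through the shared term "h". With the search analyzer, it only matches values such as "heimlich maneuver".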
His plethora of knowledge
heimlich maneuver → heim
curl -XPUT 'localhost:9200/message-index/.percolator/1' -d '{
"query" : {
"match" : {
"message" : "everything is awesome"
}
}
}'
curl -XGET 'localhost:9200/message-index/messages/_percolate' -d '{
"doc" : {
"message" : "David Bowie is awesome!"
}
}'
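Percolation turns search around: queries are stored, and each incoming document is checked against them. The Python sketch below illustrates the idea with an in-memory, hypothetical `ToyPercolator` and a simplistic analyzer (lowercase word tokens, no stop words or stemming). It is not how Elasticsearch implements the percolator.

```python
import re


def analyze(text: str) -> set[str]:
    # Hypothetical analyzer: lowercase word tokens, no stop words or stemming.
    return set(re.findall(r"\w+", text.lower()))


class ToyPercolator:
    # In-memory illustration: store queries, then ask which stored
    # queries a new document would match.
    def __init__(self) -> None:
        self.queries: dict[str, set[str]] = {}

    def register(self, query_id: str, match_text: str) -> None:
        self.queries[query_id] = analyze(match_text)

    def percolate(self, doc_text: str) -> list[str]:
        terms = analyze(doc_text)
        # match query semantics (default OR): any shared term is a hit
        return sorted(qid for qid, q in self.queries.items() if q & terms)
```

With this toy analyzer, a document containing only "is" would already match query 1, which is why the analyzer behind a percolated match query matters.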
curl -XGET 'localhost:9200/message-index/messages/1/_percolate'
curl -XGET 'localhost:9200/message-index/messages/_mpercolate' -d '{"percolate" : {"index" : "message-index", "type" : "messages"}}
{"doc" : {"message" : "speed this up will ya?"}}
{"percolate" : {"index" : "message-index", "type" : "messages"}}
{"doc" : {"message" : "can this thing go any faster?!?"}}
...
'
curl -XGET 'localhost:9200/message-index/messages/_mpercolate' -d '{"percolate" : {"index" : "message-index", "type" : "messages", "id": 1}}
{}
{"percolate" : {"index" : "message-index", "type" : "messages", "id": 2}}
{}
...
'
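An `_mpercolate` body is newline-delimited: one header line and one document line per request, each on a single line. Assembling it by hand is error-prone, so here is a Python sketch that builds it with `json.dumps` (no indentation, so each object stays on one line) and ends with a trailing newline. Elasticsearch's newline-delimited endpoints generally expect that final newline. `mpercolate_body` is a hypothetical helper.

```python
import json


def mpercolate_body(index: str, doc_type: str, docs: list[dict]) -> str:
    # One header line plus one doc line per document, newline-delimited,
    # with a trailing newline after the last line.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"percolate": {"index": index, "type": doc_type}}))
        lines.append(json.dumps({"doc": doc}))
    return "\n".join(lines) + "\n"
```

If the body is written to a file, send it with `curl --data-binary @file`. Plain `-d @file` strips the newlines the format depends on.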
vs
SSD
I/O scheduler: noop / deadline
RAID 0
Open file limit: nofile 65535
Indexing Searching
Cores speed
4~8 cores
Single data center
Low latency
High bandwidth (1~10 GbE)
Yes please!
RAM: ~64GB
Prevent heap resize (same minimum and maximum heap size)
Disable swap
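Two of these points can be checked with a bit of arithmetic and one system call. The Python sketch below applies the common rule of thumb: give at most half the RAM to the JVM heap, leaving the rest for the filesystem cache, and stay below roughly 32 GB. The 31 GB ceiling is an assumption. The same value goes into `-Xms` and `-Xmx` so the heap never resizes. The open-files check uses the Unix-only `resource` module and inspects the current process, not a running Elasticsearch node.

```python
import resource


def heap_size_gb(ram_gb: int, ceiling_gb: int = 31) -> int:
    # At most half the RAM, capped below ~32 GB (31 GB is an assumed ceiling).
    return min(ram_gb // 2, ceiling_gb)


def heap_flags(ram_gb: int) -> str:
    # Same minimum and maximum heap, so the JVM never resizes it.
    size = heap_size_gb(ram_gb)
    return f"-Xms{size}g -Xmx{size}g"


def nofile_ok(minimum: int = 65535) -> bool:
    # Soft open-files limit of THIS process (not of an Elasticsearch node).
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= minimum
```

On a ~64GB machine this gives `-Xms31g -Xmx31g`.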
"settings": {
"index": {
...
"number_of_shards": "3",
"number_of_replicas": "1",
...
}
}
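With 3 shards and 1 replica, the index has 3 × (1 + 1) = 6 shard copies to place. A primary and its replica must never sit on the same node. The Python sketch below uses naive round-robin placement to show how those copies could spread over two or three nodes. It is not Elasticsearch's allocator, which balances by weights. With too few nodes, Elasticsearch leaves the extra replicas unassigned (yellow health); the sketch raises an error instead.

```python
def allocate(num_shards: int, num_replicas: int, nodes: list[str]) -> dict[str, list[str]]:
    # Naive round-robin placement of every shard copy (primary "P", replica "R").
    # Copies of one shard land on consecutive nodes, so they never share a
    # node as long as there are at least (1 + num_replicas) nodes.
    copies = num_replicas + 1
    if copies > len(nodes):
        raise ValueError("not enough nodes to keep replicas apart from primaries")
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    slot = 0
    for shard in range(num_shards):
        for copy in range(copies):
            label = f"{shard}{'P' if copy == 0 else 'R'}"
            placement[nodes[slot % len(nodes)]].append(label)
            slot += 1
    return placement
```

Adding a third node spreads the same six copies two per node, with no node holding both copies of any shard.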
Diagram: shards and replicas spread over Node 1 and Node 2, then over Node 1, Node 2 and Node 3