Multilingual queries in ElasticSearch
Let's say we have the following mapping in ElasticSearch.
{
"content": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed",
"store": "yes"
},
"locale_container": {
"type": "object",
"properties": {
"english": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "english",
"search_analyzer": "english",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "english",
"search_analyzer": "english",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"german": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "german",
"search_analyzer": "german",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "german",
"search_analyzer": "german",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"russian": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "russian",
"search_analyzer": "russian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "russian",
"search_analyzer": "russian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
},
"italian": {
"type": "object",
"properties": {
"title": {
"type": "string",
"index_analyzer": "italian",
"search_analyzer": "italian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
},
"text": {
"type": "string",
"index_analyzer": "italian",
"search_analyzer": "italian",
"index": "analyzed",
"term_vector": "with_positions_offsets",
"store": "yes"
}
}
}
}
}
}
}
}
When a user queries the index, we can read her culture from her settings, so we know which analyzer to use. How can we formulate a query that searches only the "title" and "text" fields in her own language (say, German) and uses the German analyzer to tokenize the search query?
I've simplified the example to use the standard analyzer for 'English' and the simple analyzer (no stopword removal) for 'French'. For a document like this:
{
"id": "abc",
"locale_container": {
"english": {
"title": "abc to ABC",
"text": ""
},
"french": {
"title": "def to DEF",
"text": ""
}
}
}
The following queries do the trick:
locale_container.english.title:abc
-> returns the document
locale_container.french.title:def
-> returns the document as well
locale_container.english.title:to
-> doesn't return anything, since 'to' is a stopword
locale_container.french.title:to
-> returns the document
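Applied to the German case from the question, the same idea could be written as a request body. This is only a sketch: it assumes the `query_string` query with its `fields` and `analyzer` options, and the search text `haus` is a made-up placeholder. Since the mapping already sets `search_analyzer` on each field, the explicit `analyzer` entry should be optional and is shown here only to make the choice visible.

```json
{
  "query": {
    "query_string": {
      "fields": [
        "locale_container.german.title",
        "locale_container.german.text"
      ],
      "query": "haus",
      "analyzer": "german"
    }
  }
}
```

To serve a user with a different culture, swap `german` for the matching language name in both the field paths and the analyzer.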