How should I think about search engine indices?
I am using Elasticsearch and do not understand exactly what an index is. For example, if I have 3 models (a backpack, a shoe and a glove), do I put each model in its own index, or do I index the attributes of each model, i.e. a shoe's laces, its sole, etc.?
I am also trying to understand whether it is slow to search across indices. For example, if I index each attribute of my models and end up with, say, 20 indices, is a search that needs to look at data in all of them slower than a search over a single index that stores the same 20 attributes?
In Elasticsearch, an index consists of one or more primary shards, where a shard is a Lucene instance. Each primary shard can have zero or more replicas, whose existence gives you high availability and increased search performance.
A single shard can hold a lot of data. However, with multiple shards it is easier to distribute the workload across multiple processors and multiple servers.
That said, you need a balance. The right number of shards depends on your data and context. Shards aren't free: each one consumes memory and file handles, so while thousands of shards can be useful on a 100-node cluster, you don't want that many on a single node.
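To make the cost concrete, here is a rough back-of-the-envelope sketch of how many shard copies a cluster ends up holding. It assumes every index uses 5 primary shards and 1 replica; the function name `total_shards` is made up for this illustration and is not part of any Elasticsearch client.

```python
def total_shards(num_indices: int, primaries: int, replicas: int) -> int:
    """Count every shard copy (primaries plus their replicas) across all indices."""
    return num_indices * primaries * (1 + replicas)


# Assumed settings: 5 primaries and 1 replica per index.
one_index = total_shards(num_indices=1, primaries=5, replicas=1)
twenty_indices = total_shards(num_indices=20, primaries=5, replicas=1)

print(one_index, twenty_indices)  # prints "10 200"
```

Under these assumptions, splitting the same data into 20 indices means a search across all of them has to visit 20 times as many shards as a search over a single index.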
In Elasticsearch, as well as having indices, you have the concept of types. Think of an index as being like a database, and a type being like a table.
Using different types has no overhead, and fits better with your example than having separate indices.
You can still search across all types (or a selected list of types) and across all indices (or a selected list) or any combination.
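As a small sketch of what "any combination" looks like in practice, the helper below builds the path of a search request from a list of indices and a list of types. It assumes an Elasticsearch version that still supports types in the URL; the helper `search_path` and the index names `store` and `archive` are hypothetical.

```python
def search_path(indices=None, types=None):
    """Build the URL path of a search request.

    indices and types are lists of names; None (or empty) means "all".
    """
    index_part = ",".join(indices) if indices else "_all"
    if types:
        # A type list needs an index segment in front of it, so fall back to _all.
        return "/{}/{}/_search".format(index_part, ",".join(types))
    if indices:
        return "/{}/_search".format(index_part)
    return "/_search"


print(search_path())                            # every index, every type
print(search_path(["store"]))                   # one index, every type
print(search_path(["store"], ["shoe", "glove"]))  # one index, two types
print(search_path(None, ["shoe"]))              # one type in every index
print(search_path(["store", "archive"]))        # two indices, every type
```

The resulting path would be sent as a GET (or POST with a query body) to the cluster's HTTP endpoint.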
Each type can have its own fields (like the columns in a table).
So in your example, I'd have one index containing 3 types, each with its own fields. Start with the default number of primary shards (5) and the default number of replicas (1), and change these only when you understand your data better.
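A minimal sketch of that layout, written as the JSON body you might send when creating the index. It assumes an Elasticsearch version that allows several types in one index; the function `create_index_body` and every field name (`capacity_litres`, `size`, `has_laces`, `waterproof`) are invented for illustration.

```python
import json


def create_index_body(primaries=5, replicas=1):
    """Return a create-index body: shard settings plus one mapping per type."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        },
        "mappings": {
            # One entry per type; the field names are hypothetical examples.
            "backpack": {"properties": {"capacity_litres": {"type": "integer"}}},
            "shoe": {
                "properties": {
                    "size": {"type": "integer"},
                    "has_laces": {"type": "boolean"},
                }
            },
            "glove": {"properties": {"waterproof": {"type": "boolean"}}},
        },
    }


body = create_index_body()
print(json.dumps(body, indent=2))
```

The body would go in a PUT request to the index's URL, for example `/store` if you pick that (hypothetical) index name. The number of primary shards is fixed at creation time, while the number of replicas can be changed later.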
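Note: don't confuse an index in Elasticsearch with an index in a database. An Elasticsearch index is the container that holds your documents, not a lookup structure built on a column.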