When is a Bloom filter useful?
I understand what makes bloom filters an attractive data structure; however, I'm finding it difficult to really understand when you can use them since you still have to perform the expensive operation you're trying to avoid to be certain that you haven't found a false positive. Because of this wouldn't they generally just add a lot of overhead? For example the wikipedia article for bloom filters suggests they can be used for data synchronization. I see how it woul开发者_JS百科d be great the first time around when the bloom filter is empty but say you haven't changed anything and you go to synchronize your data again. Now every lookup to the bloom filter will report that the file has already been copied but wouldn't we still have to preform the slower lookup task we were trying to avoid to actually make sure that's correct?
Basically, you use Bloom filters to avoid the long and arduous task of proving an item doesn't exist in the data structure. It's almost always harder to determine if something is missing than if it exists, so the filter helps to shore up losses searching for things you won't find anyway. It doesn't always work, but when it does you reap a huge benefit.
Bloom filters are very efficient in the case of membership queries, i.e., to find out whether an element belongs to the set. The number of elements in the set does not affect the query performance.
A common example is when you add an email address to the email list. your application should check whether it’s already in the contacts list, and if it’s not, a popup should appear asking you if you want to add the new recipient. To Implement this, you normally follow those steps in front end application:
- Get the list of contacts from a server
- Create a local copy for fast lookup
- Allow looking up a contact
- Provide the option to add a new contact if the lookup is unsuccessful
- Sync with the server when a new contact is added or an existing email is updated.
Bloom filter will handle all those steps fast and memory-efficient way. You could use a dictionary for a fast lookup but that would require to save the entire contact list in key-pair. For such a large contact-list you might not have enough storage space in browser
精彩评论