Efficient search for address book
Let's say I have first name, last name and email on an object (call AddressBook). What data structure I use (in Java) to store this object such that I need to search by last name that starts with certain letters. For example, if last name is John or Johnson and I query by John, 开发者_运维百科it should return all AddressBook objects with last names starting with John.
What is the most efficient way for this?
Let's say there are 10 people with last name X, then I can put this X as a key to the map with value a list containing 10 AddressBook objects.
Here last name may be X, X1, X12, X13, XA. Need help on this.
Take a trie: every node will be associated with the set of persons whose last name is starting with string contained in that node.
[UPDATE] Even better take a look at Patricia Trie. Or even better take this code as an example of trie which allows to mark nodes with wildcard "*" at the end of prefix, so you can lookup things like "John*":
/**
* Tatiana trie -- variant of Patricia trie
* (http://en.wikipedia.org/wiki/Patricia_tree, http://en.wikipedia.org/wiki/Trie)
* in which edge associated not with substring, but with regular expressions.
* <b>Every child node RE defines match-set which is proper subset of parent node ER's match-set.</b>
* <b>Null keys aren't permitted</b>
* <p/>
* Following wildcards <b>at the end</b> of RE are accepted:
* * -- any string of length >= 0
* <p/>
* Examples of valid RE: <pre>a, abra*, </pre>
* Example of invalid RE: <pre>a*a</pre>
* <p/>
*/
public class TatianaTree<T> {
private final Node<T> root;
/**
* Creates tree with <code>null</code> associated with root node.
*/
public TatianaTree() {
this(null);
}
/**
* Creates tree with given element associated with root node.
*
* @param el element to associate with root
*/
public TatianaTree(T el) {
root = new Node<T>(null, el);
}
public TatianaTree<T> add(Node<T> node) {
if (null == node.key)
throw new IllegalArgumentException("Can't add node with null key");
root.add(node);
return this;
}
public Node<T> findNode(String key) {
return root.findNode(Node.normalize(key));
}
/**
* Removes most-specific node which matches given key.
*
* @return element of type <code>T</code> associated with deleted node or <code>null</code>.
* The only case when <code>null</code> will be returned is when root node corresponds to given key.
*/
public T remove(String key) {
Node<T> node = findNode(key);
if (root == node)
return null;
node.removeSelf();
return node.el;
}
public Node<T> getRoot() {
return root;
}
public static class Node<T> {
private static final String INTERNAL_ROOT_KEY = ".*";
private static final String ROOT_KEY = "*";
private static final Pattern KEY_WRONG_FORMAT_PATTERN = Pattern.compile("\\*.+");
private static final String ROOT_KEY_IN_NODE_MSG = "Can't add non-root node with root key";
private static final String WRONG_KEY_FORMAT_MSG = "Valid format is ^[A-Za-z0-9_]+(\\*){0,1}$";
private final String key;
private final T el;
private final List<Node<T>> children = new ArrayList<Node<T>>();
private Node<T> parent;
public Node(String key) {
this(key, null);
}
public Node(String key, T el) {
String k = INTERNAL_ROOT_KEY;
if (null != key) {
k = normalize(key);
}
this.key = k;
this.el = el;
this.parent = null;
}
/**
* Subset-check function.
*
* @param s string to check
* @param base string to check against
* @return <code>true</code> if base is superset of s, <code>false</code> otherwise
*/
private boolean isSubset(String s, String base) {
String shortestS = s.replaceFirst("\\*$", "");
String baseRE = "^" + base;
Pattern p = Pattern.compile(baseRE);
return p.matcher(shortestS).matches();
}
public T getEl() {
return el;
}
private void add(Node<T> node) {
boolean addHere = true;
for (Node<T> child : children) {
if (isSubset(child.key, node.key)) {
insertAbove(node);
addHere = false;
break;
} else if (isSubset(node.key, child.key)) {
child.add(node);
addHere = false;
break;
}
}
if (addHere) {
children.add(node);
node.parent = this;
}
}
private void insertAbove(Node<T> newSibling) {
List<Node<T>> thisChildren = new ArrayList<Node<T>>(),
newNodeChildren = new ArrayList<Node<T>>();
for (Node<T> child : children) {
if (isSubset(child.key, newSibling.key)) {
newNodeChildren.add(child);
child.parent = newSibling;
} else {
thisChildren.add(child);
}
}
newSibling.children.clear();
newSibling.children.addAll(newNodeChildren);
this.children.clear();
this.children.addAll(thisChildren);
this.children.add(newSibling);
newSibling.parent = this;
}
private Node<T> findNode(String key) {
for (Node<T> child : children) {
if (isSubset(key, child.key))
return child.findNode(key);
}
return this;
}
public int getChildrenCount() {
return children.size();
}
private static String normalize(String k) {
if (ROOT_KEY.equals(k))
throw new IllegalArgumentException(ROOT_KEY_IN_NODE_MSG);
k = k.replaceFirst("\\*$", ".*").replaceAll("\\[", "\\\\[").replaceAll("\\]", "\\\\]");
Matcher m = KEY_WRONG_FORMAT_PATTERN.matcher(k);
if (m.find())
throw new IllegalArgumentException(WRONG_KEY_FORMAT_MSG);
return k;
}
private void removeSelf() {
parent.children.remove(this);
for (TatianaTree.Node<T> child : children)
child.parent = parent;
parent.children.addAll(children);
}
}
}
I presume that the number of letters to search for is not fixed. That is, today you may look for all last names beginning "John", but tomorrow you may look for "Joh" or "Johnb".
If that's the case, a hashmap won't work, because it contains no concept of a prefix. You hash an entire key value, and just because John
If you load the list once and then hold it and it doesn't change, I'd think the most practical solution would be to create an object for each name, then create an array of pointers to these objects, and sort the array by last name. Then use Arrays.binarySearch to find the first record with the given prefix and loop until you find the last.
If the list is very dynamic, my first thought would be to create a linked list, and create a set of "index pointers" to selected points in the linked list, like the first A, the first B, etc. Sequentially search from there.
If the list is both dynamic and too big for an "index tab" approach to work, then I think you're practical choices are to either store it on a database and use the database index retrieval abilities, or do a whole bunch of work to write a full in-memory indexing scheme. Frankly this sounds like way too much work to me. (Maybe there's some open source package to do this that you could just grab.) If you're really maintaining large amounts of data in memory rather than on a database, maybe you should ask yourself why. This is what databases are for.
This is a classic tree solution. you build a tree like this:
root
x / \y
1/\2
2/\3
When you start the search you walk on the tree.If you have x then you go by the x and all the names are the sub-tree under that node...etc. You can right simple recursion method for collecting the names.
The most simple implementation, I think, is by HashMap for nodes, that is each node is built by hash map which contains the matching node per char. This tree is also dynamic, it is easy to add insertion, deletion...etc.
If you're only going to search by last name as you write in question (or any other single field) you can use TreeMap<> as a data structure with effecient prefix lookup.
public class Phonebook {
NavigableMap<String, Collection<Record>> map = new TreeMap<>();
public void add(String name, String phone) {
map.computeIfAbsent(name, k -> new ArrayList<>()).add(new Record(name, phone));
}
public Collection<Record> lookup(String prefix) {
if (prefix.length() == 0) {
return Collections.emptyList(); // or all values if needed
}
String from = prefix;
char[] chars = prefix.toCharArray();
chars[chars.length - 1]++;
String to = new String(chars);
Collection<Record> result = new ArrayList<>();
map.subMap(from, to).values().forEach(result::addAll);
return result;
}
private static class Record {
private final String name;
private final String phone;
public Record(String name, String phone) {
this.name = name;
this.phone = phone;
}
@Override
public String toString() {
return "Record{" +
"name='" + name + '\'' +
", phone='" + phone + '\'' +
'}';
}
}
public static void main(String... args) {
// example
Phonebook book = new Phonebook();
book.add("john", "1");
book.add("john", "2");
book.add("johnny", "3");
book.add("joho", "4"); // joho will be computed as a value of 'to' parameter.
Collection<Record> records = book.lookup("john");
System.out.println("records = " + records);
}
}
精彩评论