Improving performance for string matching

2023-03-26 11:35 问答作者：

I am working for a startup that is building a iphone app. And i would like to ask a few questions to improve an algorithm we use for string matching.

We have a database that has a huge list of phone numbers along with th开发者_如何学Ce name of the user who owns the phone number. Lets say that the database looks like this

name phonenum

hari 1234

abc 3873

....

This database has large number of rows (around 1 million). When the user opens the app, the app gets the list of phone numbers from the person's phone contacts and matches it with the database. We return all the phone numbers that are present in the database. Right now, what we do is very very inefficient. We send the phone numbers from phone contacts in sets of 20. And we match it with the database. This will lead to a complexity of num of phone contacts * O(n).

I thought of some improvements like having the database rows sorted by phone numbers so that we can do binary search. In addition to that, we can have a hash table containing some 10,000 phone numbers in the cache memory and we can search against this cache memory initially. Only if there is a miss, we will access the database and search the database with complexity of O(log n) using binary search.

Also, there is the issue of sending phone numbers for matching. do i send them as such or send them as a hashed value ? will that matter in terms of improving performance?

Is there any other way of doing this thing?

I explained the whole scenario so that you can have a better understanding of my need

thanks

If you already have an SQL Server database, let it take care of that. Create an index on the phone number column (if you don't have it already). Send all the numbers in the contact list in one go (no need to split them by 20) and match them against the database. The SQL server probably uses much better indexing than anything you could come up with, so it's going to be pretty fast.

Alternatively, you can try to insert the numbers into a temporary table and query against that, but I have no idea whether that would be faster.

If you can represent phone numbers as numeric values instead of strings, then you can put an index on your database field that will make lookup operations very fast. Even if you have to represent them as strings, an index on the database field will be make looking up values fast enough to be a non-issue in the grand scheme of things.

Your biggest performance problem is going to be with all the round trips between the application and your database. That is a performance bottleneck with any web-enabled program. If you are unlikely to have a high rate of success (maybe 2% of the user's contacts are in your database), you'll probably be better off sending the whole list of phone numbers at once, since you'll just be getting data back for a few of them.

If the purpose is to update the user's contact data with the data found in your database, you could create a hash out of the appropriate fields and send that along with the phone number. Have the database keep a hash of those fields on its side and do a comparison. If the hash matches, then you don't have to send any data back because the local and remote versions are the same.

A successful caching strategy would require a good understanding of how the data will be used, so I can't provide much guidance based on the information given. For example, if 90% of the phones using your app will have all of the phone numbers matched in a small group of the numbers in the database, then by all means, put that small group into a Hashtable. But if users are likely to have any phone numbers that aren't in that small group, you're going to have to do a database round-trip. The key will be to construct a query that allows the database to return all of the data you need in one trip.

I'd split the phone number up into three parts

example 777.777.7777

Each section can be stored into and int and used as a hash tag.

This would mean that your data store becomes a series of hash tables.

Or you could force the whole number into an int and then use that as your hash key. But for fast results you'd need more buckets.

Cheers

继续阅读：algorithm caching database string string-matching

Improving performance for string matching

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？