Full-Text Substring Searching in iOS

2023-02-18 05:02

I need my iPhone / iPad app to be able to quickly search through about 10,000 records (each about a paragraph of text) for any substring contained within a record. So if a record contains the word "Flame", querying for "lame" should match.

I'm currently using SQLite, but "LIKE '%term%'" searches are too slow for this many records. Enabling Full-Text Search doesn't seem like it will fully meet my needs, since SQLite FTS only supports prefix wildcards (e.g. "Flam*", not "*lame").

I've experimented with putting all the text into one giant blob (~350 KB) and calling [NSString rangeOfString:...], which I think uses a Boyer-Moore algorithm. This is faster than "LIKE '%term%'" searches, but still not the speed I'm hoping for.
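To make the giant-blob approach concrete, here is a minimal Swift sketch, assuming the records are already in memory as an array of strings. It joins them into one NSString, finds hits with range(of:options:range:) (the Swift spelling of rangeOfString:options:range:), and maps each hit offset back to its record by binary search over the records' start offsets. The type name BlobSearcher and the "\u{1F}" separator are illustrative assumptions, not part of the question.

```swift
import Foundation

/// Hypothetical helper: one NSString blob plus the UTF-16 offset where each
/// record starts, so a hit location can be mapped back to a record index.
/// Assumes "\u{1F}" (ASCII unit separator) never occurs in records or queries.
struct BlobSearcher {
    private let blob: NSString
    private let starts: [Int]   // UTF-16 offset of each record's first character

    init(records: [String]) {
        var starts: [Int] = []
        var offset = 0
        for record in records {
            starts.append(offset)
            offset += record.utf16.count + 1   // +1 for the one-unit separator
        }
        self.starts = starts
        self.blob = NSString(string: records.joined(separator: "\u{1F}"))
    }

    /// Indices of records containing `term`, case-insensitively, in order.
    func matchingRecords(for term: String) -> [Int] {
        guard !term.isEmpty else { return [] }
        var hits: [Int] = []
        var location = 0
        while location < blob.length {
            let searchRange = NSRange(location: location, length: blob.length - location)
            let found = blob.range(of: term, options: [.caseInsensitive], range: searchRange)
            if found.location == NSNotFound { break }
            let record = recordIndex(containing: found.location)
            if hits.last != record { hits.append(record) }
            // One hit per record is enough: resume at the next record's start.
            location = record + 1 < starts.count ? starts[record + 1] : blob.length
        }
        return hits
    }

    /// Binary search for the last record whose start offset is <= `offset`.
    private func recordIndex(containing offset: Int) -> Int {
        var low = 0, high = starts.count - 1
        while low < high {
            let mid = (low + high + 1) / 2
            if starts[mid] <= offset { low = mid } else { high = mid - 1 }
        }
        return low
    }
}
```

For example, BlobSearcher(records: ["Flame on", "Cold water", "lame duck"]).matchingRecords(for: "lame") reports records 0 and 2. Jumping to the next record after a hit keeps each record from being reported twice.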

Any suggestions for approaches or libraries that would achieve this kind of scalable substring search and would work on an iPhone?

Here are a number of different options. I am not aware of benchmarks for each, so you will have to do some testing.

First is the FTS3 extension to SQLite. This should give you fast, indexed full text search: http://regularrateandrhythm.com/regular-rate-rhythm-blog/sqlite3-fts-in-IOS4.html

Then there are regular expressions (NSRegularExpression), which were introduced in iOS 4:
http://developer.apple.com/library/ios/#documentation/Foundation/Reference/NSRegularExpression_Class/Reference/Reference.html

For versions before iOS 4, you can use RegexKitLite:
http://regexkit.sourceforge.net/RegexKitLite/index.html

If you decide to use regular expressions, take a look at this question on how to optimize them:
How to speed up iPhone regular expressions with NSRegularExpression?
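A minimal Swift sketch of the NSRegularExpression route, assuming a case-insensitive literal substring search over an in-memory array of records; the function name recordsMatching is hypothetical. The term is escaped with NSRegularExpression.escapedPattern(for:) so characters such as "." or "+" match literally, and the regex is compiled once and reused for every record rather than rebuilt per record.

```swift
import Foundation

/// Hypothetical helper: indices of records that contain `term` anywhere,
/// case-insensitively, using a single compiled NSRegularExpression.
func recordsMatching(_ term: String, in records: [String]) throws -> [Int] {
    guard !term.isEmpty else { return [] }
    // Escape regex metacharacters so the term is treated as plain text.
    let pattern = NSRegularExpression.escapedPattern(for: term)
    // Compile once, outside the loop.
    let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    var hits: [Int] = []
    for (index, record) in records.enumerated() {
        // NSRange is measured in UTF-16 units, matching NSString semantics.
        let fullRange = NSRange(location: 0, length: record.utf16.count)
        // firstMatch stops at the first hit, which is all we need per record.
        if regex.firstMatch(in: record, options: [], range: fullRange) != nil {
            hits.append(index)
        }
    }
    return hits
}
```

Note that this is still a linear scan over all records; a regex only adds pattern flexibility, not an index.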

Perhaps consider combining your second approach (the single text blob searched with rangeOfString:) with a concurrent one. Divide the large block of text into 5, 10, or however many chunks, and search them separately with the same number of threads. Then combine the results using a coordinate system that maps each match back to its correct position (e.g. thread 5 searched region 5 and found a match at position 337, which corresponds to document x, position y). Beyond some point, adding more threads no longer helps, so finding that limit is the first thing to do.
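The chunked, multithreaded idea can be sketched in Swift as follows, assuming the blob is available as UTF-8 bytes and a case-sensitive, byte-for-byte match is acceptable. parallelFind is a hypothetical name, and the naive per-chunk scan stands in for whatever search you actually use. One detail the description leaves implicit: a match can straddle a chunk boundary, so each chunk owns a range of match start positions but may read up to needle.count - 1 bytes past the end of its range. Collecting results per chunk and concatenating them in chunk order yields sorted global offsets, ready to be mapped back to documents.

```swift
import Foundation
import Dispatch

/// Hypothetical helper: sorted byte offsets of every occurrence of `needle`,
/// found by splitting the possible match start positions across `chunkCount`
/// concurrent workers.
func parallelFind(_ needle: [UInt8], in haystack: [UInt8], chunkCount: Int) -> [Int] {
    let m = needle.count
    guard m > 0, haystack.count >= m, chunkCount > 0 else { return [] }
    let lastStart = haystack.count - m                    // last valid match start
    let chunkSize = (lastStart + chunkCount) / chunkCount // ceil((lastStart + 1) / chunkCount)
    var perChunk = [[Int]](repeating: [], count: chunkCount)
    let lock = NSLock()

    DispatchQueue.concurrentPerform(iterations: chunkCount) { chunk in
        // This chunk owns match starts from...to; comparing a full needle
        // at `to` reads past the chunk, which catches boundary-straddling hits.
        let from = chunk * chunkSize
        let to = min(from + chunkSize - 1, lastStart)
        guard from <= to else { return }
        var found: [Int] = []
        for start in from...to where haystack[start..<start + m].elementsEqual(needle) {
            found.append(start)
        }
        lock.lock(); perChunk[chunk] = found; lock.unlock()
    }
    // Chunks cover increasing offsets, so concatenation is already sorted.
    return perChunk.flatMap { $0 }
}
```

Timing this on a device while varying chunkCount is one way to find the point where extra threads stop paying off.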

If you can't tokenize the text (split it into words), you can't index it. That's why LIKE has to do a sequential search. Unless your substring can be constrained somehow (for instance, it always drops the first letter, or it has a fixed length), your text can't be stored as a list of all possible tokens, and those tokens can't be indexed. The key (pun intended) is to find an algorithm that produces a small enough list of tokens that the cost of indexing them is less than the cost of a linear search.
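One way to get such a token list, offered here as a swapped-in technique rather than something the answer itself describes, is a word-suffix index: every suffix of every word is stored as a token, so the infix query "lame" becomes a prefix lookup that hits the suffix "lame" of "flame". The same tokens could in principle be fed to a prefix-only index such as SQLite FTS. Below is a minimal in-memory Swift sketch, assuming words are runs of letters and digits, matching is case-insensitive, and the text is plain (e.g. ASCII); SuffixIndex is a hypothetical name. Queries that span a word boundary, such as "me g", will not match.

```swift
import Foundation

/// Hypothetical word-suffix index: sorted (suffix, record) pairs, searched
/// by binary search for the first token >= query, then a forward scan.
struct SuffixIndex {
    private var entries: [(token: String, record: Int)] = []

    init(records: [String]) {
        for (id, text) in records.enumerated() {
            // Assumption: a word is a maximal run of letters/digits, lowercased.
            let words = text.lowercased().split { !$0.isLetter && !$0.isNumber }
            for word in words {
                // Emit every suffix: "flame", "lame", "ame", "me", "e".
                var suffix = word
                while !suffix.isEmpty {
                    entries.append((token: String(suffix), record: id))
                    suffix = suffix.dropFirst()
                }
            }
        }
        entries.sort { $0.token < $1.token }
    }

    /// Sorted indices of records with a word containing `query`.
    func search(_ query: String) -> [Int] {
        let q = query.lowercased()
        guard !q.isEmpty else { return [] }
        // Lower bound: first entry whose token is >= q.
        var low = 0, high = entries.count
        while low < high {
            let mid = (low + high) / 2
            if entries[mid].token < q { low = mid + 1 } else { high = mid }
        }
        // Tokens starting with q form a contiguous run from the lower bound.
        var hits = Set<Int>()
        while low < entries.count && entries[low].token.hasPrefix(q) {
            hits.insert(entries[low].record)
            low += 1
        }
        return hits.sorted()
    }
}
```

The trade-off is size: a word of k characters yields k tokens, so the index is several times larger than the text, which is exactly the indexing cost the answer says must stay below that of a linear search.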
