Suffix Range c++
I am trying to build a Suffix range that is
if I have strings "catalog" "catalyst" "ban" "bany"
then suffix tree will be like
.
/ \
c b
/ \
a a
/ \
t n
/ \ / \
a a $ y
/ \ / \
l l $ $
/ \
开发者_JAVA百科 o y
/ \
g s
/ \ \
$ $ t
/\
$ $
I want to find Suffix range of each string now .. that if I take string "Cat" then it should give me a range enclosing all its suffixes to which "cat" is a prefix. I need to use sentinels to separate each string.. may be a "$"
Can any one suggest me a best way to find out this using c++ . Any references will be helpfull. thank you
Much simpler answer than my first. You have a std::set of strings:
typedef std::set<std::string>::iterator iter_type;
std::set<std::string> data;
and a function named find() which returns a pair of iterators. The first iterator points at the beginning of the strings that match the prefix, and the last iterator is one past the last string that matches the prefix. If you have 10000 strings, this needs to only check about 26 of them.
std::pair<iter_type, iter_type> find(std::string substr) {
std::pair<iter_type, iter_type> r;
r.first = data.lower_bound(substr);
substr[substr.size()-1]++; //I'm assuming substr is at least one character
r.second = data.upper_bound(substr);
return r;
}
Then, after the data has been loaded, you merely call the find(...) function, and it returns a pair of iterators pointing at the strings you want. You can use these as inputs to any standard algorithm, or do whatever.
int main() {
data.insert("catalog");
data.insert("catalyst");
data.insert("ban");
data.insert("bany");
//find the region of strings beginning with "cat"
std::pair<iter_type, iter_type> range = find("cat");
//display them all
for(iter_type i=range.first; i!=range.second; ++i)
std::cout << *i << '\n';
}
Solution 1 : Space efficient Use Trie data structure (one-char is one node, one node can point to 26 different nodes) Find the last-node for given prefix. Print prefix+'path to all terminal nodes'
Solution 2 : Time efficient say you are interested in only first 3 prefix chars. Create a 3d array
vector<string> arr[27][27][27]
Insert . if you want to insert
word : ABCD arr[A][B][C].push_back("D")
word : BBBX arr[B][B][B].push_back("X")
Print : vector & a = arr[char1][char2][char3] for( string s in a) char1-char2-char3+ s
Here is, I guess, the most concise answer. :)
set<string> s;
string word = "ABC"
//Inserts.
// e.g. s.insert("ABCD");
for(set<string>::iterator it=s.begin();it!=s.end();++it)
if(!(*it).compare(0,word.size(),word))
cout<<*it<<endl;
Tested code! :P
I posted an algorithm to solve a remarkably similar problem at Is there a suitable data structure for solving this question?. First, we create a suffix tree of nodes similar to
class node { //create a prefix node type
node & operator=(const node & b); //UNDEFINED, NO COPY
node & operator=(const node && b); //UNDEFINED, NO COPY
node * next[27]; // pointers to nodes of the next letter (27th letter is $)
public:
node();
~node();
void add(char* mystring);
void find(char* mystring,
std::vector<std::pair<int, std::string>>& out,
std::string sofar="");
}root;
And fill it. Then, to find all the substrings of "cata", we iterate through the tree according to the letters in "cata" (root[3]->[0]->['t'-'a'?]->[0]), and keep track of the string sofar
. When we reach the end of mystring
, we recursively try going down each child, instead of just the ones that match the substring, and wherever we find 'end' (letter 27), we push sofar
onto out
. Then we simply return, and out
holds all the full strings beginning with "cata".
精彩评论