Find any of several substrings in a string; if one is found, mark the string OK

I have a string, say "Dog is a kind of animal".

Now I need to accept a string that contains any of a set of words in place of Dog, such as Cat, Horse, Tiger or Lion. If it does, the status of the string should be OK.

I am fully aware of the string::find function, which matches a single substring against a string. But in my case I have to check the string against 30 possibilities (cat, horse, lion, ... 30 animals).

I have no idea how to proceed with that.

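std::string line2 = "horse is a kind of animal";
const char* array[] = { "cat", "dog", "horse" };
bool found = false;
// sizeof(array) is a size in bytes; divide by the element size to get the count
for (size_t i = 0; i < sizeof(array) / sizeof(array[0]); i++)
{
  if (line2.find(array[i], 0) != std::string::npos)
  {
    found = true;
    break;
  }
}
// report once, after the loop, instead of once per candidate
std::cout << (found ? "true" : "not found") << std::endl;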


Consider using one of the many available regular expression libraries (e.g. Google's RE2) to search for the union of your search terms, e.g. (cat|dog|horse|...). This ought to be faster than searching for each substring in turn, as it only needs to scan the string once.
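
A minimal sketch of the alternation idea, assuming std::regex from C++11 in place of RE2 so that the example needs only the standard library. The helper name contains_any_regex is hypothetical, and the terms are assumed to be plain words with no regex metacharacters. Whether this actually beats repeated find calls depends on the regex implementation, so measure before relying on it.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: joins the terms into a pattern such as "cat|dog|horse"
// and searches the text once with that pattern.
// Assumption: the terms contain no regex metacharacters (plain words only).
bool contains_any_regex(const std::string& text,
                        const std::vector<std::string>& terms)
{
    if (terms.empty())
        return false;  // an empty pattern would match everything

    std::string pattern;
    for (std::size_t i = 0; i < terms.size(); ++i) {
        if (i != 0)
            pattern += '|';
        pattern += terms[i];
    }

    const std::regex rx(pattern);
    return std::regex_search(text, rx);
}
```

Calling contains_any_regex("horse is a kind of animal", {"cat", "dog", "horse"}) should return true. If the same list is checked against many lines, it would make sense to build the std::regex once and reuse it.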


Here is a very straight-up way to do it (I'll add alternatives in a moment):

#include <string>
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    string victim = "horse is a kind of animal" ;
    vector<string> targets;
    targets.push_back("cat");
    targets.push_back("dog");
    targets.push_back("horse");

    string found_target; // set to the target we found, if we found any
    for( vector<string>::const_iterator it = targets.begin(); found_target.empty() && (it != targets.end()); ++it )
    {
        if( victim.find(*it) != string::npos )
            found_target = *it;
    }
    if( !found_target.empty() )
        cout << "Found '" << found_target << "'\n";
    else
        cout << "Not found\n";
}

EDIT

If you have the benefit of a C++0x compiler, you can use a lambda to make the code a little cleaner:

#include <string>
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    string victim = "horse is a kind of animal" ;
    vector<string> targets;
    targets.push_back("cat");
    targets.push_back("dog");
    targets.push_back("horse");

    vector<string>::const_iterator it_found = find_if(targets.begin(), targets.end(), [&victim](string s) -> bool {
        return( victim.find(s) != string::npos );
    });
    if( it_found != targets.end() )
        cout << "Found '" << *it_found << "'\n";
    else
        cout << "Not found\n";
}


You can use TR1 regular expressions (available as std::regex in the <regex> header since C++11). This simple example uses regex_search with a boolean result. There are other functions that let you iterate through multiple matches or do search-and-replace.

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string line("horse is a kind of animal");
    std::regex rx("cat|dog|horse");

    if (std::regex_search(line.begin(), line.end(), rx))
        std::cout << "true\n";
    else
        std::cout << "not found\n";
}


There are a lot of factors here, for example:

  • do you care about white space? e.g. can there be multiple spaces between "dog" and "is"?
  • do you care about case?
  • what level of performance do you need?

The most flexible approach is to use regular expressions. Boost has an implementation, as do many popular operating systems (e.g. on Linux, see man regex et al). You can check for a match against something like "^([A-Za-z]+)\s+is\s+a\s+kind\s+of\s+animal\s*$"; the parenthesised subexpression (the type of animal) can be extracted by the regexp library and then searched for in an array. You may want to use a case-insensitive comparison. This assumes that the list of supported animals is read from some external source at run-time. As bdonlan suggests, if it's known in advance, you can hard-code it in the regular expression (dog|cat|...).
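
A rough sketch of the capture-then-look-up idea, assuming std::regex rather than Boost or the POSIX functions. The helper name is_known_animal_sentence is hypothetical, and the animal list is assumed to hold lower-case names.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true when `line` has the form
// "<word> is a kind of animal" and <word> appears in `animals`.
// Assumption: `animals` holds lower-case names.
bool is_known_animal_sentence(const std::string& line,
                              const std::vector<std::string>& animals)
{
    // regex_match must match the whole line, so no ^ or $ anchors are needed.
    static const std::regex rx(
        R"(\s*([A-Za-z]+)\s+is\s+a\s+kind\s+of\s+animal\s*)",
        std::regex::ECMAScript | std::regex::icase);

    std::smatch m;
    if (!std::regex_match(line, m, rx))
        return false;

    // Lower-case the captured word so the lookup ignores case.
    std::string animal = m[1].str();
    std::transform(animal.begin(), animal.end(), animal.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    return std::find(animals.begin(), animals.end(), animal) != animals.end();
}
```

The linear std::find keeps the sketch short; for a list of about 30 names it is unlikely to matter.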

You can pre-sort the array and use a binary search: C++'s STL already has algorithms for sorting and searching (std::sort and std::binary_search). That will be a bit faster than populating a std::set with the list of animals, but then you may not care about the speed difference.
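
The pre-sort and binary search suggestion might look roughly like this. The names make_sorted and is_animal are hypothetical, and the token is assumed to have been extracted from the line already.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the list once so later lookups can binary-search it.
std::vector<std::string> make_sorted(std::vector<std::string> animals)
{
    std::sort(animals.begin(), animals.end());
    return animals;
}

// Precondition: `sorted_animals` is sorted in ascending order,
// otherwise std::binary_search gives meaningless results.
bool is_animal(const std::vector<std::string>& sorted_animals,
               const std::string& token)
{
    return std::binary_search(sorted_animals.begin(), sorted_animals.end(), token);
}
```

The comparison is case-sensitive, so "Dog" and "dog" are different keys unless both the list and the token are normalised first.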

Another approach is to scan with C++ streams:

std::string what, is, a, kind, of, animal;
char unwanted;
std::istringstream input(" Dog is a kind of animal");

if ((input >> what >> is >> a >> kind >> of >> animal) &&
    !(input >> unwanted) &&
    is == "is" && a == "a" && kind == "kind" && of == "of" && animal == "animal")
{
    // match!
}

You can do something similar with sscanf, which requires care with the pointers and with not reading too many characters, but is also more efficient:

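char what[21];  // room for 20 characters plus the terminating '\0'
// The field width is written %20[...]; scanf formats have no ".precision".
// A return value of 1 means only `what` was read, i.e. nothing followed "animal".
if (sscanf(candidate, "%20[A-Za-z] is a kind of animal %c", what, &unwanted) == 1)
    // match...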


Here's my response, it ignores case for bonus points!

Helper to get the size of an array:

template <typename T, std::size_t N>
inline std::size_t sizeof_array(T(&)[N]) {
   return N;
}

Code to test for valid string:

std::string text = "Dog is a kind of animal";
std::string animals[] = {"dog","cat","lion","giraffe"};    
std::transform(text.begin(), text.end(), text.begin(), ::tolower);

bool valid = false;
for(size_t i = 0; !valid && i < sizeof_array(animals); ++i) {
    valid = (text.find(animals[i]) != std::string::npos);
}


If you can use the C++ STL, create a set with your keywords as the elements:

std::set<std::string> myset;
myset.insert("Dog");
myset.insert("Cat");
...

Then extract the candidate token from the line and check if it exists in the set:

myset.count(token) // 1 if match, 0 if no match
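
Put together, the set approach could be sketched as below. It assumes the candidate token is the first whitespace-delimited word of the line, which is an assumption based on the example sentence; the helper name first_word_in_set is hypothetical.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: takes the first whitespace-delimited word of `line`
// and reports whether it is one of the keywords.
// Assumption: the word to check is always the first word of the line.
bool first_word_in_set(const std::string& line,
                       const std::set<std::string>& keywords)
{
    std::istringstream input(line);
    std::string token;
    if (!(input >> token))
        return false;  // empty or whitespace-only line

    return keywords.count(token) == 1;
}
```

The lookup is case-sensitive, so with the keywords "Dog" and "Cat" a line starting with "dog" is not accepted.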


Use std::any_of, as explained in the comments of the example below.

//LOAD ALL THE REQUIRED ANIMALS.
std::vector<std::string> animals = { "Cat","Dog","Horse","Donkey" }; 

//STRING TO BE SEARCHED.
std::string toBeSearched{ "Dog is a kind of animal" };

//USE any_of. Make a note of "&" in the lambda capture. The "toBeSearched" variable is accessible inside lambda.
bool found = std::any_of(animals.begin(), animals.end(), [&](auto item)
    {return (toBeSearched.find(item) != std::string::npos); });

//HANDLE BUSINESS
if (found)
{
    //Business
}

std::any_of exits the loop as soon as the condition is true.
