How do you tell if two wildcards overlap?

2022-12-29 21:18 问答作者：

Given two strings with * wildcards, I would like to know if a string could be created that would match both.

For example, these two are a simple case of overlap:

Hello*World
Hel*

*.csv
reports*.csv
reportsdump.csv

Is there an algorithm published for doing this? Or perhaps a utility function in Windows or a library I might be able to call or copy?

Since every glob can be written as a regular expression and the intersection of two regular expressions can be found (unless they aren't really regular, but they would be in this case), you can find the intersection of two globs by transforming them into regular expressions and then finding the intersection of those. So you can find out whether two globs intersect by finding the intersection of the regular expressions and checking whether it's empty.

However since globs are more limited than regular expression, there is a much easier way:

Let's call the two globs g1 and g2. They intersect iff

Both g1 and g2 are empty or only contain wildcards.
Neither g1 nor g2 are empty and one of the following conditions is true (let c1 be the first character of g1 and t1 the string containing the remaining characters - same for g2 with c2 and t2):
1. c1 and c2 are equal and t1 and t2 intersect
2. c1 and/or c2 is a wildcard and t1 intersects with g2
3. c1 and/or c2 is a wildcard and g1 intersects with t2

An example implementation in haskell:

intersect g1          []          = all (== '*') g1
intersect []          g2          = all (== '*') g2
intersect g1@('*':t1) g2@(c2:t2)  = intersect g1 t2 || intersect t1 g2
intersect g1@(c1:t1)  g2@('*':t2) = intersect t1 g2 || intersect g1 t2
intersect    (c1:t1)     (c2:t2)  = c1 == c2        && intersect t1 t2

This algorithm isn't particular efficient if the globs contain a lot of wildcards, but it's very easy to implement and since you're likely planning to use this with filenames, I doubt you'll have globs longer than 1000 chars.

For what it's worth, here's one implementation of the algorithm from sepp2k's answer in C# (I used explicit return true; and return false; calls, along with comments, for algorithm readability):

public static bool WildcardIntersect(string w1, string w2)
{
    // if both are empty or contain wildcards
    if ((string.IsNullOrEmpty(w1) || w1 == "*")
        && (string.IsNullOrEmpty(w2) || w2 == "*"))
        return true;

    // if either string is empty, return false
    // we can do this because we know the other string MUST be non-empty and non-wildcard
    if (string.IsNullOrEmpty(w1) || string.IsNullOrEmpty(w2))
        return false;

    char c1 = w1[0], // first character of wildcard string 1
         c2 = w2[0]; // first character of wildcard string 2
    string remain1 = w1.Substring(1), // remaining of wildcard string 1
           remain2 = w2.Substring(1); // remaining of wildcard string 2

    // if first letters match and remaining intersect
    if ((c1 == c2 && WildcardIntersect(remain1, remain2))
        // if either is a wildcard and either remaining intersects with the other whole
        || ((c1 == '*' || c2 == '*') && (WildcardIntersect(w1, remain2) || WildcardIntersect(remain1, w2))))
        return true;

    // else, no match, return false
    return false;
}

You can solve this in time linear in the sum of the pattern lengths:

If both strings start or end with non-wildcards, check that they match until one pattern hits a wildcard (otherwise they don't match). This reduces the problem to the case where at least one pattern starts with a wildcard and at least one pattern ends with a wildcard. If both patterns have wildcards (somewhere), then they have to match:

if p1 starts with a wildcard and p2 ends with a wildcard, use the p1 wildcard to eat up all of p2 up to its last wildcard, then use the p2 wildcard to eat up all of p1
if p1 starts and ends with a wildcard, then use its starting wildcard to eat up p2 up to its first wildcard, then use the p2 wildcard to eat up p1 to its last wildcard, then use the last p1 wildcard to eat up the rest of p2

Otherwise, one string (p1) has no wildcards, and the other string (p2) has strings s1,s2,...punctuated with wildcards. So just search for the first occurrence of s1 in p1, then for the first subsequent occurrence of s2 (starting from the end of the match in p1), etc. If you find all of the strings, then the patterns match, otherwise they don't.

As I understand you try to determine if a regex is orthogonal to another regex? If so, this is very not trivial problem.

Here is more about Theory.

Here is solution: Java library.

Usage:

/**
 * @return true if the two regexes will never both match a given string
 */
public boolean isRegexOrthogonal( String regex1, String regex2 ) {
   Automaton automaton1 = new RegExp(regex1).toAutomaton();
   Automaton automaton2 = new RegExp(regex2).toAutomaton();
   return automaton1.intersection(automaton2).isEmpty();
}

Here is a c++ implementation of the algorithm suggested by sepp2k with a slight modifications:

bool intersect(const std::string& pattern1, const std::string& pattern2) {
    if(pattern1.empty() && pattern2.empty()) return true;
    if("*" == pattern1 || "*" == pattern2) return true;

    if(pattern2.empty() && '*' == pattern1[0]) return true;
    if(pattern1.empty() && '*' == pattern2[0]) return true;

    if(pattern1.empty() || pattern2.empty()) return false;

    char c1 = pattern1[0];
    char c2 = pattern2[0];
    string subPattern1 = pattern1.substr(1);
    string subPattern2 = pattern2.substr(1);


    if('*' == c1 && '*' == c2)
        return intersect(pattern1, subPattern2) && intersect(subPattern1, pattern2);

    if('*' == c1 && intersect(pattern1, subPattern2)
       || '*' == c2 && intersect(subPattern1, pattern2)
       || c1 == c2 && intersect(subPattern1, subPattern2)) {
        return true;
    }

    return false;
}

继续阅读：algorithm language-agnostic string string-comparison

How do you tell if two wildcards overlap?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？