Regex to match dates

I have an HTML input that contains three patterns like this:

Pattern one

</div>

<div class="myclass">

Pattern two

</div>

    <p class="ProductMeta">29-06-2011</p>

<div class="myclass">

Pattern three

</div>

    <p class="ProductMeta">29/06/2011</p>

<div class="myclass">

I'm trying to create a RegEx that can catch the three dates: empty, with dashes, and with slashes. I think I need some kind of nested groups, but I can't make it work.

This is the RegEx I've created:

Regex r = new Regex(
                @"src=""</div>\s+<p class=""ProductMeta"">([\d\/\-]+)"
                , RegexOptions.Compiled | RegexOptions.IgnoreCase);

And tried the following to make the group optional:

Regex r = new Regex(
                    @"src=""</div>[\s+<p class=""ProductMeta"">([\d\/\-]+)]?"
                    , RegexOptions.Compiled | RegexOptions.IgnoreCase);

Can anyone help me?

The console test does the following to print on screen:

foreach (Match m in mcl)
            {
                Console.WriteLine(m.Groups[1].Value.Replace("-","/") + " - " + m.Groups[5].Value);
            }

            Console.Read();

Thanks.


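You can't use [] to make an optional group. Square brackets define a character class, which matches a single character, so [...]? makes only that one character optional. To make a whole sub-pattern optional, wrap it in parentheses: (...)?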

var test1 = 
@"</div>
<div class=""myclass"">";

var test2 =
@"</div>
    <p class=""ProductMeta"">29-06-2011</p>
<div class=""myclass"">";

var test3 = 
@"</div>
    <p class=""ProductMeta"">29/06/2011</p>
<div class=""myclass"">";

string re = @"</div>\s+(<p class=""ProductMeta"">(\d\d([-/])\d\d\3\d\d\d\d))?";
Regex regExpr = new Regex(re, RegexOptions.Multiline);

Console.WriteLine(regExpr.Match(test1).Groups[2].Value); //== ""
Console.WriteLine(regExpr.Match(test2).Groups[2].Value); //== "29-06-2011" 
Console.WriteLine(regExpr.Match(test3).Groups[2].Value); //== "29/06/2011"


The regular expression you created has only one capturing group (the first one), but your code also reads m.Groups[5], which does not exist and therefore yields an empty string.

  Regex r = new Regex(
            @"src=""</div>\s+<p class=""ProductMeta"">([\d\/\-]+)"
            , RegexOptions.Compiled | RegexOptions.IgnoreCase);
  foreach (Match m in mcl)
     Console.WriteLine(m.Groups[1].Value.Replace("-","/"));

The second regex is wrong because you put the text you're trying to match inside a character class, which matches any single character from the listed set. So [\s+p ] matches exactly one character: a whitespace character (which already covers the space), a plus sign +, or a p. It never matches the sequence as a whole.
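To make the difference concrete, here is a minimal sketch contrasting a character class with an optional parenthesized group. The class and method names (CharClassDemo, FirstClassMatch, GroupMatched) are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // A character class consumes exactly one character from its set.
    public static string FirstClassMatch(string input) =>
        Regex.Match(input, @"[\s+p]").Value;

    // A parenthesized group matches the whole sequence; "?" makes it optional.
    public static bool GroupMatched(string input) =>
        Regex.Match(input, @"</div>(\s+<p)?").Groups[1].Success;

    public static void Main()
    {
        Console.WriteLine(FirstClassMatch("xyz+abc"));     // prints the single character +
        Console.WriteLine(GroupMatched("</div>\n  <p>"));  // True: the group took part in the match
        Console.WriteLine(GroupMatched("</div><div>"));    // False: the group was skipped
    }
}
```

In both GroupMatched calls the overall match succeeds; only Groups[1].Success tells you whether the optional part was present.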


I'd split this into two regexes.

First match the p class:

<p\s(?:[^\s>]*?\s)*?(class="ProductMeta")>.*

Then match the date (US & UK):

/^((0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])|(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01]))[- /.](19|20)?\d\d$/gm

I'm not saying it's perfect, but it works :)
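In C#, the two-step idea could look roughly like the sketch below. It is a simplified variant, not the exact patterns above: it assumes the class attribute is the only attribute on the p element, accepts day-first dates only (as in the question's samples), and requires the same separator in both positions. TwoStepDemo and ExtractDate are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoStepDemo
{
    // Step 1: capture the text inside <p class="ProductMeta">...</p>.
    static readonly Regex MetaTag = new Regex(
        @"<p\s+class=""ProductMeta"">(.*?)</p>", RegexOptions.IgnoreCase);

    // Step 2: validate dd-mm-yyyy; \2 forces the second separator to equal the first.
    static readonly Regex DmyDate = new Regex(
        @"^(0[1-9]|[12][0-9]|3[01])([-/.])(0[1-9]|1[012])\2((?:19|20)\d\d)$");

    // Returns the date text, or null when the fragment has no valid date.
    public static string ExtractDate(string html)
    {
        Match tag = MetaTag.Match(html);
        if (!tag.Success) return null;

        string text = tag.Groups[1].Value.Trim();
        return DmyDate.IsMatch(text) ? text : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractDate("<p class=\"ProductMeta\">29-06-2011</p>")); // 29-06-2011
        Console.WriteLine(ExtractDate("<div class=\"myclass\">") ?? "(none)");     // (none)
    }
}
```

Splitting the work this way keeps each pattern short, at the cost of a second pass over the captured text.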


You can use this Regex: </div>\s*<p class="ProductMeta">(\d{2}[-/]\d{2}[-/]\d{4})
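As written, that pattern only matches when the p element is present, and it accepts mixed separators such as 29-06/2011. A possible variation, sketched below under the assumption that every block is followed by another <div, makes the p element optional and reuses the first separator through a backreference, so the empty pattern from the question is also reported. DateScanDemo and FindDates are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DateScanDemo
{
    // Group 1 is the date (empty when the <p> is absent); group 2 is the separator.
    static readonly Regex Block = new Regex(
        @"</div>\s*(?:<p class=""ProductMeta"">(\d{2}([-/])\d{2}\2\d{4})</p>\s*)?<div",
        RegexOptions.IgnoreCase);

    // Returns one entry per block, with dashes normalized to slashes.
    public static List<string> FindDates(string html)
    {
        var dates = new List<string>();
        foreach (Match m in Block.Matches(html))
            dates.Add(m.Groups[1].Value.Replace("-", "/"));
        return dates;
    }

    public static void Main()
    {
        string html =
            "</div>\n<div class=\"myclass\">\n" +
            "</div>\n    <p class=\"ProductMeta\">29-06-2011</p>\n<div class=\"myclass\">\n" +
            "</div>\n    <p class=\"ProductMeta\">29/06/2011</p>\n<div class=\"myclass\">";

        foreach (string d in FindDates(html))
            Console.WriteLine("[" + d + "]"); // [] then [29/06/2011] twice
    }
}
```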
