Regex to match dates
I have an HTML input that has three patterns like this:
Pattern one
</div>
<div class="myclass">
Pattern two
</div>
<p class="ProductMeta">29-06-2011</p>
<div class="myclass">
Pattern three
</div>
<p class="ProductMeta">29/06/2011</p>
<div class="myclass">
I'm trying to create a regex that handles all three cases: no date, a date with dashes, and a date with slashes. I think I need some kind of nested groups, but I can't make it work.
This is the RegEx I've created:
Regex r = new Regex(
@"src=""</div>\s+<p class=""ProductMeta"">([\d\/\-]+)"
, RegexOptions.Compiled | RegexOptions.IgnoreCase);
And tried the following to make the group optional:
Regex r = new Regex(
@"src=""</div>[\s+<p class=""ProductMeta"">([\d\/\-]+)]?"
, RegexOptions.Compiled | RegexOptions.IgnoreCase);
Can anyone help me?
The console test does the following to print on screen:
foreach (Match m in mcl)
{
Console.WriteLine(m.Groups[1].Value.Replace("-","/") + " - " + m.Groups[5].Value);
}
Console.Read();
Thanks.
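You can't make the [] an optional group: square brackets define a character class, not a group. Only parentheses () group a sequence so that ? can make the whole thing optional.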
using System;
using System.Text.RegularExpressions;

var test1 =
@"</div>
<div class=""myclass"">";
var test2 =
@"</div>
<p class=""ProductMeta"">29-06-2011</p>
<div class=""myclass"">";
var test3 =
@"</div>
<p class=""ProductMeta"">29/06/2011</p>
<div class=""myclass"">";
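// Group 1: the optional <p> element. Group 2: the date.
// Group 3: the first separator; \3 requires the second separator to be the same one.
string re = @"</div>\s+(<p class=""ProductMeta"">(\d\d([-/])\d\d\3\d\d\d\d))?";
Regex regExpr = new Regex(re, RegexOptions.Multiline);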
Console.WriteLine(regExpr.Match(test1).Groups[2].Value); //== ""
Console.WriteLine(regExpr.Match(test2).Groups[2].Value); //== "29-06-2011"
Console.WriteLine(regExpr.Match(test3).Groups[2].Value); //== "29/06/2011"
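To mirror the console loop from the question, here is a minimal sketch (assuming .NET 5+ for C# 9 top-level statements) that runs the same pattern over one hypothetical string containing all three patterns and normalizes the separator to a slash, as the question's Replace call does. Note that the date is in Groups[2] with this pattern, not Groups[1].

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as above: group 2 is the date, group 3 the separator (reused via \3).
var dateRegex = new Regex(@"</div>\s+(<p class=""ProductMeta"">(\d\d([-/])\d\d\3\d\d\d\d))?");

// Hypothetical input combining the three patterns from the question.
string html =
    "</div>\n<div class=\"myclass\">\n" +
    "</div>\n<p class=\"ProductMeta\">29-06-2011</p>\n<div class=\"myclass\">\n" +
    "</div>\n<p class=\"ProductMeta\">29/06/2011</p>\n<div class=\"myclass\">";

var dates = new List<string>();
foreach (Match m in dateRegex.Matches(html))
{
    // A failed group 2 means this </div> was not followed by a date (pattern one).
    string date = m.Groups[2].Success ? m.Groups[2].Value.Replace("-", "/") : "(none)";
    dates.Add(date);
    Console.WriteLine(date);
}
```

This prints (none), then 29/06/2011 twice: every </div> produces a match because the date part is optional, so the empty case is reported instead of silently skipped.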
Your first regular expression has only one capturing group (the date), but you are reading Groups[5], which is the sixth group (Groups[0] is the entire match).
Regex r = new Regex(
@"src=""</div>\s+<p class=""ProductMeta"">([\d\/\-]+)"
, RegexOptions.Compiled | RegexOptions.IgnoreCase);
foreach (Match m in mcl)
Console.WriteLine(m.Groups[1].Value.Replace("-","/"));
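A quick way to check how many groups a pattern has is Regex.GetGroupNumbers(). The sketch below (assuming .NET 5+ for top-level statements) uses the asker's first pattern; the sample input includes a src=" prefix only so that this pattern can match, since the HTML in the question has no such prefix.

```csharp
using System;
using System.Text.RegularExpressions;

// The asker's first pattern: ([\d\/\-]+) is its only capturing group.
var r = new Regex(@"src=""</div>\s+<p class=""ProductMeta"">([\d\/\-]+)");

// GetGroupNumbers() lists group 0 (the whole match) plus each capturing group.
int[] groupNumbers = r.GetGroupNumbers();
Console.WriteLine(string.Join(", ", groupNumbers)); // 0, 1

// Asking a Match for a group that does not exist is not an error:
// it returns a failed group whose Value is the empty string.
Match m = r.Match("src=\"</div>\n<p class=\"ProductMeta\">29-06-2011</p>");
Console.WriteLine(m.Groups[1].Value);   // 29-06-2011
Console.WriteLine(m.Groups[5].Success); // False
```

This is why Groups[5] in the question prints nothing instead of throwing.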
The second regex is completely wrong because you put what you're trying to match inside a character class, which matches any single character from the list. So [\s+p ] would match one whitespace character, a space, a plus sign (+), or a p.
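A small sketch of that behavior (assuming .NET 5+ for top-level statements): anchoring [\s+p ] with ^ and $ shows that it accepts exactly one character from its set and nothing longer.

```csharp
using System;
using System.Text.RegularExpressions;

// A character class matches exactly ONE character from its set:
// here any whitespace, a plus sign, the letter p, or a space.
var charClass = new Regex(@"^[\s+p ]$");

foreach (string s in new[] { " ", "+", "p", "\t", "<p", "x" })
{
    Console.WriteLine($"'{s}' -> {charClass.IsMatch(s)}");
}
```

Only the single characters " ", "+", "p" and the tab match; "<p" fails because it is two characters, and "x" is not in the set.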
I'd split this into two regexes.
First, match the p element with the ProductMeta class:
<p\s(?:[^\s>]*?\s)*?(class="ProductMeta")>.*
Then match the date (US & UK):
/^((0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])|(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01]))[- /.](19|20)?\d\d$/gm
I'm not saying it's perfect, but it works. :)
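In C#, the two steps might look like the following sketch (assuming .NET 5+ for top-level statements and a hypothetical sample input). The first pattern is adapted from the answer so that group 1 captures the element's text instead of the class attribute, and it assumes the text contains no '<'. The date pattern is the answer's, with the JavaScript /.../gm delimiters removed, since .NET takes the pattern as a plain string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Step 1 (adapted): find each <p> element with class="ProductMeta" and capture its text.
var pElement = new Regex(@"<p\s(?:[^\s>]*?\s)*?class=""ProductMeta"">([^<]*)</p>");

// Step 2: the answer's date pattern (day/month or month/day), without /.../gm.
var datePattern = new Regex(
    @"^((0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])|(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01]))[- /.](19|20)?\d\d$");

string html =
    "</div>\n<div class=\"myclass\">\n" +
    "</div>\n<p class=\"ProductMeta\">29-06-2011</p>\n<div class=\"myclass\">\n" +
    "</div>\n<p class=\"ProductMeta\">29/06/2011</p>\n<div class=\"myclass\">";

var validDates = new List<string>();
foreach (Match p in pElement.Matches(html))
{
    // Keep the element's text only if it looks like a valid date.
    string text = p.Groups[1].Value;
    if (datePattern.IsMatch(text))
        validDates.Add(text);
}
Console.WriteLine(string.Join(", ", validDates));
```

Pattern one has no ProductMeta element, so it simply produces no match in step 1. Splitting the work this way keeps each pattern readable, at the cost of no longer knowing which </div> lacked a date.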
You can use this Regex: </div>\s*<p class="ProductMeta">(\d{2}[-/]\d{2}[-/]\d{4})
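One thing to be aware of with this pattern: the two [-/] classes are independent, so a mixed date such as 29-06/2011 also matches, and pattern one (no date) produces no match at all. The sketch below (assuming .NET 5+ for top-level statements) compares it with an illustrative variant that uses a backreference, in the style of the \3 in the earlier answer, so the second separator must repeat the first.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern: the two [-/] classes are checked independently.
var simple = new Regex(@"</div>\s*<p class=""ProductMeta"">(\d{2}[-/]\d{2}[-/]\d{4})");

// Illustrative variant: \2 must repeat whatever separator group 2 captured.
var consistent = new Regex(@"</div>\s*<p class=""ProductMeta"">(\d{2}([-/])\d{2}\2\d{4})");

string mixed = "</div>\n<p class=\"ProductMeta\">29-06/2011</p>";
string noDate = "</div>\n<div class=\"myclass\">";

Console.WriteLine(simple.IsMatch(mixed));     // True
Console.WriteLine(consistent.IsMatch(mixed)); // False
Console.WriteLine(simple.IsMatch(noDate));    // False
```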