How to exclude a symbol within [ ] with RegEx

I am using PHP's preg_match_all, and this is what I have so far:

[A-Za-z+\W]+\s[\d]

The only problem is that I need \W to not match a double quote (").

So I have tried:

[A-Za-z+[^\dA-Za-z"]\s?]+\s[\d]


[A-Za-z+]\s?+[^A-Za-z\d"]?\s[\d]

among other things, but they all fail and I can't figure out why.
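A likely reason these attempts fail: PCRE has no nested character classes. Inside [ ], an unescaped [ is just another member and the first unescaped ] closes the class, so the first attempt is one class followed by \s? and then one or more literal ] characters. The small PHP check below illustrates that reading; the one-character test strings and 'abc] 1' are made up for illustration.

```php
<?php
// PCRE reads [A-Za-z+[^\dA-Za-z"] as ONE class whose members include
// '+', '[', '^', digits and '"'. It is not a class nested inside a class.
$firstClass = '/^[A-Za-z+[^\dA-Za-z"]$/';

foreach (['a', '"', '[', '^', '7'] as $ch) {
    // Each of these one-character strings is accepted by that class.
    printf("%s => %d\n", $ch, preg_match($firstClass, $ch));
}

// The ']' after \s? is an ordinary literal character outside any class.
$attempt1 = '/[A-Za-z+[^\dA-Za-z"]\s?]+\s[\d]/';

// A sample line contains no ']', so the attempt cannot match it...
var_dump(preg_match($attempt1, 'Madagascar – Toxic Sardines 14 April 2011')); // int(0)
// ...while a made-up string containing ']' does match.
var_dump(preg_match($attempt1, 'abc] 1')); // int(1)
```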

EDIT:

Here is the entire regex:

([A-Z][a-z]+\s){1,5}\s?[^a-zA-Z\d\s:,.\'\"]\s?
[A-Za-z+\W]+\s[\d]{1,2}\s[A-Z][a-z]+\s[\d]{4}

I split it across two lines; the second line begins with the part I posted above.

Lines I am trying to match:

    India – Adulterated Tea Powder Seized 18 April 2011
    India – Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
    India – Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
    India – Adulteration Found in Edible Oils 28 April 2011
    India – Viral Disease Affects Chili Crop in Goa 28 April 2011
NOT ---->   Chili – India: Goa”. 8 April 2011
    Ivory Coast – Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
    Japan – Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
    Madagascar – Toxic Sardines 14 April 2011
    Madagascar – Update: Toxic Sardines 26 April 2011


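The pattern you are showing matches all letters and all non-word characters. Since \W is everything except letters, digits and the underscore, the only characters your class does not match are digits and _, and you also want it not to match the double quote. So use a negated class that lists exactly what to exclude: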

[^\d\"_]+\s\d
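A quick way to check that equivalence, as a sketch assuming PHP's default byte-oriented matching (no u modifier), where \w is [A-Za-z0-9_]: compare the classes over printable ASCII. It also shows an alternative technique not used in this answer, a negative lookahead (?:(?!")[A-Za-z+\W]) that keeps the original class and subtracts " from it, and why the pattern needs a ^ anchor if the whole line must be quote-free. The test line is a made-up ASCII version of the "NOT" sample.

```php
<?php
// Assumes default (non-/u) matching: \w = [A-Za-z0-9_], \W = everything else.
$original  = '/^[A-Za-z+\W]$/';          // question's class
$negated   = '/^[^\d"_]$/';              // anything but a digit, '"' or '_'
$lookahead = '/^(?:(?!")[A-Za-z+\W])$/'; // alternative: original class minus '"'

$onlyInOriginal = [];
for ($code = 32; $code <= 126; $code++) { // printable ASCII
    $ch = chr($code);
    $inOriginal  = preg_match($original, $ch) === 1;
    $inNegated   = preg_match($negated, $ch) === 1;
    $inLookahead = preg_match($lookahead, $ch) === 1;
    if ($inNegated !== $inLookahead) {
        throw new LogicException("negated and lookahead forms disagree on '$ch'");
    }
    if ($inOriginal && !$inNegated) {
        $onlyInOriginal[] = $ch; // accepted before, rejected now
    }
}
echo implode(' ', $onlyInOriginal), "\n"; // prints: "

// Unanchored, the pattern can still match AFTER the quote (". 8"),
// so anchor it at the start of the line to reject the whole line.
$line = 'Chili - India: Goa". 8 April 2011';
var_dump(preg_match('/[^\d"_]+\s\d/', $line));  // int(1)
var_dump(preg_match('/^[^\d"_]+\s\d/', $line)); // int(0)
```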

Edit:

I could be wrong, but from the sample input it appears you are simply trying to match every line that doesn't contain a double quote. If so, something like this is much simpler, and it also captures the date in a separate group from the rest of the line. If you don't need the text and the date captured separately, just remove the parentheses.

<?php
error_reporting(E_ALL);
$str = "    India - Adulterated Tea Powder Seized 18 April 2011
    India - Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
    India - Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
    India - Adulteration Found in Edible Oils 28 April 2011
    India - Viral Disease Affects Chili Crop in Goa 28 April 2011
    Chili - India: Goa\". 8 April 2011
    Ivory Coast - Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
    Japan - Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
    Madagascar - Toxic Sardines 14 April 2011
    Madagascar - Update: Toxic Sardines 26 April 2011";
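// m: ^ and $ match at each line; i: [a-z] also matches "April".
// Group 1: the text before the date (no double quote allowed); group 2: the date.
preg_match_all("/^([^\"]+?)(\d?\d\s[a-z]+\s\d{4})$/im", $str, $m);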
echo '<pre>'.print_r($m, true).'</pre>';
?>


If you know that every line will either be acceptable or contain a " (and therefore be unacceptable), then [^\"]+ should be fine.
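One caveat for the real data: the question's sample uses a typographic closing quote (”, U+201D) and en dashes, not the ASCII " and - used in the test string above, so [^\"] alone would not reject the "NOT" line. A sketch, assuming the input and the source file are UTF-8, that also excludes ” and adds the u modifier so ” is treated as a single character:

```php
<?php
// Assumes UTF-8 text. With the u modifier, ” is one character and can sit
// in the negated class next to the ASCII double quote.
$str = "India – Adulteration Found in Edible Oils 28 April 2011
Chili – India: Goa”. 8 April 2011
Madagascar – Update: Toxic Sardines 26 April 2011";

// m: ^ and $ match at each line; i: [a-z] also matches "April".
// Group 1: text before the date (no " or ”); group 2: the date.
preg_match_all('/^([^"”]+?)(\d?\d\s[a-z]+\s\d{4})$/imu', $str, $m);
print_r($m[2]);
```

Only the first and third lines should come back; the line containing ” is skipped.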


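Try this:

[A-Za-z+\W^\"]+\s[\d]

(Be aware that ^ only negates a character class when it is the first character after [. Here it is a literal caret, and \" adds the quote rather than removing it, so this class still matches ".)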
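A quick PHP check of how PCRE parses this class, next to a class where ^ comes first:

```php
<?php
// In [A-Za-z+\W^\"] the ^ is not the first character, so it is a literal
// member, and \" is an escaped quote that is added to the class, not removed.
$suggested = '/^[A-Za-z+\W^\"]$/';
var_dump(preg_match($suggested, '"')); // int(1): the quote is still accepted

// Only a ^ placed right after [ negates the class.
var_dump(preg_match('/^[^\d"_]$/', '"')); // int(0)
```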
