Complex regex question, data may or may not be in brackets
I need to extract data from a source that presents it in one of two ways. The data could be formatted like this:
Francis (Lab) 18,077 (60.05%); Waller (LD) 4,140 (13.75%); Evans (PC) 3,545 (11.78%); Rees-Mogg (C) 3,064 (10.18%); Wright (Veritas) 768 (2.55%); La Vey (Green) 510 (1.69%)
Or like this:
Lab 8,994 (33.00%); C 7,924 (29.07%); LD 5,197 (19.07%); PC 3,818 (14.01%); Others 517 (1.90%); Green 512 (1.88%); UKIP 296 (1.09%)
The data I need to extract is the percentage and the party (these are election results), which is either in brackets (first example) or is the only non-numeric text.
So far I have this:
preg_match('/(.*)\(([^)]*)%\)/', $value, $match);
Which is giving me the following matches (for first example):
Array
(
[0] => Francis (Lab) 18,077 (60.05%)
[1] => Francis (Lab) 18,077
[2] => 60.05
)
So I have the percentage, but I also need the party label, which may or may not be in brackets and may or may not be the only text. Can anyone help?
Do party symbols ever have whitespace in them? If not, this should do the trick:
'/\(?([A-Za-z]+)\)?\s*[\d,]+\s*\(([\d.]+%)\)/'
The regex is anchored by the raw vote count and the percentage; the party is just the last run of letters preceding them, and may or may not be enclosed in brackets. Capture group 1 holds the party and group 2 holds the percentage, including the % sign.
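As a rough sketch of how you might apply it, assuming each line of your source arrives as one string: preg_match_all with PREG_SET_ORDER gives one match set per result, so you don't have to split on the semicolons first. The function name extractResults and the 'party'/'percent' array keys are made up for illustration.

```php
<?php
// Hypothetical helper: returns one ['party' => ..., 'percent' => ...] pair
// for every "votes (percent%)" entry found in a line of results.
function extractResults(string $line): array
{
    // Same pattern as above: optional brackets around the party,
    // then the vote count, then the bracketed percentage.
    $pattern = '/\(?([A-Za-z]+)\)?\s*[\d,]+\s*\(([\d.]+%)\)/';

    $results = [];
    // PREG_SET_ORDER groups the captures per match instead of per group.
    preg_match_all($pattern, $line, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $results[] = ['party' => $m[1], 'percent' => $m[2]];
    }
    return $results;
}

// Sample lines shortened from the question, one in each format.
$withNames = 'Francis (Lab) 18,077 (60.05%); Rees-Mogg (C) 3,064 (10.18%); La Vey (Green) 510 (1.69%)';
$partyOnly = 'Lab 8,994 (33.00%); C 7,924 (29.07%); UKIP 296 (1.09%)';

print_r(extractResults($withNames));
print_r(extractResults($partyOnly));
```

Because the party group only accepts letters, candidate names such as "Rees-Mogg" or "La Vey" are skipped and only the bracketed label is captured. If a party label can contain spaces or hyphens, the character class would need widening.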