Parsing XML using regex and grabbing the value in between tags
I have a regular expression that I use to grab the data between an opening and a closing tag, for example:
<CLASSCOD>70</CLASSCOD>
The regular expression I use is (?<=<CLASSCOD>)(?:[^<]|<(?!/CLASSCOD))*
which works in most cases, but when I have a single-character value like this: <CLASSCOD>N</CLASSCOD>
it says there are no matches.
The whole data string looks like this:
<DATE>0601</DATE>
<YEAR>11</YEAR>
<AGENCY>Department of the Interior</AGENCY>
<OFFICE>Bureau of Indian Affairs</OFFICE>
<LOCATION>BIA - DAPM</LOCATION>
<ZIP>85004</ZIP>
<CLASSCOD>N</CLASSCOD>
<OFFADD>Contracting Office - Western Region 2600 N. Central Avenue, 4th Floor Phoenix AZ 85004</OFFADD>
<SUBJECT>Boiler Replacement</SUBJECT>
<SOLNBR>A11PS00463</SOLNBR>
<RESPDATE>061711</RESPDATE>
<ARCHDATE>05312012</ARCHDATE>
<CONTACT>Geraldine M. Williams Purchasing Agent 6023794087 geraldine.williams@bia.gov;<a href="mailto:EC_helpdesk@NBC.GOV">Point of Contact above, or if none listed, contact the IDEAS EC HELP DESK for assistance</a>
</CONTACT>
<LINK><URL>https://www.fbo.gov/spg/DOI/BIA/RestonVA/A11PS00463/listing.html<LINKDESC>Link To Document</LINK>
<EMAIL></EMAIL>
<EMAIL>
EC_helpdesk@NBC.GOV
<EMAILDESC>
Point of Contact above, or if none listed, contact the IDEAS EC HELP DESK for assistance
</EMAILDESC>
</EMAIL>
<SETASIDE>Total Small Business</SETASIDE>
<POPCOUNTRY>USA</POPCOUNTRY>
<POPZIP>85634</POPZIP>
<POPADDRESS>BIE Tohono O'odham High School, Sells, AZ</POPADDRESS>
Any suggestions as to the reason?
Thanks
Something simpler should work:
<CLASSCOD>(.+?)</CLASSCOD>
Example:
// Group 1 captures the text between the opening and closing tags.
Match match = Regex.Match(input, @"<CLASSCOD>(.+?)</CLASSCOD>");
if (match.Success) {
    string value = match.Groups[1].Value;
    Console.WriteLine(value);
}
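One thing to watch with the pattern above: `.+?` needs at least one character and, by default, does not cross line breaks, so an empty element such as <EMAIL></EMAIL> or a value spread over several lines would not match. A minimal sketch of a reusable helper that works around this, assuming `(.*?)` plus `RegexOptions.Singleline` is acceptable for your data (the class name `TagValue` and method `Get` are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagValue
{
    // Hypothetical helper: returns the trimmed text between <tag> and </tag>,
    // or null when the tag pair is not found.
    public static string Get(string input, string tag)
    {
        // Escape the tag name so it is treated literally inside the pattern.
        string name = Regex.Escape(tag);
        string pattern = "<" + name + ">(.*?)</" + name + ">";

        // Singleline lets '.' match newlines, so multi-line values are captured too.
        Match match = Regex.Match(input, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }

    public static void Main()
    {
        string input = "<ZIP>85004</ZIP>\n<CLASSCOD>N</CLASSCOD>\n<EMAIL></EMAIL>";
        Console.WriteLine(Get(input, "CLASSCOD")); // single-character value
        Console.WriteLine(Get(input, "ZIP"));
    }
}
```

With this, `TagValue.Get(input, "CLASSCOD")` should return the one-character value, and an empty element comes back as an empty string rather than a failed match.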
If you would like to extract both the tag name and the value between any pair of matching tags, you can use the following regex (group 1 is the tag name, group 2 is the value):
<([^>]+)>([^<]*)</\1>
For this scenario there is no need to use the lookahead and lookbehind operators.
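As a rough sketch of how the backreference pattern could be used to list every simple tag/value pair in the data, assuming elements are not nested (the class name `TagPairs` and method `Extract` are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagPairs
{
    // Hypothetical helper: collects (tag name, value) pairs for every
    // <TAG>value</TAG> element whose value contains no '<' character.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var result = new List<KeyValuePair<string, string>>();

        // \1 requires the closing tag to repeat whatever group 1 captured.
        foreach (Match m in Regex.Matches(input, @"<([^>]+)>([^<]*)</\1>"))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value.Trim()));
        }
        return result;
    }

    public static void Main()
    {
        string input = "<ZIP>85004</ZIP>\n<CLASSCOD>N</CLASSCOD>\n<EMAIL></EMAIL>";
        foreach (var pair in Extract(input))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Because `[^<]*` stops at the first `<`, elements that contain other markup (such as the CONTACT entry with its embedded link) are skipped by this pattern; they would need separate handling.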