Can't use javascript regex to get everything between html/xml tags
So I receive some XML as plain text (and no, I can't use DOM or JSON because apparently I am not allowed to). I want to pull out all elements encased in a certain element and put them into an array, where I can strip out the text in the individual segments. Now, I am used to POSIX regex, and I will never actually understand the point behind PCRE regex, nor do I get the syntax.
Now here is the code I am using:
var strResponse = objResponse.text;
var strRegex = new RegExp("<item>(.*?)<\/item>","i");
var arrMatches = "";
var match;
while (match = strRegex.exec(strResponse)) {
arrMatches[] = match[1];
}
I have no idea why it won't find any matches with this code. Can someone please help me with this, and perhaps elaborate on what exactly I am continuously doing wrong with the PCRE syntax?
If those tags are on different lines, the . will not match the newline characters, and therefore your expression will not match. This is just a guess; I don't know your source.
You can try
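var strRegex = new RegExp("<item>([\\s\\S]*?)<\\/item>","gi");
[\\s\\S] is a character class containing all whitespace and all non-whitespace characters, so it matches any character; line breaks are covered by the whitespace characters. The g flag is needed as well, because exec() in a while loop only moves on to the next match when the regex is global; without it, the loop keeps returning the first match.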
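Putting that together with the loop from the question, a minimal sketch might look like the following. The sample string is made up for illustration (the real response text is not shown in the question). Besides the character class, it assumes two further fixes: arrMatches has to be a real array filled with push(), since arrMatches[] = ... is PHP syntax and a syntax error in JavaScript, and the regex needs the g flag so that exec() advances.

```javascript
// Hypothetical sample input standing in for objResponse.text.
var strResponse = "<items>\n<item>first\nentry</item>\n<ITEM>second</ITEM>\n</items>";

// [\s\S] matches any character, including newlines.
// "g" lets exec() continue from lastIndex on each call; "i" ignores tag case.
var strRegex = new RegExp("<item>([\\s\\S]*?)<\\/item>", "gi");

var arrMatches = [];   // an array, not an empty string
var match;
while ((match = strRegex.exec(strResponse)) !== null) {
    // match[1] is the text captured between the tags
    arrMatches.push(match[1]);
}

console.log(arrMatches);
```

With this sample input the array ends up with two entries, the first of which still contains its line break.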
The best way to complete this task is to parse it as proper HTML and navigate it with the DOM parser; see: Javascript function to parse HTML string into DOM? Regex is very error-prone for this and is in general not well suited to parsing irregular text like HTML structure.