Trying to remove trailing text
I having the following code. I want to extract the last text (hello64
) from it.
<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*
I used the code below but it removes all the integers
questionText = questionText.replace(/<span\b.*?>/ig, "");
questionText=questionText.replace(/<\/span>/ig, "");
questionText = questionText.replace(/\d+/g,"");
questionText = ques开发者_JAVA技巧tionText.replace("*","");
questionText = questionText.replace(". ",""); i want to remove the first integer, and need to keep the rest of the integers
It's the third line .replace(/\d+/g,"")
which is replacing the integers. If you want to keep the integers, then don't replace \d+
, because that matches one or more digits.
You could achieve most of that all on one line, by the way - there's no need to have multiple replaces there:
var questionText = questionText.replace(/((<span\b.*?>)|(<\/span>)|(\d+))/ig, "");
That would do the same as the first three lines of your code. (of course, you'd need to drop the |(\d+)
as per the first part of the answer if you didn't want to get rid of the digits.
[EDIT]
Re your comment that you want to replace the first integer but not the subsequent ones:
The regex string to do this would depend very heavily on what the possible input looks like. The problem is that you've given us a bit of random HTML code; we don't know from that whether you're expecting it to always be in this precise format (ie a couple of spans with contents, followed by a bit at the end to keep). I'll assume that this is the case.
In this case, a much simpler regex for the whole thing would be to replace eveything within <span
....</span>
with blank:
var questionText = questionText.replace(/(<span\b.*?>.*?<\/span>)/ig, "");
This will eliminate the whole of the <span>
tags plus their contents, but leave anything outside of them alone.
In the case of your example this would provide the desired effect, but as I say, it's hard to know if this will work for you in all cases without knowing more about your expected input.
In general it's considered difficult to parse arbitrary HTML code with regex. Regex is a contraction of "Regular Expressions", which is a way of saying that they are good at handling strings which have 'regular' syntax. Abitrary HTML is not a 'regular' syntax due to it's unlimited possible levels of nesting. What I'm trying to say here is that if you have anything more complex than the simple HTML snippets you've supplied, then you may be better off using a HTML parser to extract your data.
This will match the complete string and put the part after the last </span>
till the next word boundary \b
into the capturing group 1. You just need to replace then with the group 1, i.e. $1
.
searched_string = string.replace(/^.*<\/span>\s*([A-Za-z0-9]+)\b.*$/, "$1");
The captured word can consist of [A-Za-z0-9]
. If you want to have anything else there just add it into that group.
精彩评论