Trying to remove trailing text

2023-02-26 16:29 Q&A author:

I have the following code. I want to extract the last piece of text (hello64) from it.

<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*

I used the code below, but it removes all the integers:

questionText = questionText.replace(/<span\b.*?>/ig, "");
questionText = questionText.replace(/<\/span>/ig, "");
questionText = questionText.replace(/\d+/g, "");

questionText = questionText.replace("*", "");
questionText = questionText.replace(". ", "");

I want to remove the first integer, but keep the rest of the integers.

It's the third line, .replace(/\d+/g, ""), that is removing the integers. \d+ matches one or more digits, and the g flag makes replace remove every such run, so if you want to keep the integers, don't replace \d+.
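To illustrate the point (a sketch, not part of the original answer), here is the effect of the g flag on the text that is left once the span tags have been stripped. With g, every run of digits is removed; without it, String.prototype.replace replaces only the first match, which for this particular input is one way to drop the leading question number while keeping the 64 in hello64.

```javascript
// Text as it looks after the <span> tags have been stripped.
const stripped = "4. hello64 ?*";

// With the g flag, every run of digits is removed, including the 64.
const allDigitsRemoved = stripped.replace(/\d+/g, "");

// Without the g flag, only the first run of digits (the "4") is removed.
const firstNumberRemoved = stripped.replace(/\d+/, "");

console.log(allDigitsRemoved);   // ". hello ?*"
console.log(firstNumberRemoved); // ". hello64 ?*"
```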

By the way, you could do most of that in one line; there's no need for multiple replaces:

var questionText = questionText.replace(/((<span\b.*?>)|(<\/span>)|(\d+))/ig, "");

That does the same as the first three lines of your code. (Of course, you'd need to drop the |(\d+), as per the first part of this answer, if you don't want to get rid of the digits.)
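A quick sanity check of that claim on the question's example input (the variable names are only for illustration). The step-by-step version and the single combined call give the same string here, and leaving out the |(\d+) alternative keeps the digits:

```javascript
const input = '<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*';

// The first three lines of the question's code, applied one at a time.
let stepByStep = input.replace(/<span\b.*?>/ig, "");
stepByStep = stepByStep.replace(/<\/span>/ig, "");
stepByStep = stepByStep.replace(/\d+/g, "");

// The same three patterns combined as alternatives in a single call.
const combined = input.replace(/((<span\b.*?>)|(<\/span>)|(\d+))/ig, "");

// Dropping the |(\d+) alternative strips the tags but keeps all digits.
const digitsKept = input.replace(/((<span\b.*?>)|(<\/span>))/ig, "");

console.log(stepByStep === combined); // true
console.log(combined);   // ". hello ?*"
console.log(digitsKept); // "4. hello64 ?*"
```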

[EDIT]

Re your comment that you want to replace the first integer but not the subsequent ones:

The regex to do this depends heavily on what the possible input looks like. The problem is that you've given us a single fragment of HTML; we can't tell from it whether you expect the input always to have this precise format (i.e. a couple of spans with contents, followed by a bit at the end to keep). I'll assume that it does.

In that case, a much simpler regex for the whole thing is one that replaces everything from each <span ...> up to the next </span> with an empty string:

var questionText = questionText.replace(/(<span\b.*?>.*?<\/span>)/ig, "");

This will eliminate the whole of the <span> tags plus their contents, but leave anything outside of them alone.

For your example this gives the desired effect, but as I said, it's hard to know whether it will work for you in all cases without knowing more about your expected input.
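Applied to the question's example input, the span-removing regex behaves as follows. The leading space and the trailing ?* are not inside any span, so they survive; the .trim() call is an extra step added here, not part of the answer:

```javascript
const input = '<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*';

// Remove each <span ...>...</span> element, tags and contents together.
// The lazy .*? stops at the first </span> it reaches, so this assumes the
// spans are not nested, as in the question's example.
const withoutSpans = input.replace(/(<span\b.*?>.*?<\/span>)/ig, "");
console.log(JSON.stringify(withoutSpans)); // " hello64 ?*"

// Extra step (not in the answer): strip the leading whitespace.
console.log(withoutSpans.trim()); // "hello64 ?*"
```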

In general, it's considered difficult to parse arbitrary HTML with regex. "Regex" is short for "regular expression", which is a way of saying that they are good at handling strings with a 'regular' syntax. Arbitrary HTML is not a 'regular' syntax, because elements can be nested to any depth. What I'm trying to say is that if you have anything more complex than the simple HTML snippets you've supplied, you may be better off using an HTML parser to extract your data.

The regex below matches the complete string and puts the part after the last </span>, up to the next word boundary \b, into capturing group 1. You then just replace the match with group 1, i.e. $1.

searched_string = string.replace(/^.*<\/span>\s*([A-Za-z0-9]+)\b.*$/, "$1");

The captured word can consist only of the characters in [A-Za-z0-9]. If you want to allow anything else there, just add it to that character class.
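A minimal sketch wrapping that regex in a function (extractTrailingWord is a hypothetical name, not from the answer), plus a variant with _ and - added to the character class as the answer suggests:

```javascript
// Hypothetical helper wrapping the regex from the answer above.
function extractTrailingWord(html) {
  // ^.*<\/span>     greedy, so it runs up to the LAST </span>
  // \s*             skips whitespace after it
  // ([A-Za-z0-9]+)  captures the word as group 1
  // \b.*$           the rest of the line, which is discarded
  return html.replace(/^.*<\/span>\s*([A-Za-z0-9]+)\b.*$/, "$1");
}

const input = '<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*';
console.log(extractTrailingWord(input)); // "hello64"

// If the pattern does not match, replace() returns the string unchanged.
console.log(extractTrailingWord("no spans here")); // "no spans here"

// Adding "_" and "-" to the character class lets them appear in the word.
const extended = "<span>1</span> hello_64-b ?".replace(
  /^.*<\/span>\s*([A-Za-z0-9_-]+)\b.*$/,
  "$1"
);
console.log(extended); // "hello_64-b"
```

Note the caveat in the last-but-one call: when nothing matches, you get the whole input back rather than an empty string, so you may want to test the pattern with match() first if that matters.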
