Select a part of the matching string using regular expressions

2023-03-29 19:44

The data I need to extract from a web page is delimited by specific comments: <!--data-->. I use this expression: <!--data-->.+?<!--data--> and it works fine.

But maybe there is a way to get the text without the HTML comments at the beginning and at the end of the string?

I also need this when looking for img tags in HTML code, but the result should contain only the link to the picture.

Is it possible to express this in a regular expression?

If you wrap the part of the regex you wish to capture in parentheses ( ) you can retrieve the captured string with $1, $2, etc.

In general though, parsing HTML with regular expressions is a very bad idea. See this answer: RegEx match open tags except XHTML self-contained tags

If you want to exclude the delimiters, put parentheses around the part you want and use the capturing group, or use lookaround assertions.

Solution 1:

<!--data-->(.+?)<!--data-->

Your result is in group 1. How you get the content of this capturing group depends on your language. You should really add this information to your question.
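Since the question does not name a language, here is a minimal sketch in Python (an assumption, not the asker's actual environment) of reading group 1. The sample string `html` is made up for illustration; only the <!--data--> markers come from the question.

```python
import re

# Hypothetical sample page; only the <!--data--> markers come from the question.
html = "<p>header</p><!--data-->first item<!--data--><p>footer</p>"

# The parentheses create capturing group 1 around the text between the markers.
# re.DOTALL lets "." also match newlines, in case the data spans several lines.
pattern = re.compile(r"<!--data-->(.+?)<!--data-->", re.DOTALL)

match = pattern.search(html)
# group(0) would be the whole match including the comments; group(1) is the inner text.
extracted = match.group(1) if match else None
print(extracted)
```

In other languages the same group is typically exposed as $1, \1, or an indexed match object.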

Solution 2:

(?<=<!--data-->).+?(?=<!--data-->)

This matches only the text defined by .+?. It works only if your language supports lookbehind and lookahead assertions.
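As a rough illustration, again assuming Python (whose re module supports fixed-width lookbehind), the lookaround version returns the inner text as the whole match. The sample string is hypothetical.

```python
import re

# Hypothetical sample page; only the <!--data--> markers come from the question.
html = "<p>header</p><!--data-->first item<!--data--><p>footer</p>"

# The lookbehind and lookahead assert that the markers are present
# without making them part of the match itself.
lookaround = re.compile(r"(?<=<!--data-->).+?(?=<!--data-->)", re.DOTALL)

match = lookaround.search(html)
# No capturing group is needed: the full match is already the inner text.
inner = match.group(0) if match else None
print(inner)
```

One difference worth keeping in mind: the lookaround pattern does not consume the markers, so when searching for all matches a closing marker can also serve as the opening marker of the next match. The capturing-group pattern consumes the markers in pairs.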

Solution 3:

Use an HTML parser. This is probably the best solution in your case, because HTML supports nested tags and it is not possible to reliably match those with regular expressions.

If you tell us which language you use, you may get a good answer using a parser available for that language.
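For the img part of the question, a parser-based approach could look like the sketch below. It assumes Python and its standard-library html.parser module; the class name `ImageSourceCollector` and the sample markup are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collects the src attribute of every img tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


# Hypothetical sample markup with attributes in different orders.
sample = '<div><img src="a.png" alt="A"><p>text <img alt="B" src="img/b.jpg" /></p></div>'

collector = ImageSourceCollector()
collector.feed(sample)
collector.close()
print(collector.sources)
```

The parser handles attribute order, quoting, and self-closing tags, which a single regular expression would have to account for explicitly.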
