
Regex: Define a regex to capture an HTML group

I have the following HTML:

    <div class="headNormal">
    <h1><a href="/questions/76/specify-a-mirror-when-configuring-a-gdi-environment">
    Specify a Mirror when configuring a GDI environment</a></h1></div>

I'd like to capture the text "Specify a Mirror when configuring a GDI environment", but I'm not sure which regex I should use for this.

So far I have <div class="headNormal">(.*)</div> but it doesn't return anything.

Any help?


Based on the exact snippet you've provided, you'd want something like this:

    <a .+?>(.*?)</a>
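One thing to watch for: in most regex engines `.` does not match a newline by default, which is likely why the original `<div class="headNormal">(.*)</div>` attempt returns nothing on a snippet that spans several lines. The same applies to the pattern above, since the link text starts on the line after the opening tag. Here is a minimal sketch in Python (the language is my assumption, since the question doesn't name one) that turns on "dot matches newline" mode; the variable names are only for illustration.

```python
import re

# The snippet from the question, spread over several lines.
html = '''<div class="headNormal">
<h1><a href="/questions/76/specify-a-mirror-when-configuring-a-gdi-environment">
Specify a Mirror when configuring a GDI environment</a></h1></div>'''

pattern = r'<a .+?>(.*?)</a>'

# Without re.DOTALL, "." stops at the newline after the opening tag,
# so the pattern finds nothing in this snippet.
no_flag_match = re.search(pattern, html)

# With re.DOTALL, "." also matches newlines and the lazy group
# captures everything up to the first </a>.
match = re.search(pattern, html, re.DOTALL)
title = match.group(1).strip() if match else None

print(title)
```

Other engines expose the same switch under a different name (for example a "single line" option), so check the documentation for whichever one you are using.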

However, you're opening yourself up to a whole world of hurt if you have to parse large HTML documents and extract the text from anchors (a case in point is Konrad Rudolph's comment on this question). You'd be much better off with a parser.

You're not specific about the language you're using, but if it's .NET, have a look at the HTML Agility Pack.
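To give an idea of what the parser route looks like, here is a rough sketch using Python's standard-library `html.parser` as a stand-in. This is not the HTML Agility Pack API, just the same idea in a different language, and the `HeadlineExtractor` class and its attributes are names I made up for the example. It collects the text of every anchor that sits inside a div with the `headNormal` class.

```python
from html.parser import HTMLParser


class HeadlineExtractor(HTMLParser):
    """Collects anchor text found inside <div class="headNormal">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # div nesting depth inside the target div (0 = outside)
        self.in_anchor = False  # True while between <a ...> and </a>
        self.parts = []         # text fragments of the current anchor
        self.titles = []        # finished anchor texts

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Start counting at the target div, then track nested divs.
            if self.div_depth or "headNormal" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            self.in_anchor = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a" and self.in_anchor:
            self.in_anchor = False
            self.titles.append("".join(self.parts).strip())

    def handle_data(self, data):
        if self.in_anchor:
            self.parts.append(data)


html = '''<div class="headNormal">
<h1><a href="/questions/76/specify-a-mirror-when-configuring-a-gdi-environment">
Specify a Mirror when configuring a GDI environment</a></h1></div>'''

extractor = HeadlineExtractor()
extractor.feed(html)
extractor.close()
print(extractor.titles)
```

Unlike the regex, this doesn't care about line breaks or attribute order, and it only picks up links inside the div you're interested in.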
