Ruby Regex: Return just the match

When I do

puts /<title>(.*?)<\/title>/.match(html)

I get

<h2>foobar</h2>

But I want just

foobar

What's the most elegant method for doing so?


The most elegant way would be to parse HTML with an HTML parser:

require 'nokogiri'

html  = '<title><h2>Pancakes</h2></title>'
doc   = Nokogiri::HTML(html)
title = doc.at('title').text
# title is now 'Pancakes'

If you try to do this with a regular expression, you will probably fail. For example, if you can have an <h2> in your <title>, then nothing prevents you from having something like this:

<title><strong>Where</strong> is <span>pancakes</span> <em>house?</em></title>

Trying to handle something like that with a single regex is going to be ugly, but doc.at('title').text handles it as easily as it handles <title>Pancakes</title> or <title><h2>Pancakes</h2></title>.
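To make the "ugly" part concrete, here is a rough sketch using only the standard library. The tag-stripping regex is my own illustration, not something from the question, and it is deliberately naive: the capture group still contains the inner markup, so a second pass is needed to get plain text.

```ruby
# Sample input taken from the example above.
html = '<title><strong>Where</strong> is <span>pancakes</span> <em>house?</em></title>'

# The capture group returns everything between the title tags,
# inner markup included.
inner = html[/<title>(.*?)<\/title>/m, 1]

# Naive second pass (an assumption for illustration): drop anything
# that looks like a tag. This breaks on a ">" inside an attribute
# value and does not decode entities such as &amp;.
text = inner.gsub(/<[^>]*>/, '')

puts inner
puts text
```

It works for this input, but every new edge case means another regex, which is the point where a parser starts to pay for itself.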

Regular expressions are great tools but they shouldn't be the only tool in your toolbox.


Something like this will return just the contents of the first capture group, rather than the whole match:

html[/<title>(.*?)<\/title>/,1]

Maybe you need to tell us more, such as what html might contain. Right now you are capturing the contents of the title element, regardless of any internal tags. I think that is the right way to do it, rather than assuming there is one particular internal tag you want to handle: what would happen if you had two internal tags? This is why everyone is telling you to use an HTML parser, which you really should do.
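For comparison, here is a small sketch of a few equivalent ways to pull out the first capture group. The sample html string and the variable names are made up for illustration.

```ruby
# Sample input; the markup is an assumption for illustration.
html = '<html><head><title>foobar</title></head><body></body></html>'
title_re = /<title>(.*?)<\/title>/

# MatchData#to_s is the whole match, which is what `puts` prints
# when handed the MatchData object directly.
whole = title_re.match(html).to_s

# Three ways to get only the first capture group.
by_index    = html[title_re, 1]
by_match    = title_re.match(html)[1]
by_captures = html.match(title_re)&.captures&.first

# String#[] returns nil when nothing matches, so it needs no
# separate nil check before indexing.
missing = 'no title here'[title_re, 1]

puts whole
puts by_index
```

The String#[] form is the shortest, and it is the only one of the three that copes with a failed match without extra guarding; title_re.match(html)[1] would raise NoMethodError on nil.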
