
regex: remove all but?

I have html that looks like

<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=Yellow">Yellow</a>&nbsp;</td>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=FFFF00">#FFFF00</a></td>
<td bgcolor="#FFFF00">&nbsp;</td>
<td align="left"><a href="/tags/ref_colorpicker.asp?colorhex=FFFF00">Shades</a></td>
<td align="left"><a href="/tags/ref_colormixer.asp?colorbottom=FFFF00&colortop=FFFFFF">Mix</a></td>
</tr>


<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=YellowGreen">YellowGreen</a>&nbsp;</td>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=9ACD32">#9ACD32</a></td>
<td bgcolor="#9ACD32">&nbsp;</td>
<td align="left"><a href="/tags/ref_colorpicker.asp?colorhex=9ACD32">Shades</a></td>
<td align="left"><a href="/tags/ref_colormixer.asp?colorbottom=9ACD32&colortop=FFFFFF">Mix</a></td>
</tr>

What I want to do is filter the HTML so that I end up with only

<td bgcolor="#XXXXXX">&nbsp;</td>

Then filter that so I end up with a list of rows like

XXXXXX
XXXXXX

How would I do that?


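Hi, you can use the following regex:

<td bgcolor="#([^"]+)">&nbsp;</td>

Use the capture group to extract "XXXXXX". The closing parenthesis must not be escaped, and with the # outside the group, group 1 holds only the hex digits.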
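
As a rough sketch of the capture-group approach in JavaScript (the names `rows`, `cellPattern`, and `groupColors` are made up for illustration, and the sample is a shortened copy of the question's HTML), something like this should print one hex value per line:

```javascript
// Shortened sample based on the question's HTML (illustrative only).
const rows = [
  '<td align="left"><a href="/tags/ref_color_tryit.asp?hex=FFFF00">#FFFF00</a></td>',
  '<td bgcolor="#FFFF00">&nbsp;</td>',
  '<td bgcolor="#9ACD32">&nbsp;</td>',
].join('\n');

// The parentheses form capture group 1; the '#' sits outside the group,
// so the group holds only the value that follows it.
const cellPattern = /<td bgcolor="#([^"]+)">&nbsp;<\/td>/g;

// matchAll yields one match object per cell; index 1 is the captured group.
const groupColors = Array.from(rows.matchAll(cellPattern), (m) => m[1]);

console.log(groupColors.join('\n'));
```

The forward slash is escaped here only because it is also the delimiter of a JavaScript regex literal.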


First regex to match the right tags:

<td bgcolor="#[0-9A-Fa-f]{6}">&nbsp;</td>

Then filter those matches again with the pattern below (or use a capture group instead, whichever is more convenient in your language):

[0-9A-Fa-f]{6}

That is, if you want to use regex (don't shoot me; the question asks which regular expression to use for this).
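
A minimal sketch of the two-pass idea in JavaScript, assuming the input is available as a string (the names `sample`, `cells`, and `hexCodes` are hypothetical). The first pass matters because the six-digit pattern on its own would also match hex values inside the href attributes:

```javascript
// Shortened sample based on the question's HTML (illustrative only).
const sample = [
  '<td bgcolor="#FFFF00">&nbsp;</td>',
  '<td align="left"><a href="/tags/ref_colormixer.asp?colorbottom=FFFF00&colortop=FFFFFF">Mix</a></td>',
  '<td bgcolor="#9ACD32">&nbsp;</td>',
].join('\n');

// Pass 1: keep only the bgcolor cells. match() with the g flag returns
// an array of matched strings, or null when nothing matches.
const cells = sample.match(/<td bgcolor="#[0-9A-Fa-f]{6}">&nbsp;<\/td>/g) || [];

// Pass 2: take the first run of six hex digits from each remaining cell.
const hexCodes = cells.map((cell) => cell.match(/[0-9A-Fa-f]{6}/)[0]);

console.log(hexCodes.join('\n'));
```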


If you must use regex, here is a demo using Ruby's irb:

>> %Q{
<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=Yellow">Yellow</a>&nbsp;</td>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=FFFF00">#FFFF00</a></td>
<td bgcolor="#FFFF00">&nbsp;</td>
<td align="left"><a href="/tags/ref_colorpicker.asp?colorhex=FFFF00">Shades</a></td>
<td align="left"><a href="/tags/ref_colormixer.asp?colorbottom=FFFF00&colortop=FFFFFF">Mix</a></td>
</tr>


<tr>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=YellowGreen">YellowGreen</a>&nbsp;</td>
<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=9ACD32">#9ACD32</a></td>
<td bgcolor="#9ACD32">&nbsp;</td>
<td align="left"><a href="/tags/ref_colorpicker.asp?colorhex=9ACD32">Shades</a></td>
<td align="left"><a href="/tags/ref_colormixer.asp?colorbottom=9ACD32&colortop=FFFFFF">Mix</a></td>
</tr>
}.scan(/<td[^>]*>&nbsp;<\/td>/).map {|s| s[/#([a-f0-9]+)/i, 1]}

=> ["FFFF00", "9ACD32"]


I wouldn't parse HTML with regexes either, but if I did, I'd do it like this ;)

var html = '<tr>\n<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=Yellow">Yellow</a>&nbsp;</td>\n<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=FFFF00">#FFFF00</a></td>\n<td bgcolor="#FFFF00">&nbsp;</td>\n<td align="left"><a href="/tags/ref_colorpicker.asp?colorhex=FFFF00">Shades</a></td>\n<td align="left"><a href="/tags/ref_colormixer.asp?colorbottom=FFFF00&colortop=FFFFFF">Mix</a></td>\n</tr>\n\n\n<tr>\n<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?color=YellowGreen">YellowGreen</a>&nbsp;</td>\n<td align="left"><a target="_blank" href="/tags/ref_color_tryit.asp?hex=9ACD32">#9ACD32</a></td>\n<td bgcolor="#9ACD32">&nbsp;</td>\n<td align="left"><a href="/tags/ref_colorpicker.asp?colorhex=9ACD32">Shades</a></td>\n<td align="left"><a href="/tags/ref_colormixer.asp?colorbottom=9ACD32&colortop=FFFFFF">Mix</a></td>\n</tr>'
        .split('\n'),    
    colors = [],
    i, l,
    match;

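for (i = 0, l = html.length; i < l; i++) {
    // exec() returns null when the line is not a bgcolor cell
    match = /<td bgcolor="#([\da-fA-F]{6})">&nbsp;<\/td>/.exec(html[i]);
    if (match) {
        colors.push(match[1]); // group 1 is the hex value without the '#'
    }
}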

console.log(colors);