RegEx problem or maybe another solution altogether?

The problem I'm having is that I have a block of JavaScript I've successfully scraped out of a website's source, and now I have to sift through it to get the specific values I'm looking for.

I need to find flvFileName and get all the file names listed. In this case it's 'trailer1,trailer2,trailer3'.

At first I started using regex to match the start and end tags and then match the file names and extract them to an array, but the problem is that there aren't always three videos in the list. There could be zero or more, so matching a fixed number of names doesn't work. Any thoughts on a way to approach this that won't make me continue to abuse my laptop?

... ,flashvars: {flvFileName: 'trailer1,trailer2,trailer3', age: 'no', isForced: 'true'} }); });


Assuming it's a string (or you can turn it into one):

p str.split(/flvFileName: '|', age/)[1].split(',')
#=> ["trailer1", "trailer2", "trailer3"]

This splits the string into three parts:

  • everything before flvFileName: '
  • the good stuff
  • everything after ', age

Then split the good stuff on a comma.
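If the key might be missing, or the list might be empty, a slightly more defensive variant is to capture whatever sits between the quotes and split that. This is only a sketch: the helper name flv_file_names and the sample strings are made up for illustration, and it assumes the value is always single-quoted as in the snippet from the question.

```ruby
# Hypothetical helper: returns the names listed under flvFileName,
# or an empty array when the key is missing or the list is empty.
def flv_file_names(js)
  # String#[] with a regex and a capture index returns the captured text,
  # or nil when the pattern does not match.
  value = js[/flvFileName:\s*'([^']*)'/, 1]
  return [] if value.nil?

  value.split(',').map(&:strip)
end

# Sample input modeled on the snippet in the question.
sample = "... ,flashvars: {flvFileName: 'trailer1,trailer2,trailer3', age: 'no', isForced: 'true'} }); });"

p flv_file_names(sample)
#=> ["trailer1", "trailer2", "trailer3"]
p flv_file_names("flashvars: {flvFileName: '', age: 'no'}")
#=> []
p flv_file_names("flashvars: {age: 'no'}")
#=> []
```

Because the split only runs on the captured value, it doesn't matter how many names are in the list.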


You could try using RKelly to parse the JavaScript into Ruby for you.

Or, since Aaron seems to have abandoned RKelly, you could try its replacement, Johnson.


How about something like: \bflvFileName\s*:\s*("|')(?:\s*([^,\1\s]+)\s*,?)+(?<!,)\s*\1

You might have to escape those backslashes; I don't know about Ruby, but you would in .NET. Note the backreference; that's the \1 above. I'm using it to indicate that the filenames are wrapped in matching " or ' characters.

All the \s might be unnecessary, but I'm leaving them in there to be thorough. I'm assuming there might be any amount of whitespace around the special characters (:, ", ,, etc.). YMMV.

Also: ([^,\1\s]+) might be too broad for describing filenames, depending on what you consider valid. You might want to use ((?:\w|\.)+) instead.

Some reference information if the above is hard to grok: regular-expressions.info/reference.html
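In Ruby, a repeated capture group only keeps its last iteration, so a simpler variant of the same idea may be easier to work with: use the backreference just to find the matching closing quote, then split the captured value on commas. The following is a rough sketch under that assumption, not the pattern above. The method name extract_names and the sample input are hypothetical.

```ruby
# Hypothetical helper: finds the quoted flvFileName value (single or
# double quotes) and returns the comma-separated names inside it.
def extract_names(js)
  # Group 1 is the opening quote, \1 requires the same closing quote,
  # group 2 is everything in between (non-greedy).
  pattern = /\bflvFileName\s*:\s*(["'])(.*?)\1/
  match = pattern.match(js)
  return [] unless match

  # Tolerate whitespace around the commas and drop empty entries.
  match[2].split(',').map(&:strip).reject(&:empty?)
end

# Made-up sample with extra whitespace and double quotes.
js = %q(flashvars: {flvFileName : "trailer1, trailer2 ,trailer3", age: 'no'})

p extract_names(js)
#=> ["trailer1", "trailer2", "trailer3"]
```

Splitting after the match also sidesteps the question of what counts as a valid filename character, at the cost of accepting anything that isn't a comma.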


What if you do it in the old way?

start = string.index("flvFileName")
quoteStart = string.index("'", start)
quoteEnd = string.index("'", quoteStart + 1)  # search past the opening quote
trailersString = string.slice((quoteStart + 1)...quoteEnd)  # text between the quotes
trailers = trailersString.split(",")

It's not beautiful, but it works. You might need to handle the case where there are no trailers separately.
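One way to cover the no-trailer case with the index-based approach is to wrap it in a method and bail out whenever a lookup comes back nil. This is a sketch in Ruby, assuming single quotes around the value. The method name trailers_from is made up for illustration.

```ruby
# Hypothetical helper: index-based extraction with guards for a
# missing key, a missing quote, or an empty list.
def trailers_from(string)
  start = string.index('flvFileName')
  return [] if start.nil?

  quote_start = string.index("'", start)
  return [] if quote_start.nil?

  # Start one character later so we don't find the opening quote again.
  quote_end = string.index("'", quote_start + 1)
  return [] if quote_end.nil?

  # Exclusive range: the text strictly between the two quotes.
  string[(quote_start + 1)...quote_end].split(',')
end

p trailers_from("{flvFileName: 'trailer1,trailer2', age: 'no'}")
#=> ["trailer1", "trailer2"]
p trailers_from("{flvFileName: '', age: 'no'}")
#=> []
```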
