regex find links

Is it possible to split the following links? I can select a single link, but not when they are pasted right next to each other.

Thanks

Example: (I want to select these 3 links separately) http://www.fileserve.com/file/7kXswvM/part1.rarhttp://www.fileserve.com/file/r4F3Gmh/part2.rarhttp://www.fileserve.com/file/r4F3Gmh/part3.rar


You could split at http:// and re-prepend that to every link (assuming it's all http only).
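A minimal Python sketch of that idea, assuming the input contains nothing but http:// links run together. The helper name split_links is made up for illustration:

```python
def split_links(text, scheme="http://"):
    """Split links that were pasted together and restore the scheme on each.

    Hypothetical helper: assumes `text` holds only links that all start
    with `scheme`. Any leading non-link text would wrongly get the scheme
    prepended as well.
    """
    # str.split yields an empty first piece when text starts with the scheme,
    # so empty pieces are dropped before the scheme is put back.
    return [scheme + piece for piece in text.split(scheme) if piece]


joined = ("http://www.fileserve.com/file/7kXswvM/part1.rar"
          "http://www.fileserve.com/file/r4F3Gmh/part2.rar"
          "http://www.fileserve.com/file/r4F3Gmh/part3.rar")
links = split_links(joined)
```

Here links should hold the three URLs as separate strings.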


http://(?:(?!http://).)*

will match a string that starts with http:// up until either the next occurrence of http:// or the end of the string.

>>> import re
>>> re.findall(r'http://(?:(?!http://).)*', 'http://www.fileserve.com/file/7kXswvM/part1.rarhttp://www.fileserve.com/file/r4F3Gmh/part2.rarhttp://www.fileserve.com/file/r4F3Gmh/part3.rar')
['http://www.fileserve.com/file/7kXswvM/part1.rar', 
'http://www.fileserve.com/file/r4F3Gmh/part2.rar', 
'http://www.fileserve.com/file/r4F3Gmh/part3.rar']

This will of course not quite work if anything other than a link follows in the input. As an alternative, the following regex will match until the next http:// or until the next space (or end of string):

http://(?:(?!http://|\s).)*
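To illustrate the difference, here is a small sketch that runs this second pattern on input where ordinary text surrounds the links. The sample sentence is invented for the example; only the regex comes from the answer above:

```python
import re

# Stop each match at the next "http://" or at the first whitespace character.
LINK = re.compile(r'http://(?:(?!http://|\s).)*')

# Made-up input: two links pasted together, with plain text before and after.
text = ("see http://www.fileserve.com/file/7kXswvM/part1.rar"
        "http://www.fileserve.com/file/r4F3Gmh/part2.rar and more text")

found = LINK.findall(text)
```

found should contain just the two URLs, without the trailing " and more text". Note that punctuation directly after a link (a comma or closing parenthesis, say) would still be swallowed, since only whitespace ends a match.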


Match each occurrence of http:// and split there. This assumes http:// does not appear unencoded anywhere else inside a URL, for example in a query string.


s/(?<=.)(?=http:)/\n/g;
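The substitution above is Perl. A rough Python equivalent, offered as a sketch rather than a drop-in translation, inserts a newline at every position that is followed by http: and preceded by at least one character, then splits on those newlines:

```python
import re

joined = ("http://www.fileserve.com/file/7kXswvM/part1.rar"
          "http://www.fileserve.com/file/r4F3Gmh/part2.rar"
          "http://www.fileserve.com/file/r4F3Gmh/part3.rar")

# (?<=.) requires a preceding character, so nothing is inserted at position 0;
# (?=http:) is a zero-width lookahead, so no text is consumed or lost.
separated = re.sub(r'(?<=.)(?=http:)', '\n', joined)

# One link per line afterwards.
links = separated.split('\n')
```

This assumes the input has no newlines of its own; if it does, the final split would also break on those.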