
Transforming URLs in Python

I have a .txt file that contains some URLs:

[http://igu.org.ru/ International Geographical Union - Russian National Committee]
[http://www.geografos.org Colegio de Geógrafos - España]
[http://www.geografs.org Col.legi de Geògrafs - Catalunya]
[http://www.geografs.org]

Now I want to transform these external links as follows, in this fixed order:

Replace "[url any text]" with "any text", where "url" is a URL (e.g., it starts with "http://").

Replace "[url]" with "url".

import codecs
import re

def openfile(filename):
    with codecs.open(filename, encoding="utf-8") as F:
        replace = F.read()
        replace = re.sub(r'\[http://.+ ...) # should replace "[url any text]" with "any text"
        replace = re.sub(...) # should replace "[url]" with "url"

Any suggestions?


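import codecs
import re

# "[url any text]" -> "any text"; the URL may not contain whitespace or "]"
re1 = re.compile(r'\[(http[^\s\]]*)\s+([^\]]*)\]')
# "[url]" -> "url"
re2 = re.compile(r'\[(http[^\s\]]*)\]')
with codecs.open(filename, encoding='utf-8') as F:
    text = F.read()
    pre_filter = re1.sub(r'\g<2>', text)
    result = re2.sub(r'\g<1>', pre_filter)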

This processes your text in the required order. For more background, see the "Search and Replace" section of the Python Regular Expression HOWTO: http://docs.python.org/howto/regex.html#search-and-replace
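A minimal, self-contained sketch of the same two-rule transformation, assuming the whole file fits in memory and that a URL is any run of characters starting with "http" that contains no whitespace and no "]". `transform_links` applies the two rules in the fixed order with two compiled patterns. `transform_links_single_pass` is an alternative technique (not from the answer above): it makes the link text an optional group and passes a function to `re.sub` to pick the replacement. The function names and the `sample` string are illustrative only.

```python
import re

# Assumption: a URL starts with "http" and contains no whitespace and no "]".
LINK_WITH_TEXT = re.compile(r'\[(http[^\s\]]*)\s+([^\]]*)\]')  # "[url any text]"
BARE_LINK = re.compile(r'\[(http[^\s\]]*)\]')                   # "[url]"


def transform_links(text):
    """Rule 1 first ("[url any text]" -> "any text"), then rule 2 ("[url]" -> "url")."""
    text = LINK_WITH_TEXT.sub(r'\2', text)
    return BARE_LINK.sub(r'\1', text)


def transform_links_single_pass(text):
    """Same result in one pass: the link text is an optional group."""
    pattern = re.compile(r'\[(http[^\s\]]*)(?:\s+([^\]]*))?\]')
    # group(2) is None for "[url]", so fall back to the URL itself (group 1).
    return pattern.sub(
        lambda m: m.group(2) if m.group(2) is not None else m.group(1), text
    )


# The example links from the question.
sample = (
    "[http://igu.org.ru/ International Geographical Union - Russian National Committee]\n"
    "[http://www.geografos.org Colegio de Geógrafos - España]\n"
    "[http://www.geografs.org Col.legi de Geògrafs - Catalunya]\n"
    "[http://www.geografs.org]\n"
)
result = transform_links(sample)
```

Excluding "]" from the URL matters when a bare "[url]" is followed by another link: with the pattern `\[(http[^\s]*)\s(.*)\]`, the input "[http://a.org]\n[http://b.org some text]" collapses into "[http://b.org some text" instead of "http://a.org\nsome text". To process a file, call `transform_links(F.read())` inside the `with` block.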
