Pythonic List Comprehension
This seems like a common task, alter some elements of an array, but my solution didn't feel very pythonic. Is there a better way to build urls
with list comprehension?
links = re.findall(r"(?:https?://|www\.|https?://www\.)[\S]+", text)
if len(links) == 0:
return text
urls = []
for link i开发者_C百科n links:
if link[0:4] == "www.":
link = "http://" + link
urls.append(link)
Maybe something like
links = re.findall(r"(?:https?://|www\.|https?://www\.)[\S]+", text)
if len(links) == 0:
return text
urls = map(lambda x : something(x), links)
If you want to go with list comprehensions, use:
urls = ['http://' + link if link.startswith('www.') else link for link in links]
But I actually think that the more verbose way of looping through the links that you used is easier to read. "Shorter" does not always equal "better" or "more readable".
["http://"+link if link[0:4]=='www.' else link for link in links]
or
[link[0:4]=='www.' and "http://"+link or link for link in links]
Notes:
("http://"+link if link[0:4]=='www.' else link)
- this is ternary operator like ?: in C
(link[0:4]=='www.' and "http://"+link or link)
- this has the same meaning.
On another subject: I would test for http://, not for www. Domains don't have to start with www. For instance, http://stackoverflow.com.
You'll probably be better off using built-in Python functionality for dealing with urls. Assuming you stay with your current regex, I think you could rewrite this as:
from urlparse import urlsplit, urlunsplit
links = re.findall("(?:https?://|www\.|https?://www\.)[\S]+", text)
urls = [urlunsplit(urlsplit(link, 'http')) for link links]
This should come out to the same thing as what you're currently doing. Also keep in mind that finding URLs using a regex is somewhat risky, ie this will return www.google.com! with the exclamation mark.
Alternatively:
def addHttp(url):
if url[0:4] == "www.":
url = "http://" + url
return url
urls = map(addHttp, links)
this is longer than using list comprehensions and the ternary operator, but IMHO it is more readable since the function name describes what it is doing, so the code is self-documenting. It is also easier to refactor e.g. if you decide to follow yu_sha's advice and not test explicitly for "www".
精彩评论