
Sorting lines based on data in the middle of the line (in Python)

I have a list of domains and I want to sort them based on TLD. What's the fastest way to do this?


Use the key parameter to .sort() to provide a function that can retrieve the proper data to sort by.

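from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse

def get_tld_from_domain(domain):
    # Last dot-separated label of the host; assumes the URL includes a scheme
    return urlparse(domain).netloc.split('.')[-1]

# Sort the existing list in place
list_of_domains.sort(key=get_tld_from_domain)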

# or if you want to make a new list, instead of sorting the old one
sorted_list_of_domains = sorted(list_of_domains, key=get_tld_from_domain)

If you prefer, you can skip the separate function and pass a lambda instead, but a named function often makes the code easier to read, which is always a plus.
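For comparison, the lambda form might look like the sketch below. It assumes Python 3 (where urlparse lives in urllib.parse) and URLs that include a scheme; the sample list is made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical sample data; each entry has a scheme so that netloc is populated
list_of_domains = ["http://example.org", "http://example.com", "http://example.net"]

# Same key as get_tld_from_domain, written inline as a lambda
sorted_by_tld = sorted(list_of_domains, key=lambda d: urlparse(d).netloc.split('.')[-1])
print(sorted_by_tld)
```

Both forms call the key function once per element, so the choice between them is mostly about readability.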


Also, remember that it is not trivial to get the TLD from a URL. Please check this link on SO. In Python you can use the urlparse module to parse URLs.
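Two of the pitfalls can be seen in a short sketch. It assumes Python 3's urllib.parse, and the example.co.uk host is only an illustration.

```python
from urllib.parse import urlparse

# Without a scheme, urlparse treats the whole string as a path, so netloc is empty
bare = urlparse("example.co.uk").netloc

# With a scheme, the host ends up in netloc as expected
full = urlparse("http://www.example.co.uk/page").netloc

# Taking the last label gives "uk", even though the suffix people usually
# mean for this host is the two-label "co.uk"
last_label = full.split('.')[-1]

print(repr(bare), full, last_label)
```

So a plain split on '.' is only an approximation of the TLD, which may or may not matter for sorting purposes.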


As Gangadhar says, it's hard to know definitively which part of the netloc is the TLD, but in your case I would modify Amber's code slightly. This will sort on the entire domain, by the last level first, then the second-to-last level, and so on.

This may be good enough for what you need, without having to refer to external lists.

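from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse

def get_reversed_domain(domain):
    # Host labels in reverse order, e.g. ['com', 'example', 'www']
    return urlparse(domain).netloc.split('.')[::-1]

sorted_list_of_domains = sorted(list_of_domains, key=get_reversed_domain)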

I just reread the question: if the list already contains bare domains (no scheme or path), you can simply use

sorted_list_of_domains = sorted(list_of_domains, key=lambda x:x.split('.')[::-1])
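As a rough illustration of how the reversed-label key orders things, here is a small worked example. The domain names are invented for the demonstration.

```python
# Hypothetical bare domains (no scheme), as in the question
list_of_domains = ["mail.example.org", "example.com", "www.example.com", "example.net"]

# Key for "www.example.com" is ['com', 'example', 'www']; lists compare
# element by element, so the TLD is compared first, then the next label, etc.
sorted_list_of_domains = sorted(list_of_domains, key=lambda x: x.split('.')[::-1])

print(sorted_list_of_domains)
# ['example.com', 'www.example.com', 'example.net', 'mail.example.org']
```

Because a shorter list sorts before a longer list that shares its prefix, example.com comes before www.example.com.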