
Comparing two urls in Python

Is there a standard way to compare two URLs in Python, i.e. something that implements are_urls_the_same in this example:

url_1 = 'http://www.foo.com/bar?a=b&c=d'
url_2 = 'http://www.foo.com:80/bar?c=d;a=b'

if are_urls_the_same(url_1, url_2):
    print("URLs are the same")

By "the same" I mean that they access the same resource, so the two URLs in the example should count as the same.


Here is a simple class that enables you to do this:

if Url(url1) == Url(url2):
    pass

It could easily be rewritten as a function, but these objects are hashable, so you can also put them in a set or use them as dictionary keys (for example, in a cache):

# Python 2
# from urlparse import urlparse, parse_qsl
# from urllib import unquote
# Python 3
from urllib.parse import urlparse, parse_qsl, unquote

class Url(object):
    '''A url object that can be compared with other url objects
    without regard to the vagaries of encoding, escaping, and ordering
    of parameters in query strings.'''

    def __init__(self, url):
        parts = urlparse(url)
        # Order-insensitive, hashable view of the query parameters
        _query = frozenset(parse_qsl(parts.query))
        # Decode %XX escapes; '+' is a literal character in a path,
        # so unquote (not unquote_plus) is used here
        _path = unquote(parts.path)
        parts = parts._replace(query=_query, path=_path)
        self.parts = parts

    def __eq__(self, other):
        # Let Python fall back to its default when compared to non-Url objects
        if not isinstance(other, Url):
            return NotImplemented
        return self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)
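As written, the class above does not treat the two URLs from the question as equal: the netloc is compared verbatim, so www.foo.com:80 differs from www.foo.com, and in recent Python 3 releases parse_qsl splits only on '&' by default, so c=d;a=b parses as a single pair. The sketch below is one way to add that normalization. NormalizedUrl and DEFAULT_PORTS are hypothetical names, and the rules it applies (default ports 80/443, ';' accepted as a query separator, empty path treated as '/', fragment and user:password@ ignored) are assumptions aimed at http/https URLs, not a complete implementation of any standard.

```python
from urllib.parse import urlsplit, parse_qsl, unquote

# Assumed default ports for the schemes we care about; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443}


class NormalizedUrl:
    """Hashable URL wrapper with extra normalization (a sketch).

    Lowercases the host (urlsplit already lowercases the scheme), drops a
    port equal to the scheme's default, accepts both '&' and ';' as query
    separators, and ignores the fragment and any user:password@ part.
    """

    def __init__(self, url):
        parts = urlsplit(url)
        scheme = parts.scheme
        host = parts.hostname or ""  # .hostname is returned in lower case
        port = parts.port
        if port == DEFAULT_PORTS.get(scheme):
            port = None  # http://host:80/ is the same as http://host/
        # Treat ';' as an alternative separator by rewriting it to '&'.
        query = frozenset(
            parse_qsl(parts.query.replace(";", "&"), keep_blank_values=True)
        )
        # Assumption (http/https): an empty path is equivalent to '/'.
        path = unquote(parts.path) or "/"
        self.key = (scheme, host, port, path, query)

    def __eq__(self, other):
        if not isinstance(other, NormalizedUrl):
            return NotImplemented
        return self.key == other.key

    def __hash__(self):
        return hash(self.key)


url_1 = 'http://www.foo.com/bar?a=b&c=d'
url_2 = 'http://www.foo.com:80/bar?c=d;a=b'
cache = {NormalizedUrl(url_1): "cached page"}
print(cache[NormalizedUrl(url_2)])  # cached page
```

Because equal objects hash to the same key tuple, the second URL finds the entry stored under the first one.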


Use urlparse and write a comparison function that checks only the fields you need:

>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')

You can then compare on any of the following attributes of the result:

  1. scheme (index 0): URL scheme specifier
  2. netloc (index 1): network location part
  3. path (index 2): hierarchical path
  4. params (index 3): parameters for last path element
  5. query (index 4): query component
  6. fragment (index 5): fragment identifier
  7. username: user name
  8. password: password
  9. hostname: host name (lower case)
  10. port: port number as integer, if present
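A minimal sketch of such a function, comparing only the attributes named in fields. The are_urls_the_same name follows the question, but _field, DEFAULT_PORTS and the fields parameter are hypothetical. Two choices here go beyond what urlparse does by itself: a missing port is filled in with an assumed default (80 for http, 443 for https), and the query is compared as a sorted list of (key, value) pairs so parameter order does not matter. Query strings are parsed with parse_qsl's default '&' separator.

```python
from urllib.parse import urlparse, parse_qsl

# Hypothetical defaults used when a URL has no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def _field(parts, name):
    """Return one urlparse attribute, lightly normalized for comparison."""
    if name == "port":
        # .port is None when the URL has no explicit port.
        if parts.port is not None:
            return parts.port
        return DEFAULT_PORTS.get(parts.scheme)
    if name == "query":
        # Order-insensitive comparison of query parameters.
        return sorted(parse_qsl(parts.query, keep_blank_values=True))
    # scheme and hostname already come back in lower case from urlparse.
    return getattr(parts, name)


def are_urls_the_same(url_1, url_2,
                      fields=("scheme", "hostname", "port", "path", "query")):
    """Compare two URLs on the listed urlparse attributes only."""
    a, b = urlparse(url_1), urlparse(url_2)
    return all(_field(a, f) == _field(b, f) for f in fields)


print(are_urls_the_same('http://www.foo.com/bar?a=b&c=d',
                        'http://WWW.foo.com:80/bar?c=d&a=b'))  # True
print(are_urls_the_same('http://a.com/x', 'http://a.com/y',
                        fields=("hostname",)))  # True
```

Since fragment is not in the default fields, http://www.foo.com/bar#top and http://www.foo.com/bar#end also compare equal; pass a different fields tuple to tighten or loosen the check.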