
Obtaining loaded images from a given URL via Python

Is there any way to load a URL via Python and then retrieve a list of all of the images that were loaded via that URL? I'm essentially looking to do something similar to TamperData or Fiddler and retrieve a list of all images that a given website loaded.


Interesting task. Here's one way of solving it, along the lines suggested by Jochen Ritzel.

It uses pylibpcap instead of pycap. Personally, I find pycap hard to work with because so little documentation is available. With pylibpcap, you can translate most code directly from the libpcap examples (see for example this tutorial for a nice introduction). The man pages for tcpdump and pcap are also great resources.

You may want to look at the standards for Ethernet, IPv4, TCP, and HTTP.
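As a rough illustration of the header layouts those standards describe, here is a small self-contained sketch that builds a synthetic Ethernet/IPv4/TCP frame and then strips the headers by their length fields. It is written for Python 3 (where indexing `bytes` yields integers), and `build_frame` and `split_headers` are hypothetical helper names used only for this example. It assumes a plain 14-byte Ethernet header with no VLAN tag.

```python
import struct


def build_frame(payload, dest_port=80):
    """Build a synthetic Ethernet/IPv4/TCP frame (checksums left at zero)."""
    # Ethernet: 6-byte destination MAC, 6-byte source MAC, EtherType 0x0800 (IPv4)
    ethernet = b'\x00' * 12 + b'\x08\x00'
    # IPv4: first byte 0x45 means version 4, header length 5 words (20 bytes)
    ip = struct.pack('!BBHHHBBH4s4s',
                     0x45, 0, 20 + 20 + len(payload), 0, 0,
                     64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    # TCP: the high nibble of byte 12 is the data offset in 32-bit words
    tcp = struct.pack('!HHLLBBHHH',
                      12345, dest_port, 0, 0, 5 << 4, 0x18, 8192, 0, 0)
    return ethernet + ip + tcp + payload


def split_headers(frame):
    """Return (IP header length, TCP header length, dest port, payload)."""
    ip = frame[14:]                          # skip the Ethernet header
    ip_header_length = 4 * (ip[0] & 0x0f)    # IHL is the low nibble, in words
    tcp = ip[ip_header_length:]
    dest_port = struct.unpack('!H', tcp[2:4])[0]
    tcp_header_length = 4 * (tcp[12] >> 4)   # data offset is the high nibble
    return ip_header_length, tcp_header_length, dest_port, tcp[tcp_header_length:]


frame = build_frame(b'GET / HTTP/1.1\r\n')
print(split_headers(frame))
```

Reading the lengths from the headers, rather than assuming 20 bytes each, matters because both IP and TCP headers may carry options.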

Note 1: The code below only prints out the HTTP GET requests. Filtering out the images and downloading them using the urllib module should pose no problem.
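The filtering step mentioned in Note 1 could be sketched roughly as follows. This is a Python 3 sketch under a few assumptions: the captured URIs have the `host/path` form that the script prints, an image is recognised purely by its file extension (which misses images served from extensionless URLs), and `IMAGE_EXTENSIONS`, `is_image_uri`, and `filter_images` are hypothetical names chosen for this example.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of extensions; adjust to taste.
IMAGE_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp', '.svg', '.ico'}


def is_image_uri(uri):
    """Return True if a 'host/path?query' URI looks like an image by extension."""
    # Prepend a scheme so urlsplit separates the host, path, and query correctly.
    path = urlsplit('http://' + uri).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in IMAGE_EXTENSIONS


def filter_images(uris):
    """Keep only the URIs that look like images, preserving order."""
    return [uri for uri in uris if is_image_uri(uri)]


captured = [
    'example.com/logo.PNG?v=2',
    'example.com/index.html',
    'example.com/img/a.jpg',
]
print(filter_images(captured))
```

Each remaining URI can then be fetched by prepending `http://` and passing it to `urllib.request.urlretrieve` (or `urllib.urlretrieve` on Python 2).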

Note 2: This code works on Linux; I'm not sure which device names you need to use on Windows or macOS. You'll also need root privileges.

#!/usr/bin/env python

import pcap
import struct

def parse_packet(data):
    """
    Parse Ethernet/IP/TCP packet.
    """
    # See the Ethernet, IP, and TCP standards for details.

    data = data[14:] # Strip Ethernet header

    header_length = 4 * (ord(data[0]) & 0x0f) # in bytes
    data = data[header_length:]  # Strip IP header

    dest_port = struct.unpack('!H', data[2:4])[0]
    if dest_port != 80:  # Keep only outgoing packets addressed to port 80
        return

    header_length = 4 * ((ord(data[12]) & 0xf0) >> 4) # in bytes
    data = data[header_length:] # Strip TCP header

    return data


def parse_get(data):
    """
    Parse an HTTP GET request, returning the host followed by the request URI.
    """
    if data is None or not data.startswith('GET'):
        return

    fields = data.split('\n')
    uri = fields[0].split()[1]

    for field in fields[1:]:
        if field.lower().startswith('host:'):
            return field[5:].strip() + uri


def packet_handler(length, data, timestamp):
    uri = parse_get(parse_packet(data))
    if uri is not None:
        print(uri)


# Set up pcap sniffer
INTERFACE = 'wlan0'
FILTER = 'tcp port 80'
p = pcap.pcapObject()
p.open_live(INTERFACE, 1600, 0, 100)
p.setfilter(FILTER, 0, 0)

try:
    while True:
        p.dispatch(1, packet_handler)
except KeyboardInterrupt:
    pass