Best way to download all artifacts listed in an HTML5 cache.manifest file?
I am attempting to look at how an HTML5 app works and any attempts to save the page inside the webkit browsers (chrome, Safari) includes some, but not all of the cache.manifest resources. Is there a library or set of code t开发者_如何转开发hat will parse the cache.manifest file, and download all the resources (images, scripts, css)?
(original code moved to answer... noob mistake >.<)
I originally posted this as part of the question... (no newbie stackoverflow poster EVER does this ;)
since there was a resounding lack of answers. Here you go:
I was able to come up with the following python script to do so, but any input would be appreciated =) (This is my first stab at python code so there might be a better way)
import os
import urllib2
import urllib
cmServerURL = 'http://<serverURL>:<port>/<path-to-cache.manifest>'
# download file code taken from stackoverflow
# http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
def loadURL(url, dirToSave):
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(dirToSave, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)
file_size_dl = 0
block_sz = 8192
while True:
buffer = u.read(block_sz)
if not buffer:
break
file_size_dl += len(buffer)
f.write(buffer)
status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
status = status + chr(8)*(len(status)+1)
print status,
f.close()
# download the cache.manifest file
# since this request doesn't include the Conent-Length header we will use a different api =P
urllib.urlretrieve (cmServerURL+ 'cache.manifest', './cache.manifest')
# open the cache.manifest and go through line-by-line checking for the existance of files
f = open('cache.manifest', 'r')
for line in f:
filepath = line.split('/')
if len(filepath) > 1:
fileName = line.strip()
# if the file doesn't exist, lets download it
if not os.path.exists(fileName):
print 'NOT FOUND: ' + line
dirName = os.path.dirname(fileName)
print 'checking dirctory: ' + dirName
if not os.path.exists(dirName):
os.makedirs(dirName)
else:
print 'directory exists'
print 'downloading file: ' + cmServerURL + line,
loadURL (cmServerURL+fileName, fileName)
精彩评论