Best way to download all artifacts listed in an HTML5 cache.manifest file?

2023-04-04 15:41 问答作者：

I am attempting to look at how an HTML5 app works and any attempts to save the page inside the webkit browsers (chrome, Safari) includes some, but not all of the cache.manifest resources. Is there a library or set of code t开发者_如何转开发hat will parse the cache.manifest file, and download all the resources (images, scripts, css)?

(original code moved to answer... noob mistake >.<)

I originally posted this as part of the question... (no newbie stackoverflow poster EVER does this ;)

since there was a resounding lack of answers. Here you go:

I was able to come up with the following python script to do so, but any input would be appreciated =) (This is my first stab at python code so there might be a better way)

import os
import urllib2
import urllib

cmServerURL = 'http://<serverURL>:<port>/<path-to-cache.manifest>'

# download file code taken from stackoverflow
# http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
def loadURL(url, dirToSave):
        file_name = url.split('/')[-1]
        u = urllib2.urlopen(url)
        f = open(dirToSave, 'wb')
        meta = u.info()
        file_size = int(meta.getheaders("Content-Length")[0])
        print "Downloading: %s Bytes: %s" % (file_name, file_size)

        file_size_dl = 0
        block_sz = 8192
        while True:
                buffer = u.read(block_sz)
                if not buffer:
                        break

                file_size_dl += len(buffer)
                f.write(buffer)
                status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
                status = status + chr(8)*(len(status)+1)
                print status,

        f.close()

# download the cache.manifest file
# since this request doesn't include the Conent-Length header we will use a different api =P
urllib.urlretrieve (cmServerURL+ 'cache.manifest', './cache.manifest')

# open the cache.manifest and go through line-by-line checking for the existance of files
f = open('cache.manifest', 'r')
for line in f:
        filepath = line.split('/')
        if len(filepath) > 1:
                fileName = line.strip()
                # if the file doesn't exist, lets download it
                if not os.path.exists(fileName):
                                print 'NOT FOUND: ' + line
                                dirName = os.path.dirname(fileName)
                                print 'checking dirctory: ' + dirName
                                if not os.path.exists(dirName):
                                        os.makedirs(dirName)
                                else:
                                        print 'directory exists'
                                print 'downloading file: ' + cmServerURL + line,
                                loadURL (cmServerURL+fileName, fileName)

继续阅读：parsing python

Best way to download all artifacts listed in an HTML5 cache.manifest file?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？