How do I install from a local cache with pip?

I install a lot of the same packages in different virtualenv environments. Is there a way that I can download a package once and then have pip install from a local cache?

This would reduce download bandwidth and time.


Updated Answer 19-Nov-15

According to the Pip documentation:

Starting with v6.0, pip provides an on by default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.

Therefore, the updated answer is to just use pip with its defaults if you want a download cache.
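
As a rough illustration of relying on the default cache, the sketch below shows one way to find out where the cache lives and how to opt out for a single install. It assumes pip 20.1 or newer for the `pip cache` subcommand; the function names `show_pip_cache` and `install_uncached` are made up for this example and are not part of pip.

```shell
# Hypothetical helper (not part of pip): print where pip keeps its cache.
# Assumes pip >= 20.1, which added the "pip cache" subcommand.
show_pip_cache() {
    pip cache dir
}

# Hypothetical helper: bypass the cache for one install and always
# fetch from the index. "$@" is whatever you would pass to "pip install".
install_uncached() {
    pip install --no-cache-dir "$@"
}
```

For example, `show_pip_cache` prints the cache directory, and `install_uncached httpie` installs httpie without reading or writing the cache.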

Original Answer

From the pip news, version 0.1.4:

Added support for an environmental variable $PIP_DOWNLOAD_CACHE which will cache package downloads, so future installations won’t require large downloads. Network access is still required, but just some downloads will be avoided when using this.

To take advantage of this, I've added the following to my ~/.bash_profile:

export PIP_DOWNLOAD_CACHE=$HOME/.pip_download_cache

or, if you are on a Mac:

export PIP_DOWNLOAD_CACHE=$HOME/Library/Caches/pip-downloads

Notes

  1. If a newer version of a package is detected, it will be downloaded and added to the PIP_DOWNLOAD_CACHE directory. For instance, I now have quite a few Django packages.
  2. This doesn't remove the need for network access, as stated in the pip news, so it's not the answer for creating new virtualenvs on the airplane, but it's still great.
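
For pip 6.0 and later, the closest counterpart to the variable above is PIP_CACHE_DIR, the environment form of the --cache-dir option. A minimal sketch for ~/.bash_profile follows; the directory name is only an example, and any writable path should work.

```shell
# Assumption: pip >= 6.0, where --cache-dir (and thus PIP_CACHE_DIR) exists.
# The path below is an example; it relocates pip's built-in cache.
export PIP_CACHE_DIR="$HOME/.pip_cache"
```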


In my opinion, pip2pi is a much more elegant and reliable solution for this problem.

From the docs:

pip2pi builds a PyPI-compatible package repository from pip requirements

pip2pi allows you to create your own PyPI index by using two simple commands:

  1. To mirror a package and all of its requirements, use pip2tgz:

    $ cd /tmp/; mkdir packages/
    $ pip2tgz packages/ httpie==0.2
    ...
    $ ls packages/
    Pygments-1.5.tar.gz
    httpie-0.2.0.tar.gz
    requests-0.14.0.tar.gz
    
  2. To build a package index from the previous directory:

    $ ls packages/
    Pygments-1.5.tar.gz
    httpie-0.2.0.tar.gz
    requests-0.14.0.tar.gz
    $ dir2pi packages/
    $ find packages/
    packages/httpie-0.2.0.tar.gz
    packages/Pygments-1.5.tar.gz
    packages/requests-0.14.0.tar.gz
    packages/simple
    packages/simple/httpie
    packages/simple/httpie/httpie-0.2.0.tar.gz
    packages/simple/Pygments
    packages/simple/Pygments/Pygments-1.5.tar.gz
    packages/simple/requests
    packages/simple/requests/requests-0.14.0.tar.gz
    
  3. To install from the index you built in step 2, you can simply use:

    pip install --index-url=file:///tmp/packages/simple/ httpie==0.2
    

You can even mirror your own index to a remote host with pip2pi.


For newer Pip versions:

Newer Pip versions now cache downloads by default. See this documentation:

https://pip.pypa.io/en/stable/topics/caching/

For older Pip versions:

Create a configuration file named ~/.pip/pip.conf, and add the following contents:

[global]
download_cache = ~/.cache/pip

On OS X, a better path to choose would be ~/Library/Caches/pip since it follows the convention other OS X programs use.


PIP_DOWNLOAD_CACHE has some serious problems. Most importantly, it encodes the hostname of the download into the cache, so using mirrors becomes impossible.

The better way to manage a cache of pip downloads is to separate the "download the package" step from the "install the package" step. The downloaded files are commonly referred to as "sdist files" (source distributions) and I'm going to store them in a directory $SDIST_CACHE.

The two steps end up being:

pip install --no-install --use-mirrors -I --download=$SDIST_CACHE <package name>

This downloads the package and places it in the directory pointed to by $SDIST_CACHE, without installing it. Then you run:

pip install --find-links=file://$SDIST_CACHE --no-index --index-url=file:///dev/null <package name> 

This installs the package into your virtual environment. Ideally, $SDIST_CACHE would be committed under your source control. When deploying to production, you would run only the second pip command to install the packages without downloading them.
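
The two steps above can be sketched with current pip commands as well. This assumes pip 8.0 or newer, which provides `pip download`; the function names are hypothetical, and note that `pip download` may fetch wheels as well as sdists.

```shell
# Assumption: pip >= 8.0, which provides "pip download".
# SDIST_CACHE is the same directory variable used in the answer above.

# Step 1: fetch a package and its dependencies without installing anything.
fetch_to_cache() {
    pip download --dest "$SDIST_CACHE" "$@"
}

# Step 2: install using only the files already in $SDIST_CACHE (no index).
install_from_cache() {
    pip install --no-index --find-links "$SDIST_CACHE" "$@"
}
```

Both helpers pass their arguments straight to pip, so `fetch_to_cache -r requirements.txt` followed by `install_from_cache -r requirements.txt` should work for a whole requirements file.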


Starting in version 6.0, pip now does its own caching:

  • DEPRECATION pip install --download-cache and pip wheel --download-cache command line flags have been deprecated and the functionality removed. Since pip now automatically configures and uses its internal HTTP cache which supplants the --download-cache, the existing options have been made non functional but will still be accepted until their removal in pip v8.0. For more information please see https://pip.pypa.io/en/latest/reference/pip_install.html#caching

More information from the above link:

Starting with v6.0, pip provides an on by default cache which functions similarly to that of a web browser. While the cache is on by default and is designed to do the right thing by default, you can disable the cache and always access PyPI by utilizing the --no-cache-dir option.


pip wheel is an excellent option that does what you want with the extra feature of pre-compiling the packages. From the official docs:

Build wheels for a requirement (and all its dependencies):

$ pip wheel --wheel-dir=/tmp/wheelhouse SomePackage

Now your /tmp/wheelhouse directory has all your dependencies precompiled, so you can copy the folder to another server and install everything with this command:

$ pip install --no-index --find-links=/tmp/wheelhouse SomePackage

Note that not all the packages will be completely portable across machines. Some packages will be built specifically for the Python version, OS distribution and/or hardware architecture that you're using. That will be specified in the file name, like -cp27-none-linux_x86_64 for CPython 2.7 on a 64-bit Linux, etc.
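
To check which machines a wheel in the wheelhouse targets, the tags can be read off the file name. The helper below is a small sketch, not a pip feature: `wheel_tags` is a made-up name, and it relies on the standard wheel naming scheme name-version[-build]-python-abi-platform.whl.

```shell
# Hypothetical helper: print the python, ABI and platform tags of a wheel.
# Wheel names follow: name-version[-build]-python-abi-platform.whl
wheel_tags() {
    base=${1##*/}          # drop any leading directory
    base=${base%.whl}      # drop the .whl extension
    platform=${base##*-}   # last dash-separated field
    rest=${base%-*}
    abi=${rest##*-}        # second-to-last field
    rest=${rest%-*}
    python=${rest##*-}     # third-to-last field
    echo "python=$python abi=$abi platform=$platform"
}
```

For a file named SomePackage-1.0-cp27-none-linux_x86_64.whl (an illustrative name), this prints `python=cp27 abi=none platform=linux_x86_64`.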


Using pip only (my version is 1.2.1), you can also build up a local repository like this:

if ! pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>; then
    pip install --download-directory="$PIP_SDIST_INDEX" <package>
    pip install --find-links="file://$PIP_SDIST_INDEX" --no-index <package>
fi

In the first call of pip, the package is looked up in the local repository (only) and installed from there. If that fails, pip retrieves the package from its usual location (e.g. PyPI) and downloads it to PIP_SDIST_INDEX (but does not install anything!). The first call is then "repeated" to properly install the package from the local index.

(--download-cache creates a local file name which is the complete (escaped) URL, and pip cannot use this as an index with --find-links. --download-cache will use the cached file, if found. We could add this option to the second call of pip, but since the index already functions as a kind of cache, it would not necessarily gain much. It would help if your index is emptied, for instance.)
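
The same try-local-then-download pattern can be sketched for recent pip versions, where the download step is `pip download`. This assumes pip 8.0 or newer and that PIP_SDIST_INDEX points at a local directory; the function name `install_with_local_index` is hypothetical.

```shell
# Hypothetical wrapper; assumes pip >= 8.0 ("pip download") and that
# PIP_SDIST_INDEX points at a local directory, as in the answer above.
install_with_local_index() {
    # Try the local directory first, without consulting any index.
    if ! pip install --no-index --find-links "$PIP_SDIST_INDEX" "$@"; then
        # On failure, download into the local directory, then retry offline.
        pip download --dest "$PIP_SDIST_INDEX" "$@" &&
            pip install --no-index --find-links "$PIP_SDIST_INDEX" "$@"
    fi
}
```

Calling `install_with_local_index <package>` only reaches the network when the package (or one of its dependencies) is missing from the local directory.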


A simpler option is basket.

Given a package name, it will download it and all dependencies to a central location, without any of the drawbacks of the pip cache. This is great for offline use.

You can then use this directory as a source for pip:

pip install --no-index -f file:///path/to/basket package

Or easy_install:

easy_install -f ~/path/to/basket -H None package

You can also use it to update the basket whenever you are online.


There is a new solution to this called pip-accel, a drop-in replacement for pip with caching built in.

The pip-accel program is a wrapper for pip, the Python package manager. It accelerates the usage of pip to initialize Python virtual environments given one or more requirements files. It does so by combining the following two approaches:

  • Source distribution downloads are cached and used to generate a local index of source distribution archives.

  • Binary distributions are used to speed up the process of installing dependencies with binary components (like M2Crypto and LXML). Instead of recompiling these dependencies again for every virtual environment we compile them once and cache the result as a binary *.tar.gz distribution.

Paylogic uses pip-accel to quickly and reliably initialize virtual environments on its farm of continuous integration slaves which are constantly running unit tests (this was one of the original use cases for which pip-accel was developed). We also use it on our build servers.

We've seen around 10x speedup from switching from pip to pip-accel.


I think the "pip-accel" package is a good choice.


I found the following to be useful for downloading packages and then installing from those downloads:

pip download -d "$SOME_DIRECTORY" some-package

Then to install:

pip install --no-index --no-cache-dir --find-links="$SOME_DIRECTORY" some-package

Where $SOME_DIRECTORY is the path to the directory that the packages are to be downloaded to.
