Feb 15, 2018

Our Comprehensive Guide to Python Dependencies

The following article is from Kiwi.com’s internal engineering handbook. We thought it could come in handy for others as well, so we decided to share it with the world here, like we’ve done before.

A refresher on the basics

What are requirements?

They are usually stored in requirements.txt as a list of third-party packages which your project depends on. An example of a requirements file:

myapp
framework==0.9.4
library>=0.2

Python libraries often use semantic versioning for the version numbers, which makes it easy to know when you can expect an upgrade to be safe without any of your code breaking. Taken from a semantic versioning cheat sheet, the gist of it is:

Versions look like 2.7.13, where the numbers are MAJOR.MINOR.PATCH
• MAJOR = incompatible API changes.
• MINOR = add functionality (backwards-compatible).
• PATCH = bug fixes (backwards-compatible).
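
To make the three levels concrete, here is a minimal Python sketch that reports which component changed between two versions. The helper names parse_version and bump_kind are ours, not part of any library, and the sketch assumes plain MAJOR.MINOR.PATCH strings without pre-release or build suffixes.

```python
def parse_version(version):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old, new):
    """Return the first semantic-versioning component that differs, or 'none'."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    for name, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return name
    return "none"


print(bump_kind("2.7.13", "2.7.14"))  # patch
print(bump_kind("2.7.13", "3.0.0"))   # major
```

A "major" result is the signal to read the changelog carefully before upgrading; "minor" and "patch" should be safe if the library really follows semantic versioning.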

How do you use a requirements list?

The pip install command downloads the packages from a PyPI server and installs them. If you are interested in the details, you can find them in pip’s documentation.

What’s a PyPI server?

The Python Package Index (PyPI) is a repository of software for the Python programming language. PyPI helps you find and install software developed and shared by the Python community.

Basically, PyPI is a server where you can upload your Python library. See a blog post with more on publishing libraries to PyPI.

When you use pip, by default it downloads libraries from https://pypi.python.org, which is where Python users upload their open source code (you should do the same with your open source projects). For internal — secret — code, you naturally can’t use the public PyPI. Instead, you can host your own PyPI server on a domain like https://pypi.example.com for this code.

Working with an internal PyPI

Downloading packages

To download and install internal packages from there, you will need an account on the internal PyPI at https://pypi.example.com. We suggest using a chat channel called #plz-access for access requests where people can ask for PyPI user credentials. Once you have your account, you need to let pip know what your login details are. To do this, add the following in your ~/.netrc file:

machine pypi.example.com
login <username>
password <password>

Protip: You can check if your ~/.netrc works properly by running python -m netrc, which parses it and prints the details from it.
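
If you would rather check a single host from a script, the same standard library netrc module can be used directly. This is a small sketch; the function name lookup_credentials is our own, and passing an explicit file path is only there to make it easy to try out.

```python
import netrc


def lookup_credentials(host, netrc_path=None):
    """Return (login, password) for host from a netrc file, or None if absent.

    With netrc_path=None the standard library parser reads the default ~/.netrc.
    """
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password
```

Calling lookup_credentials("pypi.example.com") should return your username and password if the entry above was picked up, and None if the host is missing from the file.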

Now, if someone has uploaded example-package already, you can try to run pip install --extra-index-url=https://pypi.example.com/pypi example-package. If you configured the details correctly, it should install the internal example-package package.

When building Docker images

The above way works great normally, but there’s one exception: when you run docker build . to build an image.

You need to add a few extra lines in your Dockerfile to make it work in that context. This gets especially difficult, because Docker has no support for build time secrets. There’s an old, old issue for it, but no actual solution. Instead, use a workaround to host a temporary secrets server to pass them.

Watch out here: if you mess this up, you’ll end up accidentally committing your PyPI credentials to the built Docker image!

Here’s what you keep in your Dockerfile

ARG pypi_username
ARG pypi_password

RUN apk add --no-cache curl && \
    echo "machine pypi.example.com" >> ~/.netrc && \
    echo " login $(curl -sS http://httpenv/v1/pypi_username||echo $pypi_username)" >> ~/.netrc && \
    echo " password $(curl -sS http://httpenv/v1/pypi_password||echo $pypi_password)" >> ~/.netrc && \
    chmod 600 ~/.netrc && \
    pip install --no-cache-dir -r requirements.txt && \
    rm ~/.netrc && \
    apk del curl

If you have this set up, on dev machines, you can set the connection details with build arguments. But seriously, only use this for development, as this leaves your precious password visible in the built images. In fact, here comes an…

⚠️ insecure command warning ⚠️

docker build --build-arg pypi_username=<username> --build-arg pypi_password=<password> .

When building published production images, instead of using --build-arg, host an httpenv server with GitLab CI, like so:

build:
  services:
    - name: kiwicom/httpenv
      alias: httpenv
  script:
    - export HE=$(getent hosts httpenv|awk '{print $1}')
    - wget -q http://httpenv/v1/add/pypi_username/${pypi_username}
    - wget -q http://httpenv/v1/add/pypi_password/${pypi_password}
    - docker build --add-host=httpenv:$HE .

Setting requirements

When you add an internal package to your requirements.txt, you’ll need to include the index URL there too. So your requirements.txt will look something like this:

requests==2.18.1
urllib3==1.21.1
--extra-index-url=https://pypi.example.com/pypi
example-package==1.0.2

(The order of the lines doesn’t actually matter, but it’s easier to read if you place the internal packages under the internal index URL)

Uploading packages

Once you have a proper package with a nice setup.py (you can base this on kiwicom/crane’s setup.py), make sure that the classifiers=[] list in it contains 'Private :: Do Not Upload', as this will make the public PyPI outright reject the package if you ever accidentally attempt to push it there.

Use the following GitLab CI job to automatically publish new releases whenever you bump the version in setup.py:

release:
  stage: release
  image: python:3.6-alpine3.7
  variables:
    PYPI_CONFIG: |
      [distutils]
      index-servers = internal

      [internal]
      repository: https://pypi.example.com/pypi/
      username: $PYPI_USERNAME
      password: $PYPI_PASSWORD
  script:
    - echo "$PYPI_CONFIG" > ~/.pypirc
    - pip install wheel
    - python setup.py bdist_wheel --universal upload -r internal
  only:
    - master
  allow_failure: yes  # always runs but fails if there's already a release for the version

Of course, for this to work, you’ll need to set the $PYPI_USERNAME and $PYPI_PASSWORD variables in your repository settings.

Managing requirements files

Splitting into files

Usually, you don’t need every dependency for the app to run on production. For instance, you don’t need your test runner there. For this reason, keep separate requirements files for each purpose:

  • requirements.txt has only the libraries necessary to run the app
  • docs-requirements.txt has libraries for generating docs
  • test-requirements.txt has libraries for running tests
  • you can come up with your own categories if needed; for instance some projects have a vendor-requirements.txt for installing packages received from vendors.

Now, to launch the app it’s enough to run pip install -r requirements.txt. Keep in mind though that it’s desirable to include all dependencies in Docker images to keep everything, including test runs, reproducible in a matching environment.

Managing versions

We use pip-tools’ pip-compile command extensively. pip-compile is a tool for saving a list of the exact packages and versions of packages that you depend on into your requirements.txt. You would list the names of your dependencies in a requirements.in file instead of manually editing requirements.txt. If you then run pip-compile, it will check all packages and save the full list of dependencies into requirements.txt, with specific versions to install.

See this example requirements.in:

myflaskapp

And the matching requirements.txt that pip-compile generates:

#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile --output-file requirements.txt requirements.in
#
flask==0.10.1 # via myflaskapp
jinja2==2.7.3 # via flask
markupsafe==0.23 # via jinja2
myflaskapp==0.24
werkzeug==0.10.4 # via flask

Now you should run pip install -r requirements.txt to get the exact same versions of everything as specified in the repo.

If you add a new package to the requirements.in file, you should run pip-compile again to add it to the requirements.txt without changing anything else in it.

Updating requirements

You can run pip list --outdated to see if any of your installed packages have an update available:

$ pip list --outdated
alembic (0.9.2) - Latest: 0.9.3 [sdist]
connexion (1.1.10) - Latest: 1.1.11 [sdist]
coverage (4.3.4) - Latest: 4.4.1 [wheel]

If you want to update them, bump the versions in requirements.txt by running pip-compile --upgrade. pip-compile will find the newest possible versions of everything, and you can run pip install -r requirements.txt again to install those.

If you’re being super careful, you can also update just one of the packages, with a command like pip-compile --upgrade-package flask.
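
If you want to feed the outdated report into a script (for example to post it to a chat channel), a small parser is enough. The sketch below assumes the legacy line format shown in the sample output above; newer pip versions print columns by default and also offer pip list --outdated --format=json, which is easier to consume. The name parse_outdated is ours.

```python
import re

# Matches the legacy report lines shown above, e.g.
# "alembic (0.9.2) - Latest: 0.9.3 [sdist]".
OUTDATED_LINE = re.compile(
    r"^(?P<name>\S+) \((?P<installed>[^)]+)\) - Latest: (?P<latest>\S+) \[(?P<kind>\w+)\]$"
)


def parse_outdated(report):
    """Return (name, installed, latest) tuples for each recognised report line."""
    updates = []
    for line in report.splitlines():
        match = OUTDATED_LINE.match(line.strip())
        if match:
            updates.append((match["name"], match["installed"], match["latest"]))
    return updates
```

Lines that do not match the pattern are silently skipped, so warnings or headers in the output do not break the parser.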

Pinning requirements

If you need to make sure a package isn’t upgraded, write something like this in requirements.in:

requests~=2.12.5  # https://github.com/shazow/urllib3/issues/1104

Note the following:

  1. We use the ~= operator. ~=2.12.5 means we need at least 2.12.5, but we can’t upgrade to the next minor version, 2.13. You can find more info about these operators in the versioning PEP.
  2. We added a comment with the reason for not upgrading. Make sure to always explain with a comment whenever you do something out of the ordinary with requirements.
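
To illustrate what ~= accepts, here is a rough Python approximation of the compatible-release rule. It is only a sketch for plain dotted numeric versions; real tools follow the full rules in the versioning PEP (pre-releases, post-releases and so on), and the function name is_compatible_release is our own.

```python
def is_compatible_release(candidate, pinned):
    """Approximate `~=pinned` for plain dotted numeric versions.

    `~=2.12.5` is treated as `>=2.12.5` combined with `==2.12.*`.
    """
    cand = tuple(int(part) for part in candidate.split("."))
    pin = tuple(int(part) for part in pinned.split("."))
    if len(pin) < 2:
        raise ValueError("~= needs at least two version components")
    prefix = pin[:-1]  # everything except the last component must match
    return cand >= pin and cand[:len(prefix)] == prefix


print(is_compatible_release("2.12.9", "2.12.5"))  # True
print(is_compatible_release("2.13.0", "2.12.5"))  # False
```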

Setting requirements for specific Python versions

You can also add environment markers to your requirements. This is useful if you are running your app on multiple platforms or Python versions. You can set something like this in your requirements.in:

test-app-3 ; python_version > '3'
linux-app ; sys_platform == 'linux'

and pip will only install test-app-3 if it’s running on Python 3, and linux-app only if it’s running on Linux.
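
If you are unsure what a marker will evaluate to on a given machine, you can print the underlying values yourself. The sketch below computes the two markers used above the way the dependency specification PEP defines them; the function name marker_environment is ours.

```python
import platform
import sys


def marker_environment():
    """Return the values that the two markers used above are compared against."""
    major, minor, _patch = platform.python_version_tuple()
    return {
        "python_version": f"{major}.{minor}",  # e.g. "3.6"
        "sys_platform": sys.platform,          # e.g. "linux", "darwin", "win32"
    }


print(marker_environment())
```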

Fixing merge conflicts in requirements files

This is a tricky problem: when people change requirements versions in different branches, you end up with a huge mess in requirements.txt. But fret not, pip-compile can actually make solving it super easy. Just follow these steps:

  • Start rebasing onto the conflicting branch, to start conflict resolution:
$ git rebase master
Applying: Some random commit with requirements changes
Using index info to reconstruct a base tree...
M requirements.in
M requirements.txt
Falling back to patching base and 3-way merge...
Auto-merging requirements.txt
CONFLICT (content): Merge conflict in requirements.in
CONFLICT (content): Merge conflict in requirements.txt
error: Failed to merge in the changes.
  • Fix requirements.in if it has conflicts (these will be easy and logical to resolve).
  • Run the following:
git checkout --ours requirements.txt  # reset to master's .txt
pip-compile # make pip-compile generate a new .txt based on master's
git add requirements.*
git rebase --continue

Automatic checks in CI

Detecting vulnerable versions

coala has a bear for checking if you have any dependencies with known security vulnerabilities. Add this to your .coafile to check if any of your requirements is in pyup.io’s vulnerability database:

[safety]
bears = PySafetyBear
files = *requirements.txt

If it finds a vulnerability, just follow the above upgrading guide.

Detecting unpinned requirements

In your *requirements.txt file you should have all requirements pinned to specific versions. This guarantees that you have the same version on production as you assume. coala can check this for you as well:

[pin-requirements]
bears = PinRequirementsBear
files = *requirements.txt

pip-compile makes it easy to comply with this bear, since it fully pins everything by default.
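
If you are not using coala, the idea behind the check is simple enough to sketch in a few lines of Python. This is not how PinRequirementsBear is implemented, only an illustration under simplifying assumptions: a line counts as pinned if its requirement part contains ==, and the name find_unpinned is our own.

```python
def find_unpinned(requirements_text):
    """Return requirement lines that are not pinned with `==`.

    Comments, blank lines and option lines (starting with `-`) are skipped.
    """
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirement = line.split(";", 1)[0].strip()  # ignore environment markers
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned


print(find_unpinned("myapp\nframework==0.9.4\nlibrary>=0.2\n"))
# ['myapp', 'library>=0.2']
```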

Detecting Python 3 incompatible packages

caniusepython3 can check your requirements list to see if all your dependencies are compatible with Python 3. This can prevent you from making the Python 3 migration more difficult by adding incompatible dependencies.

caniusepython3 -r requirements.txt

If you want, you can add this as a GitLab CI command too.

What do I do with legacy projects?

Sometimes you have ancient projects with nasty, messy, old dependencies. Here’s how you can salvage the situation in those cases.

  1. Read this guide (9 minutes)
  2. Remove unneeded legacy requirements; I bet you will find some. Just use git grep <package name> to see if they’re all used. (20 minutes)
    You can use this command which will tell you good candidates to check: cat *requirements.in | grep --color=none "^\w\S\+" -o | uniq | xargs -I{} bash -c '! git --no-pager grep -w -q -i {} "*.py" && echo "{} not found in Python files"'
  3. Split your requirements into multiple files by purpose. (20 minutes)
  4. Rename your requirements files to *requirements.in files and generate a requirements.txt with pip-compile. (20 minutes)
  5. Try to unpin your requirements one by one and upgrade them in requirements.txt. You will need to read the changelogs of the libraries and might need changes in your codebase. (1 day)
  6. Set up coala checks of your current requirements. Just copy the ones from above. (10 minutes)
  7. Set up a reminder to regularly upgrade your requirements. On Slack, you can use /remind #<channel> every Tuesday @<your name> upgrade requirements for this. (1 minute)
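
The candidate search from step 2 can also be sketched in Python if the shell one-liner is hard to adapt. The function names below are ours, and the sketch takes the file contents as strings so that it stays independent of how you read them. Like the shell version, it only yields candidates: a package whose import name differs from its distribution name will be reported even though it is used.

```python
import re


def requirement_names(requirements_in_text):
    """Extract package names from the lines of a requirements.in file."""
    names = []
    for line in requirements_in_text.splitlines():
        # Comment lines and option lines do not start with a letter or digit.
        match = re.match(r"^[A-Za-z0-9][A-Za-z0-9._-]*", line.strip())
        if match:
            names.append(match.group(0))
    return names


def unused_candidates(requirements_in_text, python_sources):
    """Return names that never appear as a whole word in any source string."""
    candidates = []
    for name in requirement_names(requirements_in_text):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        if not any(pattern.search(source) for source in python_sources):
            candidates.append(name)
    return candidates
```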

When the time comes to upgrade your requirements, run the following commands (on macOS you might need to use gfind instead of find):

git stash
git checkout master
git pull
git branch -D pip-up
git checkout -b pip-up
find . -name "*requirements.in" -exec pip-compile --upgrade "{}" \;
git add *requirements.txt
git commit -m 'Update requirements'
git push --force origin pip-up
git show --unified=0

Protip: You can just alias all of this to a pip-up command.

Now:

  1. A list of upgraded packages will be displayed. Check the changelogs of these packages.
  2. Run the tests and/or deploy to a canary environment to find out about any possible errors.
  3. Merge the branch into master.
  4. Enjoy your up-to-date requirements!

If the project is no longer actively maintained, you should at least set a scheduled pipeline on GitLab which will regularly run the check of requirements (at least for safety) and let you know if there is a known vulnerability.

Closing Remarks

So that’s pretty much everything we figured out about managing Python packages so far. Hopefully some of it will come in handy, hm?

One last thing to note is that we’re keeping an eye on the development of Pipenv, a utility from the maintainers of pip itself. We’re hoping to later replace pip-compile with it, but as of this article’s publication, it does not seem mature enough yet to just go and rework everything.

Acknowledgements

The internal article this one was based on was co-authored by Stanislav Komanec.
