agile sysadmin

by Ferenc Erki

Packaging Go dependencies for Gentoo

While I don’t write Go code, I do package Go projects for Gentoo Linux. My previous post about how to Host Gentoo dependency tarballs as GitHub releases got me invited to give a talk at the Google Developer Group Berlin Golang meetup, to host a community round table discussion at GopherCon Europe 2024, and to hold an online workshop for Gentoo e.V.

My presentation expands on the above post, and I decided to share the extra parts in this follow-up, including an idea for Go project maintainers who wish to streamline the distribution of their work.

Introduction to Gentoo packaging

While I won’t go into great detail about Gentoo packaging here, one aspect stands out as relevant: all sources must already be available before the build process starts.

Gentoo empowers users to build software from source according to their needs. Ebuild files describe how to build a given piece of software, including where to find its sources (SRC_URI) and the executable build instructions. An auxiliary Manifest file contains the name, size, and checksum of the sources.
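
To make the role of the Manifest file more concrete, here is a minimal Python sketch that computes the name, size, and checksums of a source file. It formats them as a DIST line with BLAKE2B and SHA512 hashes, which matches my understanding of how current Gentoo Manifest files list sources. The manifest_entry helper is hypothetical, and in practice the package manager's own tooling generates these lines.

```python
import hashlib
from pathlib import Path


def manifest_entry(path: Path) -> str:
    """Format a DIST line with the name, size, and checksums of a source file."""
    # Reads the whole file into memory, which is fine for a sketch,
    # but large tarballs would be better hashed in chunks.
    data = path.read_bytes()
    blake2b = hashlib.blake2b(data).hexdigest()
    sha512 = hashlib.sha512(data).hexdigest()
    return f"DIST {path.name} {len(data)} BLAKE2B {blake2b} SHA512 {sha512}"
```

Because the ebuild and the Manifest together pin down every source file by name, size, and checksum, the package manager can fetch and verify everything before the sandboxed build begins.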

The package manager works in phases. A set of sandbox features protects the system during the build process from any unexpected operations, like accessing the network.

What’s wrong with accessing the network during the build process?

  • leads to higher risk for bugs and security issues, through increased complexity of proper implementation, review, and verification
  • impossible to know resource usage in advance, so it may fill up disks, or cause high cost on expensive network connections
  • hurts reproducibility, since sources may change, or may become unavailable on the network between two builds
  • raises privacy concerns, for example possible data exfiltration
  • does not work on air-gapped systems

Due to the above, having all the sources available before starting the build process has a lot of advantages.

As a consequence, the package manager may not directly get dependencies with go get, go mod download or similar commands while building Go projects. Instead, the ebuild must list any supplementary sources, allowing the package manager to download and verify these in advance as usual.

Three ways to approach Go dependencies

Go projects have different ways available to them to describe and manage their dependencies. This also affects downstream packaging when distributing the software towards end users. Let’s take a closer look at these methods.

Vendor directory

Some projects include their dependencies in a vendor directory in their repository. This feels great from a reproducibility perspective, since the source code already includes all dependencies as well. It also means the build system already has everything it needs from a single source of truth.

What’s wrong with vendor directories?

  • does not fit well for libraries that other projects may import
  • bloats the repository
  • makes it more difficult to contribute and review changes

Because of the above, projects often won’t or can’t have a vendor directory, since they may consider it inefficient or even ineffective.

Go modules

Go modules describe their dependencies in a go.mod file, accompanied by their checksums in a go.sum file.

Gentoo can make use of the latter by slightly transforming it into an EGO_SUM variable for ebuilds. This allows generating the list of extra source files and storing them along with their checksums in the Manifest files as usual, ultimately allowing the package manager to download and verify them in advance.
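
As a rough illustration of that transformation, the Python sketch below drops the trailing hash from each go.sum line and renders the remaining "module version" pairs as a bash array. This reflects my understanding of the format expected by Gentoo's go-module eclass, not an official tool, and the function names are hypothetical.

```python
def go_sum_to_ego_sum(go_sum_text: str) -> list[str]:
    """Drop the trailing hash from each go.sum line, keeping 'module version'."""
    entries = []
    for line in go_sum_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        module, version, _hash = fields
        entries.append(f"{module} {version}")
    return entries


def format_ego_sum(entries: list[str]) -> str:
    """Render the entries as a bash array assignment for an ebuild."""
    body = "\n".join(f'\t"{entry}"' for entry in entries)
    return f"EGO_SUM=(\n{body}\n)"
```

Since go.sum usually carries two lines per module version (one for the module itself and one for its go.mod file), the resulting array grows with every dependency, including indirect ones.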

What’s wrong with EGO_SUM?

  • bloats ebuilds and Manifest files, because each dependency means an extra line in both of these files for each version of the software
  • leads to disproportionate resource usage in the package repository, through a small number of packages requiring a considerable share of storage, bandwidth, and so on
  • multiplies the effects of the overhead, since the bloat gets synchronized as part of the package repository to every Gentoo system even if it does not install any Go projects
  • breaks the installation process in the worst cases, since the list of dependencies may grow larger than the underlying platform’s maximum allowed variable size of 128 kB
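
To get a feeling for how soon that last limit comes into reach, here is a back-of-the-envelope Python sketch. It assumes that "128 kB" means 131072 bytes and that the entries end up joined by single spaces in one variable. The exact accounting inside the package manager may differ, and the entry length used in the note below is hypothetical.

```python
LIMIT_BYTES = 128 * 1024  # assumption: "128 kB" read as 131072 bytes


def joined_size(entries: list[str]) -> int:
    """Size in bytes of all entries joined by single spaces into one string."""
    return len(" ".join(entries).encode("utf-8"))


def exceeds_limit(entries: list[str], limit: int = LIMIT_BYTES) -> bool:
    """Tell whether the joined entries would no longer fit into the limit."""
    return joined_size(entries) > limit
```

With entries of 100 bytes each, 1000 of them still fit (100999 bytes), while 1500 of them do not (151499 bytes), a count that larger projects with many indirect dependencies can plausibly reach.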

As a result, Gentoo decided to deprecate this approach in favor of dependency tarballs, and using EGO_SUM in ebuilds leads to QA warnings.

Prepackaged dependencies

What if we could download all dependencies, create a (compressed) tarball archive out of them, and use it as an auxiliary source for the project getting packaged? It would mean only two entries to describe the list of sources, thus much less bloat for the ebuild and Manifest files.

Apparently we have at least two ways to get the dependencies:

  1. Vendor tarballs

    In general, the go mod vendor command makes a copy of all packages required to build the given project, and places them in the vendor directory, which then can go into a compressed vendor tarball.

    Unfortunately this approach may miss some of the dependencies because it prunes non-package directories, for example when dependencies rely on compiling C or C++ code. While this approach is worth a try, it does not work in all situations.

  2. Dependency tarballs

    Using the go mod download command instead ensures that all modules get downloaded into the module cache directory, which then can serve as the source for a compressed dependency tarball.
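
The dependency tarball approach can be sketched in Python as follows. It assumes a Go toolchain on the PATH, and it redirects the module cache with the GOMODCACHE environment variable so the downloaded modules land in a directory of our choosing. The function names, the go-mod directory name, and the tarball name in the note below are illustrative assumptions rather than requirements.

```python
import os
import subprocess
import tarfile
from pathlib import Path


def download_modules(project_dir: Path, cache_dir: Path) -> None:
    """Run `go mod download` with the module cache redirected to cache_dir."""
    # GOMODCACHE has to be an absolute path.
    env = dict(os.environ, GOMODCACHE=str(cache_dir.resolve()))
    subprocess.run(
        # -modcacherw leaves the cache directories writable for easy cleanup.
        ["go", "mod", "download", "-modcacherw"],
        cwd=project_dir,
        env=env,
        check=True,
    )


def make_deps_tarball(cache_dir: Path, tarball: Path) -> None:
    """Pack cache_dir into an xz-compressed tarball under its own directory name."""
    with tarfile.open(tarball, "w:xz") as archive:
        archive.add(cache_dir, arcname=cache_dir.name)
```

From the project's source directory, calling download_modules(Path("."), Path("go-mod")) followed by make_deps_tarball(Path("go-mod"), Path("example-1.0-deps.tar.xz")) would produce a single extra source file for the ebuild to reference next to the upstream release tarball.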

What’s wrong with prepackaged dependencies?

  • lacks deduplication of dependencies shared between packaged projects, since each packaged project includes a copy in its own dependency tarball
  • pushes hosting responsibilities downstream, which may pose a challenge, especially for external contributors like proxied maintainers and GURU contributors
  • tarballs may grow large, even hundreds of megabytes, complicating hosting challenges further through increased storage and traffic requirements
  • changes the security trust model to include not only the upstream developers of the software and its dependencies, but also whoever creates and hosts these tarballs

I consider the last point the most important, because that stands out as a security issue, rather than resource usage overhead. I find it worthwhile to take a closer look at that.

Direct end users of a software need to trust the upstream developers, including the creators of its dependencies. Sounds fair.

Consumers of the same software via the package managers of their operating system need to also trust the package maintainers. Sounds acceptable.

Users of packages maintained by external contributors have to also trust the external maintainers and their chosen hosting solution. Sounds risky in any security-conscious environment.

How can we do better?

At this point I have two approaches in mind which would potentially solve the security and hosting challenges I encountered while packaging.

  1. Include the dependency directory in upstream’s normal release tarballs.

    In this case upstream does not have to pollute its source repository with all the dependencies, while still providing an official set of dependencies as part of its normal releases. I expect this would improve reproducibility for all downstream users.

  2. Include the dependencies in a separate tarball as an extra artifact in upstream’s normal releases.

    As a slight variation of the previous approach, this would enable separating the canonical sources from the dependencies, by providing a separate release artifact. I expect this allows downstream users to choose their preferred way of getting dependencies.

In both cases the hosting challenge gets addressed where hosting already happens, which streamlines distribution for the project. More importantly, end users would get everything they need to build the project from the already most trusted source: the canonical upstream.

Most upstream projects have a well-documented, or even automated release process, which appears as a great candidate for contributions in your favorite project.

Do you maintain a Go project where this sounds useful and compatible? Let’s collaborate on it as a proof-of-concept experiment! I can even offer to attempt packaging your project for Gentoo, if we can try out one of the proposed approaches in the official releases.

Please reach out on one of my contact options if you consider this interesting.