Packaging Go dependencies for Gentoo
While I don't write Go code, I do package Go projects for Gentoo Linux. My previous post about how to Host Gentoo dependency tarballs as GitHub releases got me invited to give a talk at the Google Developer Group Berlin Golang meetup, to host a community round table discussion at GopherCon Europe 2024, and to hold an online workshop for Gentoo e.V.
My presentation expands on the above post, and I decided to share the extra parts in this follow-up, including an idea for Go project maintainers who wish to streamline the distribution of their work.
Introduction to Gentoo packaging
While I won't go into great detail about Gentoo packaging here, one aspect stands out as relevant: all sources must already be available before the build process starts.
Gentoo empowers users to build software from source according to their needs. Ebuild files describe how to build a given piece of software, including where to find its sources (SRC_URI), and executable build instructions. An auxiliary Manifest file contains the name, size, and checksum of the sources.
The package manager works in phases. A set of sandbox features protects the system during the build process from any unexpected operations, like accessing the network.
What's wrong with accessing the network during the build process?
- leads to higher risk for bugs and security issues, through increased complexity of proper implementation, review, and verification
- impossible to know resource usage in advance, so it may fill up disks, or cause high cost on expensive network connections
- hurts reproducibility, since sources may change, or may become unavailable on the network between two builds
- raises privacy concerns, for example possible data exfiltration
- does not work on air-gapped systems
Due to the above, having all the sources available before starting the build process has a lot of advantages.
As a consequence, the package manager may not directly get dependencies with go get, go mod download or similar commands while building Go projects. Instead, the ebuild must list any supplementary sources, allowing the package manager to download and verify these in advance as usual.
Three ways to approach Go dependencies
Go projects have different ways available to them to describe and manage their dependencies. This also affects downstream packaging when distributing the software towards end users. Let's take a closer look at these methods.
Vendor directory
Some projects include their dependencies in a vendor directory in their repository. This feels great from a reproducibility perspective, since the source code already includes all dependencies as well. It also means the build system already has everything it needs from a single source of truth.
What's wrong with vendor directories?
- does not fit well for libraries that other projects may import
- bloats the repository
- makes it more difficult to contribute and review changes
Because of the above, projects often won't or can't have a vendor directory, since they may consider it inefficient or even ineffective.
Go modules
Go modules describe their dependencies in a go.mod file, accompanied by their checksums in a go.sum file.
Gentoo can make use of the latter by slightly transforming it into an EGO_SUM variable for ebuilds. This allows generating the list of extra source files, and storing them along with their checksums in the Manifest files as usual, ultimately allowing the package manager to download and verify them in advance.
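The transformation can be sketched in Go as follows, assuming each EGO_SUM entry consists of the module path and version from a go.sum line with the trailing hash dropped. The module names and hashes below are made up for illustration, and the function name egoSumEntries is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// egoSumEntries converts go.sum content into EGO_SUM-style entries by
// keeping the module path and version and dropping the trailing hash.
func egoSumEntries(goSum string) []string {
	var entries []string
	sc := bufio.NewScanner(strings.NewReader(goSum))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) != 3 {
			continue // skip blank or malformed lines
		}
		entries = append(entries, f[0]+" "+f[1])
	}
	return entries
}

func main() {
	// Hypothetical go.sum content; the module and hashes are placeholders.
	goSum := `example.com/mod v1.2.3 h1:AAAA=
example.com/mod v1.2.3/go.mod h1:BBBB=
`
	fmt.Println("EGO_SUM=(")
	for _, e := range egoSumEntries(goSum) {
		fmt.Printf("\t%q\n", e)
	}
	fmt.Println(")")
}
```

Note that every go.sum line turns into one entry, which hints at how quickly the variable grows for projects with many dependencies.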
What's wrong with EGO_SUM?
- bloats ebuilds and Manifest files, because each dependency means an extra line in both of these files for each version of the software
- leads to disproportionate resource usage in the package repository, through a small number of packages requiring a considerable share of storage, bandwidth, and so on
- multiplies the effects of the overhead, since the bloat gets synchronized as part of the package repository to every Gentoo system even if it does not install any Go projects
- breaks the installation process in worst cases, since the list of dependencies may grow larger than the underlying platform's maximum allowed variable size of 128 kB
As a result, Gentoo decided to deprecate this approach in favor of dependency tarballs, and using EGO_SUM in ebuilds leads to QA warnings.
Prepackaged dependencies
What if we could download all dependencies, create a (compressed) tarball archive out of them, and use it as an auxiliary source for the project getting packaged? It would mean only two entries to describe the list of sources, thus much less bloat for the ebuild and Manifest files.
We have at least two ways to get the dependencies:
Vendor tarballs
In general, the go mod vendor command makes a copy of all packages required to build the given project, and places them in the vendor directory, which can then go into a compressed vendor tarball.
Unfortunately, this approach may miss some of the dependencies because it prunes non-package directories, for example when dependencies rely on compiling C or C++ code. While this approach is worth a try, it does not work in all situations.
Dependency tarballs
Using the go mod download command instead ensures that all modules get downloaded into the module cache directory, which can then serve as the source for a compressed dependency tarball.
What's wrong with prepackaged dependencies?
- lacks deduplication of dependencies shared between packaged projects, since each packaged project includes a copy in its own dependency tarball
- pushes hosting responsibilities downstream, which may pose a challenge, especially for external contributors like proxied maintainers and GURU contributors
- tarballs may grow large, even hundreds of megabytes, complicating hosting challenges further through increased storage and traffic requirements
- changes the security trust model to include not only the upstream developers of the software and its dependencies, but also whoever creates and hosts these tarballs
I consider the last point the most important, because it stands out as a security issue rather than a resource usage overhead. I find it worthwhile to take a closer look at that.
Direct end users of a software need to trust the upstream developers, including the creators of its dependencies. Sounds fair.
Consumers of the same software via the package managers of their operating system need to also trust the package maintainers. Sounds acceptable.
Users of packages maintained by external contributors have to also trust the external maintainers and their chosen hosting solution. Sounds risky in any security-conscious environment.
How can we do better?
At this point I have two approaches in mind that could solve the security and hosting challenges I encountered while packaging.
Include the dependency directory in upstream's normal release tarballs.
In this case upstream does not have to pollute its source repository with all the dependencies, while still providing an official set of dependencies as part of their normal releases. I expect this would improve reproducibility for all downstream users.
Include the dependencies in a separate tarball as an extra artifact in upstream's normal releases.
As a slight variation of the previous approach, this would enable separating the canonical sources from the dependencies, by providing a separate release artifact. I expect this allows downstream users to choose their preferred way of getting dependencies.
In both cases the hosting challenge gets addressed where hosting already happens, which streamlines distribution for the project. More importantly, end users would get everything they need to build the project from the already most trusted source: the canonical upstream.
Most upstream projects have a well-documented, or even automated, release process, which makes it a great candidate for contributions to your favorite project.
Do you maintain a Go project where this sounds useful and compatible? Let's collaborate on it as a proof of concept experiment! I can even offer to attempt packaging your project for Gentoo, if we can try out one of the proposed approaches in the official releases.
Please reach out via one of my contact options if you find this interesting.