Introduction
GitLab CI/CD can be made deterministic and reproducible, in the spirit of Nix, by keeping pipeline configuration under version control and running it in consistent environments. In practice this means defining pipelines in a `.gitlab-ci.yml` file and building in versioned Docker images.
A deployment pipeline built this way spans many components:
- the Docker image
- the GitLab runner
- the Nix daemon
- CI scripts
- binary caches
- deployment hosts
- network namespaces
- object storage backends
Each component may function correctly in isolation while the deployment system as a whole remains unstable.
This work focused on reducing that incoherence.
The goal was to transform a fragile GitLab deployment pipeline into a reproducible, coordinated deployment environment where:
Docker Image
↓
GitLab Runner
↓
Nix Build
↓
Binary Cache
↓
Deployment Runtime
operates as a single coherent execution system instead of loosely connected tooling.
The failures were rarely caused by one broken component.
Most failures emerged from mismatched assumptions between components executing in different contexts.
Architecture
Current deployment stack:
VM
→ Docker Image (`blog-ci`)
→ GitLab Runner
→ Nix Daemon
→ Nix Build
→ Attic Binary Cache
→ Cloudflare R2 Storage
→ Deployment Pipeline
The architectural transition was from:
mutable CI scripts
toward:
coherent infrastructure-defined deployment execution
where the runtime environment itself becomes reproducible.
The Core Problem
The deployment stack originally behaved like independent systems stitched together at runtime:
- Docker executed the container
- GitLab scheduled the job
- Nix executed the build
- CI scripts patched missing tooling
- Attic handled caching
- MinIO/R2 handled storage
Each layer carried different assumptions about:
- filesystem state
- network visibility
- available binaries
- credentials
- daemon lifecycle
- execution context
This produced recurring failures that appeared unrelated but shared the same root cause:
lack of execution coherence
The Drift Between Components
Docker Image vs Runtime Reality
The CI container originally depended on bind-mounted tooling from the host system.
This created recurring failures such as:
- `grep: command not found`
- `git: command not found`
- `attic: command not found`
- broken shell environments
- overwritten container binaries
The container image itself was incomplete and depended on runtime patching.
GitLab Runner vs Nix Daemon
The GitLab runner and Nix daemon operated with incompatible assumptions about:
- socket availability
- namespace visibility
- filesystem mounts
- daemon ownership
- container permissions
This produced:
- daemon socket failures
- intermittent build instability
- partially broken Nix store state
CI Scripts vs Infrastructure State
The CI scripts assumed:
- credentials already existed
- caches were reachable
- substituters were configured
- runners preserved state
But the infrastructure executing those scripts was ephemeral.
This caused:
- cache authentication failures
- missing credentials
- inconsistent substituter behavior
- rebuild storms after cache misses
Most deployment instability came from assumptions leaking across execution boundaries.
Restoring Execution Coherence
The fix was not debugging individual failures indefinitely.
The fix was making the deployment layers coherent with each other.
Immutable CI Image
The first major shift was replacing runtime patching with a deterministic CI image.
packages.x86_64-linux.ci-image =
  pkgs.dockerTools.buildLayeredImage {
    name = "blog-ci";
    tag = "latest";
  };
Instead of mounting binaries from the host into the container, the image itself declared the required runtime:
paths = with pkgs; [
  nix
  cacert
  cachix
  attic-client
  git
  bash
  coreutils
  gnugrep
  gnumake
  openssh
];
This changed the role of the Docker image from:
empty execution shell
to:
declared deployment runtime
The CI container became self-contained and reproducible.
The most important improvement was not adding tools.
It was eliminating runtime uncertainty.
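The two fragments above do not show how `paths` attaches to the image. One plausible arrangement, offered as a sketch rather than the actual flake, wraps the package list in `pkgs.buildEnv` and passes the result as `contents`. The `ci-env` name, the `pathsToLink` selection, and the `config.Env` entries are assumptions made for illustration.

```nix
# Sketch only: one way the image name/tag and the package list could fit together.
{ pkgs }:

pkgs.dockerTools.buildLayeredImage {
  name = "blog-ci";
  tag = "latest";

  # buildEnv merges the listed packages into one tree, so /bin holds every tool.
  contents = [
    (pkgs.buildEnv {
      name = "ci-env"; # hypothetical name
      paths = with pkgs; [
        nix cacert cachix attic-client git bash
        coreutils gnugrep gnumake openssh
      ];
      pathsToLink = [ "/bin" "/etc" ];
    })
  ];

  config.Env = [
    "PATH=/bin"
    # Assumption: point TLS clients at the CA bundle shipped by cacert.
    "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
  ];
}
```

Declaring `PATH` inside the image is what removes the dependency on host bind mounts: a job either finds `grep`, `git`, and `attic` in the image or the image fails to build.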
GitLab Runner Alignment
The runner configuration was simplified so the container and daemon shared compatible assumptions.
The volume configuration reduced to:
dockerVolumes = [
  "/nix/store:/nix/store:ro"
  "/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket"
  "/var/run/docker.sock:/var/run/docker.sock"
];
The direct `/nix/var/nix/db` mount was removed entirely.
Previously, the CI container accessed the host SQLite store database directly, creating lock contention and partially corrupted store state after interrupted jobs.
With:
NIX_REMOTE=daemon
all store operations became coordinated through the host daemon instead of independent filesystem access.
This restored coherence between:
- runner lifecycle
- daemon lifecycle
- store operations
- container execution
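On a NixOS host, the runner settings described above could be declared with the `services.gitlab-runner` module. This is a minimal sketch under that assumption. The runner name `blog`, the token file path, and the image reference are hypothetical, and the post does not say where `NIX_REMOTE` is actually set.

```nix
{ ... }:
{
  services.gitlab-runner = {
    enable = true;
    services.blog = { # hypothetical runner name
      # Assumption: the runner token lives outside the Nix store.
      # Older NixOS releases use registrationConfigFile instead.
      authenticationTokenConfigFile = "/run/secrets/gitlab-runner-env";
      executor = "docker";
      dockerImage = "blog-ci:latest";
      dockerVolumes = [
        "/nix/store:/nix/store:ro"
        "/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket"
        "/var/run/docker.sock:/var/run/docker.sock"
      ];
      # Route every store operation through the host daemon.
      environmentVariables = {
        NIX_REMOTE = "daemon";
      };
    };
  };
}
```

Keeping the volume list and `NIX_REMOTE` in the same definition means the container cannot be started with the socket mounted but the client still attempting direct store access.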
Binary Cache as Infrastructure
Caching stopped being treated as a deployment optimization.
It became part of the execution model itself.
The deployment system stabilized around:
- Cachix for distributed pull caching
- Attic for self-hosted cache control
- Cloudflare R2 for durable object storage
Attic + R2
The important shift was architectural.
Previously:
build → deploy
Now:
build → cache → distribute → deploy
The cache became part of deployment orchestration.
This reduced:
- rebuild frequency
- deployment time
- memory pressure
- CI instability
while improving reproducibility across runners.
Distributed binary caching reduced both deployment latency and operational drift.
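Because store operations go through the host daemon, the substituter list can be declared once on the host instead of being configured per job. A minimal sketch, assuming a NixOS host; the cache URL and public key are placeholders, not values from the original setup.

```nix
{ ... }:
{
  nix.settings = {
    # Placeholder endpoint: use a name every runner can resolve, not 127.0.0.1.
    substituters = [ "https://cache.example.org/blog" ];
    # Placeholder key: replace with the public key the cache reports.
    trusted-public-keys = [ "blog:REPLACE_WITH_PUBLIC_KEY" ];
  };
}
```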
MinIO / R2 Consistency Problems
A critical issue emerged from configuration divergence between users.
The same MinIO client behaved differently under:
- regular user execution
- root execution
- CI execution
This produced errors like:
Access Denied
and:
The AWS Access Key Id you provided does not exist
The issue was not object storage itself.
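The issue was inconsistent credential state across execution environments. The MinIO client keeps its aliases and keys in a per-user configuration directory (`~/.mc` by default), so a regular user, root, and a CI job can each see different credentials for the same endpoint.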
The fix standardized:
- MinIO configuration
- credential propagation
- cache endpoints
- runtime secrets
across all execution contexts.
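One way to make credential propagation explicit is to ship a small preflight command in the CI image that fails before any cache operation if a required variable is absent. The sketch below uses `pkgs.writeShellApplication`. The command name `ci-preflight` and the variable names are hypothetical stand-ins for whatever the Attic and R2 clients in a given setup actually read.

```nix
# Sketch only: a preflight check that turns missing credentials into an early, clear failure.
{ pkgs }:

pkgs.writeShellApplication {
  name = "ci-preflight"; # hypothetical command name
  text = ''
    # Variable names are assumptions; list the ones your jobs depend on.
    for var in ATTIC_TOKEN AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
      # ''${!var:-} expands the variable named by $var, or to empty if unset.
      if [ -z "''${!var:-}" ]; then
        echo "missing required variable: $var" >&2
        exit 1
      fi
    done
    echo "credentials present"
  '';
}
```

Adding the resulting package to the image and calling it at the start of each job replaces a late `Access Denied` with a message that names the missing variable.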
Network Namespace Drift
One of the least obvious failures involved network identity.
127.0.0.1 meant different things depending on where the process executed:
- host machine
- shell executor
- Docker container
- remote VM
The same cache endpoint worked in one environment and silently failed in another.
The real issue was not networking itself.
It was incoherent assumptions about execution locality.
localhost is not a universal address.
It is relative to the current execution namespace.
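If the runner is configured through the NixOS `services.gitlab-runner` module, one way to remove the ambiguity is to give the cache a stable name inside every job container. The sketch below assumes that module; the runner name, the hostname `cache.internal`, and the address are placeholders.

```nix
{ ... }:
{
  services.gitlab-runner.services.blog = { # hypothetical runner name
    # Map a stable name to the cache host's routable address (placeholder IP),
    # so jobs never depend on what 127.0.0.1 means inside the container.
    dockerExtraHosts = [ "cache.internal:10.0.0.5" ];
  };
}
```

Cache endpoints in job configuration would then reference `cache.internal` rather than a loopback address.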
Deployment State vs Operational State
Nix captures declared infrastructure reproducibly.
What it cannot capture directly is operational runtime state:
- daemon restarts
- OOM-killed services
- dropped SSH sessions
- stale credentials
- network namespace boundaries
- interrupted store writes
This gap between:
declared state
and:
operational state
was where almost all debugging occurred.
The deployment stack became stable only after the execution environments themselves became coordinated.
The Three Infrastructure Gaps
Looking across all failures, they ultimately fell into three categories.
1. Build Definition vs Runtime Environment
The declared build environment differed from the environment actually executing the build.
2. Execution Context vs Network Identity
Services assumed shared locality across isolated namespaces.
3. Cache State vs Deployment State
Caches, credentials, and substituters behaved inconsistently across runners and users.
The fixes worked once those assumptions became explicit infrastructure definitions instead of runtime accidents.
Architecture Outcome
The final system is still not Kubernetes.
It is not Terraform.
It is not a fully autonomous cloud platform.
But the deployment stack now behaves as a coordinated execution system where:
- the Docker image defines the runtime
- the GitLab runner orchestrates deterministic execution
- Nix defines reproducible builds
- binary caches distribute build artifacts coherently
- R2 persists deployment state durably
- CI scripts operate against stable infrastructure assumptions
The important shift was not merely fixing CI failures.
The important shift was reducing incoherence between the systems executing the deployment.
Conclusion
The original deployment system relied on mutable runtime behavior:
CI scripts patch missing state at execution time
The stabilized system instead moves toward:
deployment infrastructure defines execution coherently
where:
- execution environments are reproducible
- caches are infrastructure primitives
- runners behave deterministically
- deployment assumptions are explicit
- runtime drift is minimized
The result is not simply faster deployments.
It is a deployment system whose components execute with shared assumptions instead of accidental compatibility.