Air-Gapped Kubernetes: Deploying When the Internet Isn't an Option

The first time you try to deploy Kubernetes in an air-gapped environment, you quickly discover how much the ecosystem assumes constant internet connectivity. Image pulls fail. Helm charts can't fetch dependencies. Package managers complain about unreachable repositories. Everything that "just works" in a connected environment suddenly doesn't.

This post covers the strategies we've developed for deploying and operating Kubernetes clusters when network isolation isn't just a preference but a requirement.

Understanding the Air-Gap

Before diving into solutions, it's worth understanding what we're actually dealing with. An air-gapped environment typically means:

  • No outbound internet access - The cluster cannot reach Docker Hub, GitHub, or any external package repository.
  • Controlled data transfer - Getting content into the environment involves formal processes, often with security review and approval.
  • No real-time updates - You can't pull the latest patch on demand; everything must be pre-staged.

The strictness varies. Some environments allow one-way data transfer through diodes. Others require physical media. Some have isolated networks that can reach internal repositories but nothing external. The deployment strategy needs to accommodate whatever constraints exist.

The Image Problem

Container images are the obvious challenge. A typical Kubernetes deployment pulls dozens of images from multiple registries: Docker Hub, quay.io, gcr.io, and vendor-specific registries. Each component in the stack brings its own dependencies.

We've found three approaches that work, often used in combination:

Pre-loaded Images

K3s supports pre-loading images from tarballs placed in a specific directory. During installation, it imports these images directly into containerd without needing a registry at all.

The workflow looks like:

  1. On a connected system, pull all required images
  2. Export them to tarballs using docker save or ctr images export
  3. Transfer the tarballs to the air-gapped environment
  4. Place them where K3s expects them before installation
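
As a sketch, the steps above might look like the following. The image name is illustrative and the real `docker pull`/`docker save` happens on the connected side; an empty placeholder tarball stands in here so the staging and verification logic is runnable anywhere:

```shell
#!/usr/bin/env bash
# Sketch of steps 1-4: export images on a connected host, checksum them,
# then stage them where K3s imports images at startup.
set -eu

# Real target on a K3s node: /var/lib/rancher/k3s/agent/images/
IMAGES_DIR="${IMAGES_DIR:-./agent-images}"
mkdir -p "$IMAGES_DIR"

# Connected side (per image):
#   docker pull docker.io/rancher/mirrored-pause:3.6
#   docker save docker.io/rancher/mirrored-pause:3.6 -o pause.tar
tar -cf pause.tar -T /dev/null          # placeholder for the exported image

# A checksum travels with each tarball so the transfer can be verified
sha256sum pause.tar > pause.tar.sha256

# Air-gapped side: verify integrity, then stage for K3s to import
sha256sum -c pause.tar.sha256
cp pause.tar "$IMAGES_DIR/"
```

Keeping the checksum step in the pipeline matters more than it looks: controlled transfer mechanisms occasionally truncate or re-encode files, and a corrupt tarball fails much more confusingly at import time.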

This approach works well for the core platform components—K3s itself, the CNI plugin, CoreDNS, and similar infrastructure. The images are known in advance and don't change frequently.

The limitation is scale. When you're deploying applications with many images that update regularly, managing individual tarballs becomes unwieldy.

Private Registry Mirror

For larger deployments, running a private container registry inside the air-gapped environment makes more sense. The registry acts as a mirror—images are pushed to it once, then pulled by nodes as needed.

The workflow shifts to:

  1. Export images on the connected side
  2. Transfer to the isolated environment (however that's permitted)
  3. Push images to the internal registry
  4. Configure containerd/K3s to pull from the internal registry
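
Step 4 typically comes down to a mirror rule in K3s's registry configuration file. A minimal sketch, assuming a hypothetical internal registry at registry.internal:5000:

```yaml
# /etc/rancher/k3s/registries.yaml (read by K3s at startup)
mirrors:
  docker.io:
    endpoint:
      - "https://registry.internal:5000"
  quay.io:
    endpoint:
      - "https://registry.internal:5000"
configs:
  "registry.internal:5000":
    tls:
      ca_file: /etc/rancher/k3s/internal-registry-ca.crt
```

With this in place, manifests can keep referencing docker.io and quay.io image names unchanged; containerd rewrites the pulls to the internal endpoint.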

This adds infrastructure complexity (the registry needs storage, backups, and potentially high availability), but dramatically simplifies ongoing operations. Adding a new application means pushing its images to the registry, not distributing tarballs to every node.

Registry options range from minimal (Docker's own registry image) to full-featured (Harbor, with vulnerability scanning and replication). The right choice depends on scale, compliance requirements, and operational maturity.

Embedded Images

Some distributions, K3s included, can embed images directly into the installation artifact. The K3s binary ships with core component images baked in, meaning a fresh install on an isolated node can bootstrap without pulling anything.

This is elegant for the base platform but doesn't extend to applications. Still, it reduces the bootstrapping problem significantly: you only need to solve image distribution for your workloads, not for Kubernetes itself.

Beyond Images: The Full Dependency Graph

Images get the most attention, but air-gapped deployment touches everything:

Helm Charts - Charts often declare dependencies on other charts, which Helm tries to fetch at install time. You need to either vendor all dependencies into a local chart repository or use helm pull and helm dependency build on the connected side to create self-contained archives.
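
The connected-side preparation might look like the following command sequence; the chart name, registry URL, and version are illustrative:

```shell
# On the connected side: fetch the chart, vendor its pinned dependencies,
# and re-package a self-contained archive for transfer.
helm pull oci://registry.example.com/charts/my-app --version 1.4.2
tar -xzf my-app-1.4.2.tgz
helm dependency build ./my-app   # downloads locked deps into my-app/charts/
helm package ./my-app            # archive now carries all dependencies
```

The resulting archive installs without any fetch at deploy time, which is exactly the property the air-gapped side needs.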

Operating System Packages - Installing prerequisites (container runtime dependencies, troubleshooting tools, security agents) requires either pre-installed golden images or a local package mirror. We typically build VM or machine images with everything pre-installed rather than trying to maintain internal yum/apt mirrors.

Ansible Collections and Roles - If your deployment tooling pulls dependencies at runtime, those pulls will fail. Pin versions and vendor dependencies into your automation repository.

TLS Certificates - Certificate renewal typically assumes network access to ACME providers. In isolated environments, you're either using internal PKI or manually managing certificate lifecycle.

Time Synchronisation - NTP servers outside the network boundary aren't reachable. You need internal time sources, and they need to be accurate; certificate validation, distributed systems, and logging all depend on synchronised clocks.
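
For example, pointing chrony at internal time sources looks roughly like this (hostnames are hypothetical):

```
# /etc/chrony.conf (sketch): internal time sources only
server ntp1.internal iburst
server ntp2.internal iburst
makestep 1.0 3     # step the clock on large offsets during early sync
rtcsync
```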

The Staging Environment

We've learned to treat the staging environment as a critical piece of infrastructure, not just a testing convenience. It serves as the bridge between connected and isolated worlds.

The pattern:

  1. Connected staging mirrors the air-gapped environment's configuration but has internet access
  2. Deploy and test there first, identifying all dependencies
  3. Export everything that was pulled during deployment
  4. Transfer the complete bundle to the isolated environment
  5. Deploy using only the pre-staged content

This catches dependency surprises before they become blocked deployments. If staging can deploy successfully without reaching the internet, the air-gapped deployment should work too.

Automation helps here. Scripts that capture "everything that was fetched" during a staging deployment and package it for transfer reduce manual errors and ensure nothing is missed.
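
The capture step can be sketched as a diff of the node's image list before and after the staging deploy. The lists here are synthetic stand-ins for output you would normally collect with a tool like crictl on the staging node:

```shell
#!/usr/bin/env bash
# Sketch: determine which images a staging deploy fetched, so the transfer
# bundle for the air-gapped environment is complete.
set -eu

capture_new_images() {
  # Emit entries present after the deploy ($2) but not before it ($1)
  comm -13 <(sort "$1") <(sort "$2")
}

# Synthetic before/after image lists standing in for real node inventories
printf 'docker.io/library/busybox:1.36\n' > before.txt
printf 'docker.io/library/busybox:1.36\nquay.io/coreos/etcd:v3.5.12\n' > after.txt

capture_new_images before.txt after.txt > new-images.txt
cat new-images.txt   # everything the transfer bundle must include
```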

Versioning and Reproducibility

Air-gapped environments amplify the importance of version pinning. When you can't fetch "latest" on demand, you need to know exactly what versions comprise a working deployment.

This means:

  • Explicit image tags - Never use latest or floating tags. Pin to digests if the tag might be re-pushed.
  • Locked Helm chart versions - Including transitive dependencies
  • Documented package versions - For everything installed on base images
  • Archived deployment artifacts - The exact bundle that deployed successfully, stored for potential rollback
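
A pre-transfer check can enforce the first rule mechanically. This sketch assumes image references appear on `image:` lines in rendered YAML manifests; the registry hostname and digest are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: reject manifests whose image references are not pinned by digest.
set -eu

lint_images() {
  # Print any image reference lacking an @sha256 digest; fail if found
  local floating
  floating=$(grep -E '^[[:space:]]*image:' "$1" | grep -v '@sha256:' || true)
  [ -z "$floating" ] || { printf 'unpinned in %s:\n%s\n' "$1" "$floating"; return 1; }
}

# Synthetic manifests standing in for rendered deployment YAML
cat > pinned.yaml <<'EOF'
image: registry.internal:5000/app@sha256:4bc453b53cb3d914b45f4b250294236adba2c0e09ff6f03793949e7e39fd4cc1
EOF
cat > floating.yaml <<'EOF'
image: registry.internal:5000/app:latest
EOF

lint_images pinned.yaml && echo "pinned.yaml: ok"
if ! lint_images floating.yaml; then echo "floating.yaml: rejected"; fi
```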

When something breaks six months later, you need to be able to recreate the previous state exactly. In connected environments, you might get away with "just redeploy and it pulls the right stuff." In air-gapped environments, you need to have kept the right stuff.

Update Strategies

Keeping an air-gapped cluster current requires planning. Security patches, bug fixes, and new features all need to cross the air gap somehow.

Approaches vary by organisation:

Scheduled update windows - Collect updates over time, test them in staging, then transfer and deploy during planned maintenance. This batches the overhead but means you're always somewhat behind.

Continuous mirroring - For environments with data diodes or one-way transfer mechanisms, automation can push updates through as they become available. The air-gapped side receives a stream of content without initiating outbound connections.

Manual transfers - Sometimes the process really is "put it on approved media, walk it to the other network." This works but doesn't scale. Prioritise what actually needs updating.

Whatever the mechanism, the key is having it defined and practiced before you need it urgently. Discovering your update process doesn't work during a security incident is not ideal.

Testing Air-Gap Readiness

How do you know your deployment actually works without internet access? Firewall rules that block outbound traffic are the obvious answer, but they're not always reliable—some connections might still succeed through proxies or allowed paths.

We test air-gap readiness by:

  • Deploying to VMs with no default route (can't reach anything outside the local network)
  • Running deployment with network namespace isolation
  • Monitoring for any connection attempts to external addresses
  • Validating that the cluster functions after deployment, not just that it installed
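
A cheap complement to the checks above is a static scan: any rendered manifest that still references an external registry will trigger a pull the cluster cannot make. The registry list and manifest below are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: flag image references that point at registries outside the air gap.
set -eu

EXTERNAL='docker\.io|quay\.io|gcr\.io|ghcr\.io|registry\.k8s\.io'

# Synthetic rendered manifest standing in for real deployment output
cat > rendered.yaml <<'EOF'
image: registry.internal:5000/coredns@sha256:aaaa
image: quay.io/prometheus/node-exporter:v1.7.0
EOF

# Collect every reference that escapes to an external registry
grep -E "image:.*($EXTERNAL)" rendered.yaml > external-refs.txt || true
if [ -s external-refs.txt ]; then
  echo "external references found:"
  cat external-refs.txt
fi
```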

The goal is confidence that when the bundle ships to the actual air-gapped environment, it will work. Surprises at deployment time are expensive when the feedback loop involves security reviews and controlled data transfers.

Lessons Learned

Start air-gapped early - Retrofitting air-gap capability onto a deployment designed for connected environments is painful. If you know the target is isolated, design for it from day one.

Automation is essential - Manual processes for exporting, transferring, and importing content don't scale and introduce errors. Invest in tooling that makes the process repeatable.

Document everything - What images are needed? What versions? What's the transfer process? Who approves it? When things go wrong at 2 AM, documentation is the difference between recovery and chaos.

Plan for the unexpected - What happens when a critical security patch needs to deploy urgently? Having tested the fast-path update process before you need it matters.

What's Next

The next post in this series will cover observability in isolated environments: how to monitor, alert, and debug when your cluster can't phone home to cloud dashboards or external logging services.


Lattice is a project developed by Digital Native Group.