Every infrastructure project starts the same way: build something that works for this deployment, this environment, this customer. The pressure is always to deliver now, solve the immediate problem, move on.
The problem comes later. The next deployment needs something slightly different. The one after that needs something different again. Without deliberate effort, you end up maintaining multiple bespoke systems that share lineage but diverge in practice. Each carries its own quirks, its own undocumented decisions, and its own operational burden.
This post covers how we approach infrastructure as a reusable product - not a one-off project - and the design decisions that make Lattice adaptable across environments without forking into unmaintainable variants.
Infrastructure as a Product
The shift from "infrastructure as a project" to "infrastructure as a product" changes how you think about what you're building.
A project delivers once. It has a scope, a timeline, and an end state. The infrastructure works for its intended purpose, and success is measured by whether it meets the requirements.
A product evolves. It serves multiple consumers, adapts to different contexts, and improves over time based on feedback. Success is measured by how well it serves all its users, not just the first one.
Lattice is infrastructure as a product. It's not a deployment script for a specific cluster - it's a platform framework that produces consistent, tested, secure Kubernetes environments across different hardware, network configurations, and operational requirements.
This distinction drives design decisions at every level.
The Parameterisation Problem
The most obvious challenge in reusable infrastructure is variability. Every deployment differs:
- Different hardware specs and node counts
- Different network topologies and address ranges
- Different storage requirements and disk layouts
- Different security postures and compliance standards
- Different component selections (service mesh or no mesh, which observability components to run)
The naive approach is configuration files with hundreds of variables. Set everything, deploy, hope the combinations work. This creates a different problem - combinatorial complexity. Testing every possible combination of variables isn't feasible, and untested combinations will eventually break.
Lattice handles this through layered configuration with sensible defaults:
Sane defaults - Every variable has a default that produces a working, secure deployment. A minimal configuration (just inventory - which nodes, which roles) produces a functional cluster.
Explicit overrides - Where you need to diverge from defaults, overrides are explicit and documented. You know what you changed and why.
Component toggles - Major components like Istio and Kiali are enabled or disabled as units, not configured piecemeal. This keeps the number of meaningful combinations manageable.
Validated combinations - The test suite validates the deployment as configured, catching incompatible settings before they cause runtime failures.
The goal is minimal configuration for common cases and explicit configuration for specific needs, with testing that covers what's actually deployed.
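As a sketch of what layered configuration can look like in Ansible - the variable names here are illustrative, not Lattice's actual interface:

```yaml
# group_vars/all.yml - defaults: a minimal, secure, working baseline
# (illustrative variable names, not Lattice's actual interface)
lattice_service_mesh_enabled: false     # component toggle: mesh off by default
lattice_pod_network_cidr: "10.244.0.0/16"
lattice_storage_class_default: "local-path"
```

```yaml
# group_vars/production.yml - explicit, documented overrides for one environment
lattice_service_mesh_enabled: true          # this environment needs mTLS between services
lattice_pod_network_cidr: "10.32.0.0/16"    # avoids a clash with the site's VPN range
```

The override file doubles as a record of intent: everything in it is a deliberate divergence from the tested baseline, with a comment explaining why.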
Separation of Concerns
Reusable infrastructure requires clear boundaries between layers:
Platform Layer
The platform layer is Lattice's core - Kubernetes itself, storage, networking, observability, and security hardening. This layer is consistent across deployments. Whether you're running three nodes or thirty, the platform behaves the same way.
Changes at this layer affect every deployment. They're tested thoroughly and released deliberately. This is the product.
Configuration Layer
The configuration layer adapts the platform to a specific environment. Node inventory, network ranges, storage allocation, component selection. This layer is different for every deployment, but the structure is consistent.
Ansible's inventory and variable system handles this naturally. The same playbooks operate on different inventories to produce different but consistently structured environments.
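A minimal inventory - the "just which nodes, which roles" configuration described earlier - might look like this (hostnames and addresses are illustrative):

```yaml
# inventory/site.yml - the minimal input: which nodes, which roles
all:
  children:
    control_plane:
      hosts:
        node1:
          ansible_host: 192.168.10.11
    workers:
      hosts:
        node2:
          ansible_host: 192.168.10.12
        node3:
          ansible_host: 192.168.10.13
```

The same playbooks run unchanged against a three-node inventory or a thirty-node one; only this file differs.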
Application Layer
The application layer is what runs on the platform. Lattice doesn't manage applications - it provides the platform they run on. This separation is important. Platform changes shouldn't require application changes, and application deployments shouldn't require platform modifications.
The interface between layers is Kubernetes itself - standard APIs, standard resource types, standard patterns. Applications don't need to know they're running on Lattice rather than any other Kubernetes distribution.
Policy Layer
Security policies, network policies, resource quotas, and access controls. These vary by environment and by organisation, but the mechanisms are consistent. Lattice provides the framework and sensible defaults; operators tune policies to their requirements.
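A default-deny network policy is one example of the kind of baseline a platform can ship and operators can then tune - this is a standard Kubernetes resource, not Lattice-specific:

```yaml
# A default-deny NetworkPolicy: block all traffic in a namespace by default,
# so operators must explicitly open the flows each workload needs.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: example-app   # applied per namespace
spec:
  podSelector: {}          # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```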
This separation matters because different stakeholders own different layers. Platform engineers own the platform. Operations teams own configuration. Application teams own their workloads. Security teams influence policy. Clear boundaries prevent teams from stepping on each other.
Version Control Everything
This sounds obvious. It isn't always practised.
Reusable infrastructure requires that everything - playbooks, roles, configuration templates, test suites, documentation - lives in version control. Not just the code, but the decisions.
Tagged releases - Each version of Lattice is tagged. Deployments reference a specific tag. You always know which version of the platform is running in each environment.
Change history - Every modification is a commit with context. When something breaks after a change, the history shows exactly what changed.
Branch strategy - Feature development happens in branches. Releases are cut from a stable branch. Environments can pin to specific versions while development continues.
Configuration as code - Environment-specific configuration lives in version control alongside the platform code. Not in someone's head, not on a wiki, not in a shared drive.
This discipline makes rollbacks possible, audits straightforward, and collaboration manageable.
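Pinning an environment to a tagged release can be as simple as a requirements file that names the tag - the repository URL and version here are placeholders:

```yaml
# requirements.yml - each environment pins the platform to a known tag
# (repository URL and tag are placeholders)
collections:
  - name: https://git.example.com/dng/lattice.git
    type: git
    version: v1.4.2   # upgrading = change this tag, test, roll out
```

Because the pin lives in the environment's own repository, "which version is running where" is answered by version control, not by memory.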
Testing at Every Level
Part 6 covered infrastructure testing in detail, but it's worth reinforcing in the context of reusability.
When infrastructure serves multiple deployments, a change that fixes one environment can break another. Testing must cover not just "does this work" but "does this still work everywhere."
Unit-level testing - Ansible roles are tested individually. Each role can be validated in isolation before being combined into a full deployment.
Integration testing - Roles are tested in combination, validating that the full deployment converges and that components work together.
Regression testing - Changes are tested against the existing test suite to ensure they don't break what already works.
Environment-specific testing - Each deployment configuration is tested in its context. Default configurations are tested, but so are common override patterns.
Automated testing makes this sustainable. Manual testing of every combination for every change doesn't scale. Automated test suites that run on every change do.
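One common way to get that role-level isolation in the Ansible ecosystem is Molecule; a minimal scenario looks roughly like this (Lattice's actual test harness may differ):

```yaml
# molecule/default/molecule.yml - exercise one role in an isolated container
driver:
  name: docker
platforms:
  - name: instance
    image: rockylinux:9
provisioner:
  name: ansible        # runs the role via the scenario's converge playbook
verifier:
  name: ansible        # assertions live in the scenario's verify.yml
```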
Documentation as Part of the Product
Documentation is often treated as an afterthought - something written after the work is done, if time permits. For reusable infrastructure, documentation is part of the product.
Architecture documentation - Why decisions were made, not just what was decided. When someone needs to modify the platform for a new requirement, understanding the reasoning behind existing decisions prevents accidental breakage.
Operational documentation - How to deploy, configure, test, upgrade, and troubleshoot. The playbooks from Part 9 are operational documentation.
Configuration reference - What variables exist, what they control, what their defaults are, and what valid values look like.
Change documentation - What changed between versions, why it changed, and what operators need to do (if anything) when upgrading.
Documentation has the same freshness problem as any other artifact - it drifts from reality over time. Tying documentation to the release process helps. If a change doesn't include documentation updates, the release isn't complete.
Handling Divergence
Despite best efforts, deployments diverge. A customer needs a specific feature. An environment has a constraint that requires a workaround. A security requirement demands a non-standard configuration.
How you handle divergence determines whether the product stays maintainable:
Upstream first - If a change could benefit other deployments, it goes into the core product. Feature flags or component toggles make it optional without forking.
Configuration over code - If a requirement can be met through configuration, it should be. Adding a variable is better than adding a branch.
Escape hatches - For genuinely unique requirements, Lattice provides extension points. Additional Ansible roles can be added without modifying core roles. Additional Kubernetes resources can be deployed without modifying the platform.
Resist forking - Every fork is a maintenance commitment. Two forks means maintaining two products. Three forks is unsustainable for a small team. If a change can't go upstream and can't be handled through configuration, challenge whether it's really necessary.
The economics are straightforward: maintaining one product that serves ten environments is less work than maintaining ten bespoke systems.
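An escape hatch can be as simple as a documented variable that pulls in operator-supplied roles after the core platform converges - the variable name here is hypothetical:

```yaml
# site.yml (excerpt) - hypothetical extension point: run operator-supplied
# roles after the core platform roles, without modifying them
- hosts: all
  tasks:
    - name: Include any extra roles the environment defines
      ansible.builtin.include_role:
        name: "{{ item }}"
      loop: "{{ lattice_extra_roles | default([]) }}"
```

The default of an empty list means the extension point costs nothing for deployments that don't need it.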
Upgrade Paths
Reusable infrastructure must be upgradeable. Deployments that can't be upgraded become legacy systems the moment they're deployed.
Semantic versioning - Breaking changes are signalled through version numbers. Operators know when an upgrade requires attention.
Migration guides - When breaking changes are necessary, migration guides explain what needs to change and why.
Backward compatibility - Where possible, new features are additive. Existing configurations continue to work without modification.
Staged rollout - Upgrades are tested in staging before production. In environments with multiple deployments, upgrades can roll through progressively rather than all at once.
Rollback capability - If an upgrade causes problems, rolling back to the previous version must be possible and tested.
The upgrade story is often what separates usable infrastructure from throwaway infrastructure. If upgrading is painful, people stop doing it, and the platform stagnates.
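A preflight check can make the versioning contract executable - for example, refusing to run against a deployment that is too many versions behind. The variable names here are illustrative; Ansible's built-in `version` test does the comparison:

```yaml
# tasks/preflight.yml - hypothetical guard against skip-version upgrades
- name: Refuse to upgrade across more than one minor version
  ansible.builtin.assert:
    that:
      - deployed_version is version(minimum_upgradable_version, '>=')
    fail_msg: >-
      Deployed version {{ deployed_version }} is too old to upgrade directly
      to {{ target_version }}; step through the intermediate releases first.
```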
Source-Available as a Strategy
Lattice's source code is available to our customers. This is a deliberate choice, not just a licensing decision.
Transparency - Customers can see exactly what they're deploying. In secure environments, this matters. Black-box infrastructure that can't be inspected doesn't meet the security and assurance requirements of the sectors we work in. When a customer asks "what does this platform actually do?" the answer is "read the code."
Trust - In government, health, and defence contexts, infrastructure that can be audited builds trust in ways that proprietary solutions can't. Customers aren't taking our word for how the platform is configured — they can verify it themselves.
Collaboration - Customers who can see the source can raise meaningful bug reports, suggest improvements, and contribute back. This feedback loop improves the product faster than internal development alone.
Longevity - Customers aren't locked into a dependency on DNG for continued operation. The source is theirs to inspect, understand, and operate. This reduces vendor risk - a genuine concern for organisations making long-term infrastructure commitments.
The platform source is open to customers; it isn't a public repository. Specific deployment configurations, customer-specific adaptations, and operational procedures remain private to each engagement. The distinction matters - we're providing transparency and auditability to the people who need it, not publishing infrastructure patterns for the world to replicate.
Measuring Success
How do you know if your infrastructure-as-a-product approach is working?
Time to deploy - How long does it take to go from bare hardware to a running, tested, production-ready cluster? This should decrease over time as the product matures.
Configuration size - How much configuration does each deployment carry? Growing override files suggest the defaults aren't right or the product isn't flexible enough.
Divergence rate - How many environment-specific modifications exist outside the core product? High divergence suggests the product isn't meeting needs.
Upgrade frequency - How often do deployments move to new platform versions? Regular, uneventful upgrades indicate the upgrade path isn't painful.
Incident frequency - Are platform-level incidents decreasing over time? A maturing product should produce fewer surprises.
Onboarding time - How long does it take a new team member to deploy and operate the platform? This measures documentation quality and product clarity.
Lessons Learned
Start with reusability in mind - Even when the first deployment is the only deployment for a while, building for reuse pays off when the second deployment arrives.
Defaults matter enormously - Good defaults mean most deployments need minimal configuration. Bad defaults mean every deployment is a customisation exercise.
Resist complexity - Every feature, every variable, every option adds maintenance burden. Ask whether a capability justifies its cost before adding it.
Invest in testing - Automated testing is the foundation that makes everything else possible. Without it, changes are risky, upgrades are scary, and reuse is fragile.
Treat documentation as code - Version it, review it, test it (by following it), and maintain it. Stale documentation is actively harmful.
Listen to operators - The people deploying and operating the platform know where it's painful. Their feedback drives the most valuable improvements.
Wrapping Up the Series
Over these ten posts, we've covered the full lifecycle of building and operating a production Kubernetes platform for secure environments:
- Introduction - Why we built Lattice and the problems it solves
- Air-Gapped Deployment - Operating without internet access
- Observability - Monitoring without phoning home
- Security Hardening - Going beyond defaults
- Landscape - Where Lattice fits among alternatives
- Testing - Validating infrastructure works, not just deploys
- Storage - Block and object storage for resilient data management
- Service Mesh - When Istio adds value and when it doesn't
- Operational Playbooks - Preparing for failure
- Reusable Infrastructure - Building a product, not a project
The common thread throughout: infrastructure for secure environments demands more rigour, more planning, and more testing than typical deployments. But the investment pays off in reliability, security, and the ability to deliver consistent platforms repeatedly.
Lattice is our answer to this challenge. It's not the only answer, and it won't be the right answer for every situation. But for organisations that need production-ready Kubernetes in environments where the usual cloud-native assumptions don't apply, it provides a solid, tested, and open foundation.
Lattice is a source-available platform developed by Digital Native Group; customers receive full access to the source as part of their engagement.