When managing clusters and resources with FluxCD and GitOps, the first challenge any developer meets is how to structure the GitOps repository.
FluxCD provides docs and a repository with good practices, examples, and thoughts on how to organize your YAML files to set up a multi-tenant Kubernetes cluster. However, with more complex environments where you manage multiple clusters, it is not so clear how to structure and organize your files.
So, what do you do? You search the internet for other people who have hit the same issues. Luckily, you don’t need to search much: in the previous example repo, you will find fellow developers still looking for the same answers you are.
“Which issues?” you might ask.
The challenge with FluxCD and GitOps is that you can easily end up with lots of duplicated configuration, making you feel that your code is not DRY at all, especially when trying to keep the configuration as flexible as possible. The culprits are the constraints that Flux and Kustomize enforce on the file structure.
The number of files to maintain and groom scales with the number of apps and clusters you run. This eventually affects the maintainability and operability of the repo, making your devs slower and less happy.
As a DevOps engineer, my mantra is to make my devs happy. A happy dev means fewer requests for help to the Platform team, and thus less work for me (yes, I’m a pretty selfish person 🤓).
In this first post, I’ll propose and explain a repo structure that aims to be as extensible as possible, taking advantage of Kustomize Components to keep our configuration simple and maintainable, and your devs happy.
I’m currently using this setup in production in my homelab, and you can find a copy of it (without the personal configuration and credentials needed to hack my house) here: https://github.com/Sturgelose/flux-structure-example
Glossary
- Cluster: A set of Kubernetes nodes that run containerized applications
- Operator: Engineer(s) that manage and own the cluster and platform running in it.
- Platform: A set of services that are installed at the cluster level and provide common features to any app running in it. Different environments might use different platform services.
- Platform service: A single service of the platform. Examples: ingress-nginx, external-dns, KEDA, etc.
- App: A deployment of a workload that uses the platform and cluster as a dependency. It usually runs in its own namespace and does not directly depend on any other app.
- Tenant: Engineer(s) that manage and own an app.
- Base: Common/default resources and configurations of any installation of a unit (be it an App, Cluster, Platform Service, etc.)
- Variant: Extension of a base with extra features. Example: ingress-nginx + private ALB.
The Layers of a Standard GitOps Repo
First of all, let’s take a look at a classic GitOps repo that manages multiple clusters. In this post, we will focus on the platform and cluster structures.
We can easily find docs and samples provided by the FluxCD community, which can be summarized as follows:
```
├── apps
│   ├── base
│   ├── production
│   └── staging
├── platform
│   ├── base
│   ├── production
│   └── staging
└── clusters
    ├── production
    └── staging
```
- Apps are installed by tenants (usually developer teams) in a specific cluster.
- The Platform is a set of apps and configs installed in all clusters, and that allow operators to manage the cluster or provide features to the apps. It provides some pre-set variants that clusters can reuse.
- Clusters install a variant of the infrastructure and a set of tenants with their apps.
This example falls short when handling real-world situations.
To keep it simple, let’s set the following objectives:
- I want to provide some standardization so the Operators do not have too much complexity.
- I want to provide flexibility to Tenants when choosing which platform to run in.
- I want all this to be as automatic and DRY as possible, without having to configure each cluster individually.
How Do I Provide Specific Configurations per Cluster?
There are many reasons why we need to configure each cluster with a unique configuration:
- Observability credentials that will be distinct per cluster
- IAM credentials so apps can interact with the cloud (especially the platform ones)
- I want to run a specific new version or configuration in that cluster
One clear example is external-dns, as I need to provide a unique configuration per cluster:
- The domains it will listen for
- Which IAM credentials to use to update the DNS zone
- Which version to use
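As a sketch, those three values would typically surface in an external-dns HelmRelease like the following (the chart values `domainFilters` and `serviceAccount.annotations` exist in the external-dns chart; the domain, version, and role ARN are placeholders):

```yaml
# Hypothetical sketch: the per-cluster values an external-dns install needs.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: external-dns
  namespace: flux-system
spec:
  chart:
    spec:
      chart: external-dns
      version: '1.14.0'  # Which version to use, unique per cluster
      sourceRef:
        kind: HelmRepository
        name: external-dns
  values:
    domainFilters:
      - cluster1.example.com  # The domains it will listen for
    serviceAccount:
      annotations:
        # Which IAM credentials to use to update the DNS zone
        eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/external-dns
```

Every value in this sketch differs per cluster, which is exactly the data we want to inject automatically.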
All this data could be fetched and known before even creating the cluster, but how can we automate providing this information? Not everything is configurable from a Helm chart.
Sometimes, you need to provide Secret and ConfigMap resources in the very namespace where the service will be installed and pass it as a reference. But also, some of these values might be needed in other services (IAM might be needed in cert-manager to access the DNS zone as well).
If we want to handle this at cluster initialization time, we also start wondering: in which namespace do these Secret and ConfigMap resources need to be created? We do not know yet. Depending on the platform variant we are installing, some namespaces might already exist and others might not!
How Can I Balance Standardization and Flexibility in the Platform?
Does this mean that I need a profile for each distinct unique configuration per cluster?
Here, we reach the compromise of standardization: we want to make all clusters as similar as possible, but still allow pre-configured customizations in the different clusters.
As an analogy, we want to provide a menu to our developers. They can pick from a set of dishes, and choose the side dish and garnish for each of them. This trade-off makes our customers (developers) happy but avoids the platform team going crazy with different clusters with unique configurations.
This becomes a challenge once you realize how Kustomizations work, which we will tackle in the following question:
How Can I Keep the Code Structure DRY?
Kustomizations are mostly based on layering and creating different variants.
This means that for each unique combination, I need a Kustomization file that aggregates all the different services. You might already be imagining multiple folders with a single kustomization file aggregating that unique combination for a specific platform component.
If you have a base with 3 different features, you will end up with 2³ = 8 potential unique combinations. So, the number of kustomizations you maintain follows 2^n, where n is the number of features of the platform component.
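To make this concrete, imagine a hypothetical ingress-nginx base with three optional features (the names nodeport, internal-lb, and waf are invented for illustration). Covering every combination with plain overlays would require a folder, with its own kustomization, per combination:

```
ingress-nginx
├── base
├── nodeport
├── internal-lb
├── waf
├── nodeport-internal-lb
├── nodeport-waf
├── internal-lb-waf
└── nodeport-internal-lb-waf
```

That is 8 kustomizations for 3 features, and adding a fourth feature would double it to 16.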
You can find a full (and clearer) example of this problem here, written by the kubernetes-sigs community.
All this is better expressed in this KEP:
The problem is that modular applications cannot always be expressed in a tall hierarchy while preserving all combinations of available features. Doing so would require putting each feature in an overlay, and making overlays for independent features inherit from each other.
However, this is semantically incorrect, does not scale as the number of features grows, and soon results in duplicate manifests and kustomizations.
Instead, such applications are much better expressed as a collection of components, i.e., reusable pieces of configuration logic that are defined in a common place and that distinct overlays can then mix-and-match. This approach abides by the DRY principle and increases ease of maintenance.
So, do we need to choose between flexibility and a simple structure? Not if we use Kustomize Components.
Kustomize Components: Ingredients for Your Cluster Recipe
Kustomize components provide a more flexible way to enable/disable features and configurations for applications directly from the kustomization file. This results in more readable, concise and intuitive overlays.
Keeping the food-menu analogy, Kustomize Components allow us to extend a base (the main dish) with side dishes and garnish at the will of the final customer.
How are they used?
```yaml
# my-service/_base/kustomization.yaml
# Declaring the base
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - configmap.yaml
---
# my-service/feature1/kustomization.yaml
# Declaring the Component
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component  # We just define a Kustomization as a Component
resources:
  - resource1.yaml
  - resource2.yaml
patchesStrategicMerge:
  - configmap.yaml
---
# my-service-instance/kustomization.yaml
# Usage of the base and the component together
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../my-service/_base
components:
  - ../my-service/feature1
  - ../my-service/feature2
```
As you can see, it is as easy as declaring the base and the components, and then referencing them in the instance that uses them. However, we can do even better!
FluxCD Kustomization resources support Components as well, so we can do the instantiation directly from the Flux Kustomization resource:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  path: "./my-service/_base"
  components:
    - ../feature1
    - ../feature2
```
It is important to note that the components’ paths must be local and relative to the path specified by .spec.path, whereas in the Kustomize example, it is relative to the kustomization.yaml file’s location.
Templating Our Flux Kustomizations
Even if we use Components, we will want to set specific values to the Flux Kustomizations. To do so, we can make use of the Post Build Variable Substitution that Flux Kustomize provides.
With it, we can define a resource in any Kustomization with variables to be replaced:
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: apps
  labels:
    environment: ${cluster_env:=dev}
    region: "${cluster_region}"
---
apiVersion: v1
kind: Secret
metadata:
  name: secret
  namespace: apps
type: Opaque
stringData:
  token: ${token}
```
And then, replace these values with Flux either from static values or from ConfigMaps and Secrets:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
spec:
  interval: 5m
  path: "./apps/"
  postBuild:
    substitute:
      cluster_env: "prod"
      cluster_region: "eu-central-1"
    substituteFrom:
      - kind: ConfigMap
        name: cluster-vars
        # Use this ConfigMap if it exists, but proceed if it doesn't.
        optional: true
      - kind: Secret
        name: cluster-secret-vars
        # Fail if this Secret does not exist.
---
apiVersion: v1
kind: Secret
metadata:
  name: cluster-secret-vars
  namespace: flux-system
type: Opaque
stringData:
  token: SUPERSECRETTOKEN
```
This would then render into:
```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: apps
  labels:
    environment: "prod"
    region: "eu-central-1"
---
apiVersion: v1
kind: Secret
metadata:
  name: secret
  namespace: apps
type: Opaque
stringData:
  token: SUPERSECRETTOKEN
```
This simple feature enables us to inject cluster variables into the different Flux Kustomizations, avoiding patches and keeping the manifests more human-readable.
Note that this feature is provided by Flux and is not supported by Kustomize itself. There have been many requests to support it, but they have always been declined, arguing that other tools do it better and that Kustomize should stay as declarative as possible.
Our Structure and Binding With FluxCD Resources
Now, let’s try to bring all this together. This will be our project’s structure:
```
├── clusters
│   ├── _profiles                 # Store all the different profiles
│   │   ├── _base                 # Base for all cluster profiles (things installed in all variants)
│   │   ├── home
│   │   └── prod
│   ├── home-cluster-raspi        # A cluster instance
│   │   ├── flux-system           # Generated by flux bootstrap
│   │   └── platform
│   │       ├── kustomization.yaml    # Maps to a profile and injects secrets/config in the cluster
│   │       ├── cluster-secrets.yaml
│   │       └── cluster-config.yaml
│   ├── azure-cluster-aks
│   └── ...
└── platform                      # Contains all the platform services
    ├── grafana-operator
    │   └── _base
    ├── grafana-agent
    ├── cert-manager
    ├── datadog-operator
    ├── datadog-agent
    ├── ingress-nginx
    │   ├── _base                 # Base implementation of this service
    │   └── nodeport              # Feature to expose nginx in a NodePort instead of in a LoadBalancer
    ├── local-path-provisioner
    └── ...
```
Let’s implement, for example, the Datadog stack in a cluster. It will require the API and APP key secrets.
Creating the Platform Service
First, let’s declare two platform services, one that installs the operator and CRDs:
```yaml
# platform/datadog-operator/_base/helm.yaml
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: datadog
  namespace: flux-system
spec:
  interval: 4h
  url: https://helm.datadoghq.com
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: datadog-operator
  namespace: flux-system
spec:
  targetNamespace: ${namespace_name:=default}
  serviceAccountName: kustomize-controller
  chart:
    spec:
      chart: datadog-operator
      interval: 15m
      sourceRef:
        kind: HelmRepository
        name: datadog
      version: '1.2.1'
  interval: 15m
  values:
    apiKeyExistingSecret: datadog-secret
    appKeyExistingSecret: datadog-secret
    site: datadoghq.eu
---
# platform/datadog-operator/_base/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: datadog-secret
  namespace: ${namespace_name:=default}
type: Opaque
stringData:
  api-key: ${datadog_api_key}
  app-key: ${datadog_app_key}
---
# platform/datadog-operator/_base/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ${namespace_name:=default}
  labels:
    owner: ${namespace_owner:=platform}
---
# platform/datadog-operator/_base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - helm.yaml
  - secret.yaml
  - namespace.yaml
```
… and another that uses the CRs to declare agents and is installed in the same namespace. This one will have a feature to enable APM tracing in only certain cluster profiles as it will be disabled by default.
```yaml
# platform/datadog-agent/_base/datadogagent.yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: ${namespace_name:=default}
spec:
  global:
    site: datadoghq.eu
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  features:
    apm:
      enabled: false
    clusterChecks:
      enabled: true
    kubeStateMetricsCore:
      enabled: true
    logCollection:
      containerCollectAll: false
      enabled: false
    liveContainerCollection:
      enabled: false
    liveProcessCollection:
      enabled: false
---
# platform/datadog-agent/_base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - datadogagent.yaml
---
# platform/datadog-agent/apm/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patchesStrategicMerge:
  - datadogagent-patch.yaml
---
# platform/datadog-agent/apm/datadogagent-patch.yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: ${namespace_name:=default}
spec:
  features:
    apm:
      enabled: true
```
Creating the Cluster Profile
Now, we have two platform services that we need to configure in order to provide the secrets.
So, let’s create the prod profile that will use them. Note that the Flux Kustomizations that load the Datadog services expect secrets and configs from two files, cluster-secrets and cluster-config. We will create them later on.
```yaml
# clusters/_profiles/prod/datadog.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: datadog-operator
  namespace: flux-system
spec:
  interval: 15m
  sourceRef:
    kind: GitRepository
    name: flux-system
  serviceAccountName: kustomize-controller
  path: ./platform/datadog-operator/_base
  prune: true
  wait: true
  timeout: 5m
  postBuild:
    substitute:
      namespace_name: datadog
    substituteFrom:
      - kind: ConfigMap
        name: platform-namespace-vars
        optional: true  # Use this ConfigMap if it exists, but proceed if it doesn't.
      - kind: Secret
        name: cluster-secrets
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: datadog-agent
  namespace: flux-system
spec:
  components:
    - ../apm  # Here we set to use the APM feature for this instance
  dependsOn:
    - name: datadog-operator
  interval: 15m
  sourceRef:
    kind: GitRepository
    name: flux-system
  serviceAccountName: kustomize-controller
  path: ./platform/datadog-agent/_base
  prune: true
  postBuild:
    substitute:
      namespace_name: datadog
    substituteFrom:
      - kind: ConfigMap
        name: platform-namespace-vars
        optional: true  # Use this ConfigMap if it exists, but proceed if it doesn't.
---
# clusters/_profiles/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
resources:
  # This is the base profile where we can set services to be installed anywhere, loaded as an overlay
  - ../_base
  - datadog.yaml
```
Create the Cluster
Now, we just need to make use of the prod profile in our example-prod cluster.
FluxCD bootstrapping will create the folder of the cluster and a flux-system folder inside of it, with all the definitions of Flux. We will create some extra configurations to load our platform services.
```yaml
# clusters/example-prod/platform/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
components:
  - ../../_profiles/prod
resources:
  - cluster-secrets.yaml
  - cluster-config.yaml
---
# clusters/example-prod/platform/cluster-secrets.yaml
kind: Secret
apiVersion: v1
metadata:
  name: cluster-secrets
stringData:
  datadog_api_key: SUPERSECRET
  datadog_app_key: SUPERSECRET
---
# clusters/example-prod/platform/cluster-config.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-config
data:
  foo2: bar2
```
This is just an example. Never keep your secrets in a Git repository unencrypted. Make sure to use SOPS or a vault to keep your secrets encrypted!
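For example, with SOPS and age you could add a .sops.yaml at the repo root so that every cluster-secrets.yaml is encrypted before being committed (the fields below are real SOPS configuration keys; the age public key is a placeholder):

```yaml
# .sops.yaml - sketch of a SOPS configuration matching this repo layout
creation_rules:
  - path_regex: clusters/.*/platform/cluster-secrets\.yaml
    encrypted_regex: ^(data|stringData)$
    age: age1examplepublickeyplaceholder000000000000000000000000000
```

Flux's kustomize-controller can then decrypt these secrets at reconciliation time when given the matching private key.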
So, now with all this, setting the right cluster secrets and configs, and just by defining the profile, we can bootstrap a cluster!
Inside the platform folder, you can also do patch overrides of the profile, allowing you to keep all the unique configurations of your cluster in one single folder, close together.
So, depending on how spread out you want this configuration to be, you can set it at different levels:
- Per cluster configuration → To override the profile
- Per profile configuration → To override the platform services and add components
- Per platform service configuration
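As a hedged sketch of the first level, the cluster's own kustomization can patch a Flux Kustomization that the profile defined (names follow the earlier examples; the timeout override is illustrative):

```yaml
# clusters/example-prod/platform/kustomization.yaml (sketch with a per-cluster override)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
components:
  - ../../_profiles/prod
resources:
  - cluster-secrets.yaml
  - cluster-config.yaml
patches:
  # Override the profile for this cluster only, without touching the shared profile
  - patch: |-
      apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
      kind: Kustomization
      metadata:
        name: datadog-agent
        namespace: flux-system
      spec:
        timeout: 10m
    target:
      kind: Kustomization
      name: datadog-agent
```

This keeps every cluster-unique tweak in the cluster's own folder, close together.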
If we want a new cluster, we just need to create 3 more files (or more if we want to override something), and we are good to go!
But can we do better? We now have a standard way to define config, secrets, and profiles, a kind of contract to create a cluster, which gives room for even more automation.
So, why not inject all this when creating the cluster with IaC, using Terraform or OpenTofu?
We can even create new API and APP keys for each individual cluster.
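For instance (a sketch using the Datadog Terraform provider; the `datadog_api_key` and `datadog_application_key` resources are real, but the module wiring is illustrative and assumes the flux-cluster module shown below):

```hcl
# Hypothetical sketch: create per-cluster Datadog keys and feed them to the cluster module
resource "datadog_api_key" "cluster" {
  name = "flux-${var.cluster_name}"
}

resource "datadog_application_key" "cluster" {
  name = "flux-${var.cluster_name}"
}

module "flux" {
  source       = "./module/flux-cluster"
  cluster_name = var.cluster_name
  profile_name = "prod"

  # The generated keys become the cluster secrets consumed by the profile
  cluster_secrets = {
    datadog_api_key = datadog_api_key.cluster.key
    datadog_app_key = datadog_application_key.cluster.key
  }
}
```

This way, no key is ever shared between clusters, and revoking one cluster's access is a single destroy away.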
Integrating With Terraform/OpenTofu to Deploy to KinD
We can automate this so that, with a single terraform/tofu apply, we:
- Bootstrap a new KinD cluster
- Bootstrap Flux with the flux-tf provider
- Given the secrets, config, and profile name, create the right files in the GitHub repo
The following is just part of the code, but all the previous logic can be summarized as:
```hcl
# File: module/flux-cluster/main.tf

# Bootstrap Flux
resource "flux_bootstrap_git" "this" {
  path = "clusters/${var.cluster_name}"

  # Depend on the secrets and configs so Flux finds them in place at first boot.
  # Otherwise, it takes 10 minutes to reconcile.
  depends_on = [
    github_repository_file.config,
    github_repository_file.kustomization,
    github_repository_file.secrets,
  ]
}

# Custom profile management
locals {
  flux_platform_path = "clusters/${var.cluster_name}/platform"
}

resource "github_repository_file" "kustomization" {
  repository          = var.github_repository
  branch              = "main"
  commit_message      = "[Flux] Configure Kustomization for ${var.cluster_name}"
  overwrite_on_create = false
  file                = "${local.flux_platform_path}/kustomization.yaml"
  content = templatefile(
    "${path.module}/templates/kustomization.sample.yaml",
    { profile_name = var.profile_name }
  )
}

resource "github_repository_file" "secrets" {
  repository          = var.github_repository
  branch              = "main"
  commit_message      = "[Flux] Configure cluster secrets for ${var.cluster_name}"
  overwrite_on_create = false
  file                = "${local.flux_platform_path}/cluster-secrets.yaml"
  content = templatefile(
    "${path.module}/templates/secrets.sample.yaml",
    {
      # TODO Secrets should be stored encrypted with a provided SOPS key before committing
      data = [for key, val in var.cluster_secrets : "${key}: encrypted(${val})"]
    }
  )
}

resource "github_repository_file" "config" {
  repository          = var.github_repository
  branch              = "main"
  commit_message      = "[Flux] Configure cluster config for ${var.cluster_name}"
  overwrite_on_create = false
  file                = "${local.flux_platform_path}/cluster-config.yaml"
  content = templatefile(
    "${path.module}/templates/config.sample.yaml",
    { data = [for key, val in var.cluster_config : "${key}: ${val}"] }
  )
}
```
Simply run terraform/tofu apply, and everything will be created!
```
kind_cluster.this: Creating...
tls_private_key.flux: Creating...
tls_private_key.flux: Creation complete after 0s [id=bd2ddfaa118a8f8419edbde196c2c17349d161e5]
kind_cluster.this: Still creating... [10s elapsed]
kind_cluster.this: Creation complete after 16s [id=my-cluster-]
module.flux.github_repository_deploy_key.this: Creating...
module.flux.github_repository_file.config: Creating...
module.flux.github_repository_file.secrets: Creating...
module.flux.github_repository_file.kustomization: Creating...
module.flux.github_repository_deploy_key.this: Creation complete after 1s [id=flux-platform:90111711]
module.flux.github_repository_file.kustomization: Creation complete after 9s [id=flux-platform/clusters/my-cluster/platform/kustomization.yaml]
module.flux.github_repository_file.config: Creation complete after 9s [id=flux-platform/clusters/my-cluster/platform/cluster-config.yaml]
module.flux.github_repository_file.secrets: Still creating... [10s elapsed]
module.flux.github_repository_file.secrets: Creation complete after 10s [id=flux-platform/clusters/my-cluster/platform/cluster-secrets.yaml]
module.flux.flux_bootstrap_git.this: Creating...
module.flux.flux_bootstrap_git.this: Still creating... [10s elapsed]
module.flux.flux_bootstrap_git.this: Still creating... [20s elapsed]
module.flux.flux_bootstrap_git.this: Still creating... [30s elapsed]
module.flux.flux_bootstrap_git.this: Still creating... [40s elapsed]
module.flux.flux_bootstrap_git.this: Creation complete after 41s [id=flux-system]
```
You can find all the work (and more) of this post, in this repo: https://github.com/Sturgelose/flux-structure-example
Conclusions
We have learned how to keep our structure simple and DRY with Kustomize Components and templating.
This setup is flexible enough to be extended into many production environments while keeping an opinionated and DRY structure. For example, logic could be added to create groups of clusters sharing the same base, still created through IaC.
Or, in the current example, we are passing the secrets directly, when they could be created automatically using the Datadog provider.
This structure is just a base that can be as simple or complex as each platform team needs, but it clarifies and fixes lots of common issues that you will hit with the default structure and logic that the FluxCD docs suggest.
We haven’t covered all the open issues yet, especially on the multi-tenancy side, but I will try to tackle them in future posts.
Future (Potentially Upcoming) Posts
- Handling multi-tenancy at scale
- Chicken and egg issues when Flux depends on other charts
- CI in GitOps repos