Tech Veterans’ New Approach To Eliminate ‘Configuration Hell’

There’s gotta be a better way! As with so many startups, that was the idea behind ConfigHub, focused on delivering both dev and ops folks out of “configuration hell.”

“Config hell is when you’re trying to fix the system, everything’s broken, everybody’s screaming, and you can’t even figure out which of the 1,000 YAML files contains the actual error,” said CEO Alexis Richardson in an interview.

It’s one of the problems he and his cofounders have dealt with for years. Richardson was the founder of cloud native container management platform Weaveworks and message broker RabbitMQ. Former Google software engineer Brian Grant, ConfigHub’s CTO, was the original lead architect of Kubernetes, and Jesper Joergensen, the new company’s chief product officer, held various roles at Salesforce, including with Heroku, before joining Twilio to lead its voice, video and platform teams.

“We knew there had to be something better than this. … What we have instead is people wading through pages and pages and pages and pages of YAML, Terraform, HCL [HashiCorp Configuration Language] … whatever it’s called. You know, new languages all the time for supposedly solving this problem, and it’s incomprehensible to most people. And when you have an outage, you don’t have time to learn it. You just got to fix it,” said Richardson.

Their Solution: Treat Configuration As Data

Misconfiguration perennially makes the OWASP Top 10 list of the most common security vulnerabilities. It is blamed for massive outages including recent ones at Cloudflare, Amazon Web Services and Azure, as well as the July 2024 CrowdStrike failure estimated to have cost airlines, banks and other companies $5.4 billion. These are problems that approaches like Infrastructure as Code (IaC), Infrastructure from Code (IfC) and others have aimed to fix.

“Brian’s light bulb idea was just to take all the configuration data and separate it out … take the manifest, render it all out into values, no variables, no programming loops, no mysterious templates, nothing that has to be generated as part of the configuration process, and all of that then becomes operational facts about your business,” Richardson said.

Grant explained in a recent KubeCon North America talk that during an outage, ops folks don’t want to wade through a maze of YAML files, git folders and joined-up dev tools; they want a single pane of glass they can break to find and fix the error. Yet GitOps wants a write-through on git before a fix can take place.

To simplify things, the ConfigHub team created a database where configuration is stored and managed as structured data. It is serialized using standard data formats, such as YAML, and stored with revision history and some metadata. Since the data is maintained in a live state, it’s always ready to go. Code that operates on that configuration is kept separate from the configuration data and connects to it via API.

“The configuration data is not parameterized. The configuration of every variant is stored independently in its native, fully rendered, WET [Write Every Time] form. There are no templates, variables, conditionals, loops or generators that create configuration on the fly. You also don’t need to write and maintain patches by hand the way you do with Kustomize. Every value specific to an environment is stored literally in the config…. Make a simple edit to the config and the live resources can be updated more quickly and with fewer, simpler steps than deploying through git and CI/CD pipelines,” Grant explained in a blog post.

In the interview, Joergensen explained: “If you look at Kubernetes, which we’re focused on right now, and you look at a typical best practice setup today, you’ll see teams … basically use Helm as first step of the generation, and then what it generates gets sent over to CI/CD. … But a lot of coding, a lot of processing, generation is happening inside the CI/CD step, and there is no visibility at this level,” he said.

“The DevOps folks cannot see what gets deployed before it gets deployed, because it’s all being generated, and that is creating a bunch of problems. And so we instead say, ‘Start by whatever you want to do with software,’… but what you end up with is a database of what we call literal config, meaning that the config is fully rendered, fully generated, in the form that it needs to be in right before it goes to the infrastructure. And that is your source of truth. That is your system of record.

“And that is what you’re looking at, both as humans and as automations and AI and whatever else you want to point at it, and that gives you a whole different level of fidelity about what will happen next. And that changes things a lot.”

With one-to-one mapping, every running object and therefore every error can be located and updated, even in bulk, using standard database operations such as semantic analysis based on data schemas. It enables reverse or “bidirectional” GitOps, allowing users to see changes to be made before deployment and the results afterward. Since each config is isolated, changes affect only that one environment with no chance they will interfere with anything else.

How ConfigHub Works: Core Components

In a Kubernetes deployment, ingress hostnames, environment variables, image tags, service dependencies, resource requests and more are all stored directly in the YAML rather than being generated via template variables and input values, explained consultant and platform engineering guru Artem Lajko in a deep dive into IaC issues and ConfigHub’s take on it.

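The idea of literal config as queryable data can be pictured with a small sketch. The Python below is not ConfigHub’s API or data model. It assumes a hypothetical in-memory store (`CONFIGS`) holding one fully rendered Deployment record per environment, and the names, image tags and helper functions (`find_by_image`, `bulk_set_image`) are invented for illustration. It shows how, once every value is stored literally, finding the environments that run a given image and updating them in bulk become ordinary data operations rather than template edits.

```python
from copy import deepcopy

# Hypothetical store: one fully rendered ("literal") Deployment per environment.
# Every value is written out; nothing is templated. All names are invented.
CONFIGS = {
    "dev": {
        "kind": "Deployment",
        "metadata": {"name": "web", "namespace": "dev"},
        "spec": {"replicas": 1, "template": {"spec": {"containers": [
            {"name": "web", "image": "example/web:1.4.3"}]}}},
    },
    "staging": {
        "kind": "Deployment",
        "metadata": {"name": "web", "namespace": "staging"},
        "spec": {"replicas": 2, "template": {"spec": {"containers": [
            {"name": "web", "image": "example/web:1.4.2"}]}}},
    },
    "prod": {
        "kind": "Deployment",
        "metadata": {"name": "web", "namespace": "prod"},
        "spec": {"replicas": 6, "template": {"spec": {"containers": [
            {"name": "web", "image": "example/web:1.4.2"}]}}},
    },
}


def images(config):
    """Return the container images named in one Deployment record."""
    return [c["image"] for c in config["spec"]["template"]["spec"]["containers"]]


def find_by_image(configs, image):
    """Query: which environments run the given image?"""
    return sorted(env for env, cfg in configs.items() if image in images(cfg))


def bulk_set_image(configs, envs, container, image):
    """Return a copy of the store with one container's image replaced
    in the chosen environments; other environments are left untouched."""
    updated = deepcopy(configs)
    for env in envs:
        for c in updated[env]["spec"]["template"]["spec"]["containers"]:
            if c["name"] == container:
                c["image"] = image
    return updated


stale = find_by_image(CONFIGS, "example/web:1.4.2")
patched = bulk_set_image(CONFIGS, stale, "web", "example/web:1.4.3")
print(stale)
print(find_by_image(patched, "example/web:1.4.3"))
```

Because each environment’s record is independent, the update touches only the environments selected by the query. The sketch returns a modified copy instead of editing in place so that the before and after states can be compared.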
“Make a simple edit to the config, and the live resources can be updated without running a complex CI/CD process,” Lajko noted, adding, “Policy enforcement can check the configuration directly and report results immediately, unlike templated config that first needs rendering.”

The core components of ConfigHub are:

A Unit maintains a sequential list and history of revisions of the config data.
A Target is where the configuration is to be applied. It abstracts access details and credentials to Kubernetes clusters, cloud accounts and other services, meaning users can manage resources without directly handling credentials.
A Worker is a separate process connecting ConfigHub and outside entities. Similar to a Kubernetes GitOps Operator or CI runner, it operates within your cluster. Its two main types of work are Functions and Bridges.
Functions are executable pieces of code that operate on configuration data within Config Units. They can be read only, mutating or validating, and extend ConfigHub’s automation capabilities.
Bridges connect ConfigHub with your destination resources, translating operations into the correct API calls and relaying operation events and status back to ConfigHub.

Lajko reports that, rather than being merely a database, ConfigHub lets teams keep their existing workflows, and create automated ones, while providing a single source of truth and an overall view of the environment they’re working in.

The Real-World Impact of Misconfiguration

At KubeCon, Erick Bourgeois, director and Kubernetes platform engineer lead at RBC Capital Markets, explained how one extra space in its massive tangle of config files crashed its systems. The company had YAML inside YAML inside YAML. A routine ConfigMap patch across around 30 Grafana instances turned into a three-day investigation because of unintended interactions in templated config.

By treating configuration as data, you can take advantage of being able to no longer see it as text, but queryable content. Once all validation and mutation is complete, you have a resulting “unit” that can pipe through a worker to achieve your goal, Bourgeois said on LinkedIn.

Richardson said in the presentation with Bourgeois: “That was a good example of what I’ve been calling ‘config sprawl,’ where you’ve got lots of different files in lots of different repos owned by different people, different templates which make different formats at different times connecting to lots of different systems. … Controlling this and managing it is getting harder and harder and harder. And if you look at the statistics, you’ll see people like the DORA folks who are now at Google will show evidence that the speed of DevOps just has not got any faster.

“That wasn’t what we hoped would happen when nearly 10 years ago we and others came up with these new automated technologies around Kubernetes. We thought we would make things easier, more automated, safer, more compliant. And instead, people seem to be getting more and more stuck. And that’s partly because systems have grown. We’re the victims of our own success. …”

A New Way of Thinking About Configuration Management

Richardson urged the audience to try out ConfigHub’s early-stage Software as a Service (SaaS) technology, which is in preview.

“It makes all the configuration that you’re going to need for the relevant parts of managing infra and apps centralized and provides structure so that dependencies between application components can be seen and understood in relation to history, versioning and potentially policy as well. So you can apply things like triggers and compliance and functions around that. That will allow you to do really important things that are a little tricky right now, like seeing what happens before you deploy and then being able to check afterwards to see if that was correct,” he said.

He stressed that it’s still very early days for the company.

The trio began working on this company in 2024, and the Menlo Park, California-based startup emerged from stealth in March, announcing $4 million in funding.

“We’ve been around this industry a long time. We know how long it takes to build things, and we don’t want to be kind of sounding like we have all the answers,” Richardson said in the interview.

But the cofounders believe the time is right to try to spur change.

“We think that there’s enough pent-up energy, people sitting on the sidelines, sort of having accepted the status quo, but having ideas and having perspectives that, if we can...
