My Articles
Technical insights and tutorials about DevOps, Kubernetes, AWS, and infrastructure automation practices.
CrowdSec WAF on Kubernetes
Needed a WAF for public APIs. Chose CrowdSec (open-source). Hit three integration issues: PostgreSQL namespace, client IP preservation, log collection. Documenting the fixes.
Reducing Docker Image Sizes by 70%
Our Docker images were 800MB+. Builds were slow and pulling images took forever. Spent a day optimizing - got images down to 200MB using multi-stage builds and Alpine base images.
Adding Nodes to Kubernetes Cluster When Traffic Grew
Traffic increased 40% over 3 months. Nodes were running at 75% CPU. Ordered 2 new Hetzner servers and added them to the cluster. Took about 4 hours from ordering to nodes serving traffic.
DNS Lookups Were Timing Out Randomly in Kubernetes
Applications occasionally failed DNS lookups with 5-second timeouts. Checked CoreDNS logs, CPU usage, network - everything looked fine. Turned out to be conntrack table exhaustion on worker nodes.
Automating SSH Key Rotation on Hetzner Servers
Security audit said our SSH keys hadn't been rotated in 18 months. Wrote a script to rotate keys across all Hetzner servers and update them in Azure Key Vault. Took 3 hours to build, runs in 5 minutes.
Enforcing Pod Security Standards Broke Half Our Deployments
Enabled Pod Security Standards in Kubernetes. Immediately broke 6 out of 12 applications because they were running as root or using privileged containers. Spent 2 days fixing them all.
Moving Terraform State from Local Files to Azure Storage
We'd been keeping local Terraform state files in Git (bad idea). Moved the state to Azure Blob Storage with state locking. Migration took 30 minutes. Should have done this from the start.
Debugging Random 502 Errors from NGINX Ingress
Users reported occasional 502 errors. Logs showed NGINX couldn't reach backend pods. Took a day to find the issue - pod readiness probes were too aggressive and were marking healthy pods as not ready.
Adding Trivy Scans to Our CI Pipeline
Integrated Trivy into GitLab CI to scan container images for vulnerabilities before deployment. Found 47 high-severity issues we didn't know about. Some were fixable, some weren't.
Automating PostgreSQL Backups to Azure Blob Storage
Set up daily PostgreSQL backups from our Kubernetes cluster to Azure Blob Storage. A CronJob runs pg_dump, and Blob Storage lifecycle policies handle retention. Cost is about €8/month for 30 days of backups.
external-secrets Wasn't Syncing from Azure Key Vault
Secrets in Azure Key Vault were updated but pods kept using old values. Took 2 hours to figure out the sync interval setting and force a refresh. Notes on how external-secrets actually works.
Hit Let's Encrypt Rate Limit While Testing cert-manager
Made a mistake while testing cert-manager configuration. Issued 20 certificates for the same domain in an hour. Got rate limited for a week. Notes on the staging environment and rate limits.
Downsizing Hetzner Servers We Don't Need
Looked at actual CPU and memory usage across our Kubernetes nodes. Found we were paying for servers we barely used. Saved €120/month by switching to smaller machines.
Upgrading a Self-Managed Kubernetes Cluster Without Managed Services
Moving from Kubernetes 1.28 to 1.29 on bare-metal Hetzner servers. No managed control plane to click 'upgrade' - we had to do it manually. Notes on what actually happened.
Hetzner Network Issues and Why We Keep Backups Elsewhere
Hetzner's network had problems in their Falkenstein datacenter. Our services stayed up because we split workloads across regions and keep critical data in Azure.
Kubernetes StatefulSet: A Deep Dive
Understanding StatefulSet internals, ordered pod management, persistent storage, and real-world use cases for stateful applications in Kubernetes.
Migration Diary Part 2: Moving Logs from Grafana Cloud to Kubernetes
Setting up Loki and Alloy for log aggregation in our Kubernetes cluster. Learning what all those Loki components actually do.
Migration Diary Part 1: Moving Metrics from Grafana Cloud to Kubernetes
Moving our monitoring from Grafana Cloud to self-hosted Prometheus and Grafana on Kubernetes. Turns out most apps already had metrics support, just needed to enable it.
Creating a Least-Privilege Monitoring User in Zalando Postgres Operator
How I solved the challenge of creating a monitoring-only user with minimal permissions in a GitOps-managed Postgres cluster.
Zero-Downtime Helm App Upgrade in Production
How to upgrade a Helm-managed application in production with zero downtime using GitOps and the Kubernetes RollingUpdate strategy.
Why I'm Obsessed with Uptime: The Real Cost of Downtime
My journey into understanding why every millisecond matters in DevOps, and what the research taught me about building reliable systems.
Managing Secrets in Kubernetes with External Secrets Operator
A comprehensive guide to implementing External Secrets Operator for secure secret management in Kubernetes clusters.