# Zero-Downtime Helm App Upgrade in Production

While operating the CloudCops platform with active user traffic, we needed to upgrade our Helm-managed application from version 4.12.1 to 4.13.1 in production. The challenge was that simply bumping the image tag would trigger an automatic ArgoCD deployment and, with only one replica, cause service downtime while the Pod was replaced.
This article shares how to safely upgrade a Helm-managed application in a production environment with continuous user traffic.
## The Problem
Our CloudCops Helm application configuration was:
```yaml
# values.yaml
replicaCount: 1

image:
  repository: cloudcops/app
  tag: 4.12.1
```
To upgrade from 4.12.1 to 4.13.1:
- Update the image tag in the Git repository's `values.yaml`
- ArgoCD detects the change and automatically deploys
- The existing Pod terminates → a new Pod starts
- Service downtime occurs during this window
- Active user traffic is disrupted
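Before changing anything, it helps to confirm what the live Deployment is doing today. Here is a quick check of the replica count and rollout strategy; the Deployment and namespace names are assumptions matching our setup:

```bash
# Inspect the current replica count and rollout strategy
# (deployment and namespace names are examples from our setup)
kubectl get deployment cloudcops-app -n cloudcops \
  -o jsonpath='{.spec.replicas}{"\n"}{.spec.strategy}{"\n"}'
```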
## Solution: RollingUpdate Strategy
The Kubernetes RollingUpdate strategy, combined with at least two replicas and a readiness probe, enables zero-downtime deployments.
### 1. Update Helm Values
```yaml
# values.yaml
replicaCount: 2            # ← At least 2 replicas required

image:
  repository: cloudcops/app
  tag: 4.13.1              # Updated version

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1            # Additional Pods allowed during update
    maxUnavailable: 0      # Zero unavailable Pods = zero downtime

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```
### 2. Key Configuration
| Setting | Value | Purpose |
|---|---|---|
| `replicaCount` | 2 | Minimum of 2 Pods required for zero downtime |
| `maxSurge` | 1 | Allows replicas + 1 Pods during the update |
| `maxUnavailable` | 0 | All Pods must remain available during the update |
| `readinessProbe` | - | New Pods receive traffic only when ready |
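These values only matter if the chart actually renders them into the Deployment. Below is a minimal sketch of the relevant parts of `templates/deployment.yaml`, assuming a conventional chart layout; helper names such as `cloudcops-app.fullname` are placeholders and the real chart may differ:

```yaml
# templates/deployment.yaml (excerpt) - sketch assuming a conventional chart layout
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "cloudcops-app.fullname" . }}
spec:
  replicas: {{ .Values.replicaCount }}
  strategy:
    {{- toYaml .Values.strategy | nindent 4 }}
  selector:
    matchLabels:
      app: {{ include "cloudcops-app.name" . }}
  template:
    metadata:
      labels:
        app: {{ include "cloudcops-app.name" . }}
    spec:
      containers:
        - name: app
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8080
          readinessProbe:
            {{- toYaml .Values.readinessProbe | nindent 12 }}
```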
## Deployment Process (GitOps)

### Step 1: Update Git Repository
```bash
# Update Helm values
vim values.yaml

# Commit changes
git add values.yaml
git commit -m "feat: upgrade cloudcops app 4.12.1 → 4.13.1 with zero-downtime"
git push origin main
```
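One assumption baked into this flow is that the ArgoCD Application has automated sync enabled; otherwise the push only marks the app as OutOfSync and someone still has to trigger the sync manually. A sketch of the relevant fields, where the application name, repo URL, and path are placeholders:

```yaml
# ArgoCD Application (excerpt) - name, repo URL, and path are placeholders
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudcops-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/cloudcops/deploy.git
    path: charts/cloudcops-app
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: cloudcops
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made directly in the cluster
```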
### Step 2: ArgoCD Auto-Deployment
When ArgoCD detects the Git repository change:
- A new Pod (version 4.13.1) is created
- Kubernetes waits for its readinessProbe to pass
- Traffic is routed to the new Pod
- One old Pod (version 4.12.1) is terminated
- The second new Pod is created
- The remaining old Pod is terminated
→ At least 2 Pods always running → Zero downtime for live user traffic
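To follow the rollout from the command line rather than the ArgoCD UI, you can block until the Deployment and the Application report healthy (same assumed names as above):

```bash
# Wait for the Deployment rollout to complete (names assumed from our setup)
kubectl rollout status deployment/cloudcops-app -n cloudcops

# Optionally, wait until ArgoCD reports the app as synced and healthy
argocd app wait cloudcops-app --sync --health
```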
## Production Deployment Results

Applied to the CloudCops platform with live user traffic:
```bash
# Monitor in ArgoCD
$ kubectl get pods -n cloudcops -w
NAME                      READY   STATUS
cloudcops-app-v4121-abc   2/2     Running             # Old Pod 1 (4.12.1)
cloudcops-app-v4121-def   2/2     Running             # Old Pod 2 (4.12.1)
cloudcops-app-v4131-ghi   0/2     ContainerCreating   # New Pod creating
cloudcops-app-v4131-ghi   2/2     Running             # New Pod ready (4.13.1)
cloudcops-app-v4121-abc   2/2     Terminating         # Old Pod 1 terminating
cloudcops-app-v4131-jkl   0/2     ContainerCreating   # New Pod 2 creating
cloudcops-app-v4121-abc   0/2     Terminated          # Old Pod 1 terminated
cloudcops-app-v4131-jkl   2/2     Running             # New Pod 2 ready
cloudcops-app-v4121-def   2/2     Terminating         # Old Pod 2 terminating
```
→ Zero-downtime upgrade from 4.12.1 to 4.13.1 in production!
→ No user traffic disruption during the entire process!
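As a final check, it is worth confirming that the Deployment and every running Pod are actually on the new tag (the names and the `app` label are examples from our setup):

```bash
# Confirm the Deployment template now references the new image tag
kubectl get deployment cloudcops-app -n cloudcops \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

# List the image each running Pod is using
kubectl get pods -n cloudcops -l app=cloudcops-app \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
```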
## Key Takeaways
- `replicaCount ≥ 2`: essential for zero-downtime deployments
- `maxUnavailable: 0`: ensures all Pods remain available during updates
- `readinessProbe`: new Pods receive traffic only when fully ready
- GitOps + Helm: Git commit → ArgoCD auto-sync → consistent deployment process
- Production-ready: successfully tested with live user traffic on the CloudCops platform
## Conclusion
With proper Kubernetes configuration and Helm chart settings, zero-downtime deployments are achievable even in production environments with active user traffic. This approach has been successfully validated on the CloudCops platform!