DNS Lookups Were Timing Out Randomly in Kubernetes
Started seeing occasional DNS timeout errors in application logs:
Error: Failed to resolve api.example.com: Temporary failure in name resolution
Happened on maybe 1% of DNS requests. Most lookups worked fine, but occasionally one would time out after 5 seconds.
Users didn't notice because our apps retry failed requests. But it was annoying and pointed to an underlying issue.
Took me 2 days to track down. Problem was conntrack table exhaustion, not CoreDNS.
Checking CoreDNS
First suspected CoreDNS (the Kubernetes DNS server). Checked if the pods were healthy:
kubectl get pods -n kube-system -l k8s-app=kube-dns
All CoreDNS pods running and ready. No restarts.
Checked logs:
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
No errors. Just normal DNS queries being handled.
Checked CPU and memory:
kubectl top pods -n kube-system -l k8s-app=kube-dns
CPU at 5%, memory at 50MB. Plenty of headroom.
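CoreDNS also exposes Prometheus metrics on port 9153 (the prometheus plugin shows up in the Corefile later), which gives one more quick sanity check. A sketch, assuming the deployment is named coredns:
kubectl port-forward -n kube-system deploy/coredns 9153:9153 &
curl -s localhost:9153/metrics | grep coredns_dns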
So CoreDNS itself was fine. Problem must be elsewhere.
Testing DNS resolution
Created a debug pod to test DNS:
kubectl run debug --image=busybox --rm -it -- sh
Inside the pod:
# Test DNS resolution
nslookup api.example.com
Most of the time it worked instantly. Occasionally it hung for 5 seconds then timed out.
Tried querying CoreDNS directly:
nslookup api.example.com 10.96.0.10
(10.96.0.10 is our CoreDNS service IP)
Same issue - occasional timeouts.
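To put a number on "occasional" instead of eyeballing it, you can hammer DNS in a loop from the same debug pod (a minimal sketch; busybox's nslookup exits non-zero when resolution fails):
# Run 200 lookups and count the failures
fails=0
for i in $(seq 1 200); do
  nslookup api.example.com >/dev/null 2>&1 || fails=$((fails+1))
done
echo "$fails of 200 lookups failed"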
Checking network connectivity
Maybe packets were getting dropped somewhere. Tested with tcpdump:
kubectl exec -it coredns-pod -n kube-system -- tcpdump -i any port 53
Saw DNS queries coming in, but for the timed-out requests, there was no response from CoreDNS. The query arrived but no answer was sent.
This was weird. CoreDNS received the query (it's in the tcpdump) but didn't respond.
Connection tracking (conntrack)
Linux uses conntrack to track network flows, including "connectionless" UDP like DNS. Kubernetes depends on it heavily: kube-proxy's iptables rules NAT service traffic, and that NAT relies on conntrack entries.
Checked conntrack table on worker nodes:
ssh worker-node-1
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max
Output:
net.netfilter.nf_conntrack_count = 65519
net.netfilter.nf_conntrack_max = 65536
The conntrack table was over 99% full. When it fills up, the kernel can't track new flows and starts dropping their packets.
This explained the DNS timeouts. When a pod sends a DNS query while the table is full, the kernel can't allocate a conntrack entry for the new UDP flow, drops the packet, and the query times out.
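In hindsight, the kernel had been saying this all along. When the table is full it logs the drops, so a quick check on the node (exact wording varies by kernel version):
# On the worker node
dmesg | grep -i "table full"
# nf_conntrack: table full, dropping packet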
Why was the conntrack table full?
Checked what was using conntrack entries:
conntrack -L | head -20
Saw thousands of entries for old connections in TIME_WAIT state. These were connections that had closed but were still in conntrack waiting to be cleaned up.
Our applications make a lot of short-lived HTTP requests. Each request creates a conntrack entry. The default timeout for TIME_WAIT is 120 seconds.
With high request volume, conntrack entries accumulated faster than they expired.
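One way to see that breakdown directly is to group the TCP entries by state (assumes conntrack-tools is installed on the node):
# Count TCP conntrack entries by state - TIME_WAIT dominated on our nodes
conntrack -L -p tcp 2>/dev/null | awk '{print $4}' | sort | uniq -c | sort -rn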
The fix
Increased conntrack table size:
# On each worker node
sysctl -w net.netfilter.nf_conntrack_max=262144
sysctl -w net.netfilter.nf_conntrack_buckets=65536
# Make it permanent
echo "net.netfilter.nf_conntrack_max=262144" >> /etc/sysctl.conf
echo "net.netfilter.nf_conntrack_buckets=65536" >> /etc/sysctl.conf
sysctl -p
This quadrupled the conntrack table size. DNS timeouts stopped.
Also reduced TIME_WAIT timeout:
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
echo "net.netfilter.nf_conntrack_tcp_timeout_time_wait=30" >> /etc/sysctl.conf
This clears old conntrack entries faster.
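A quick way to confirm the new values took effect and keep an eye on usage afterwards:
# Verify the limits and watch current usage
sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_buckets net.netfilter.nf_conntrack_tcp_timeout_time_wait
watch -n 5 'sysctl net.netfilter.nf_conntrack_count'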
Better solution: Connection pooling
The real issue was that our apps were creating too many short-lived connections. Better to reuse connections via connection pooling.
Updated HTTP client config in applications:
# Before: New connection for every request
import requests
response = requests.get('https://api.example.com/endpoint')
# After: Reuse connections
session = requests.Session()
response = session.get('https://api.example.com/endpoint')
Session reuses TCP connections. This reduces conntrack entries by 90%.
Same fix in other languages:
// Go: share one http.Client; its Transport keeps idle connections around for reuse
client := &http.Client{
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 10,
    },
}
// Node.js: a keep-alive agent shared across requests (use httpsAgent for https URLs)
const http = require('http');
const axios = require('axios');
const agent = new http.Agent({
  keepAlive: true,
  maxSockets: 50
});
axios.get(url, { httpAgent: agent });
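A rough way to check the effect on a node is to count conntrack entries originating from one app pod before and after the change (a sketch - 10.244.1.23 is a stand-in for one of your pod IPs):
# Entries whose original source is a given pod IP
conntrack -L -s 10.244.1.23 2>/dev/null | wc -l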
Monitoring conntrack usage
Added Prometheus metrics for conntrack:
# node-exporter collects these automatically
node_nf_conntrack_entries
node_nf_conntrack_entries_limit
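They can also be spot-checked straight from node-exporter before wiring up dashboards (assuming the default port 9100):
curl -s http://worker-node-1:9100/metrics | grep nf_conntrack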
Created alert:
- alert: ConntrackTableNearFull
  expr: node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 0.9
  for: 5m
  annotations:
    summary: "Conntrack table is {{ $value | humanizePercentage }} full on {{ $labels.instance }}"
Now we get alerted before conntrack fills up completely.
CoreDNS configuration tweaks
While investigating, also improved CoreDNS config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000  # increased from default
        }
        cache 30  # cache DNS responses for 30 seconds
        loop
        reload
        loadbalance
    }
Changes:
- Increased max_concurrent to handle more parallel queries
- Added cache 30 to reduce upstream queries
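Rolling the change out is just applying the ConfigMap; the reload plugin watches the Corefile and picks up the new config, or you can force a restart (sketch, assuming the ConfigMap is saved as coredns-configmap.yaml):
kubectl apply -n kube-system -f coredns-configmap.yaml
# Or restart to pick up the change immediately
kubectl rollout restart deployment coredns -n kube-system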
Why this was hard to debug
DNS timeouts are tricky because:
- They're intermittent (only when conntrack is full)
- CoreDNS looks healthy (it is healthy)
- Error happens at network layer, not application layer
- No obvious errors in logs
The clue was that queries were received but not answered. This pointed to connection tracking issues rather than DNS server issues.
Other DNS problems we've seen
External DNS resolution slow:
CoreDNS was fast for internal services but slow for external domains. Fixed by using Google/Cloudflare DNS instead of our ISP's DNS:
forward . 8.8.8.8 1.1.1.1
ndots causing extra queries:
Kubernetes sets ndots:5 in resolv.conf. Any name with fewer than five dots gets the search domains tried first, so a lookup for api.example.com goes through api.example.com.default.svc.cluster.local, then api.example.com.svc.cluster.local, then api.example.com.cluster.local, and only then api.example.com itself.
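You can see this from any pod's resolv.conf; the commented lines below are roughly what kubelet writes (10.96.0.10 is our CoreDNS service IP from earlier):
kubectl exec -it debug -- cat /etc/resolv.conf
# search default.svc.cluster.local svc.cluster.local cluster.local
# nameserver 10.96.0.10
# options ndots:5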
Reduced ndots to 2 in pod DNS config:
dnsConfig:
  options:
    - name: ndots
      value: "2"
This reduced DNS query volume by 60%.
CoreDNS pod count:
We had 2 CoreDNS pods. Increased to 3 for better distribution:
kubectl scale deployment coredns -n kube-system --replicas=3
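Worth checking afterwards that the replicas actually land on different nodes:
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide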
Lessons
- DNS timeouts aren't always a DNS problem
- Conntrack table size matters for high-connection workloads
- Use connection pooling to reduce conntrack entries
- Monitor conntrack usage
- Intermittent issues require patient debugging
After increasing conntrack limits and adding connection pooling, DNS timeouts dropped to zero. The issue wasn't CoreDNS at all - it was Linux connection tracking exhaustion.