Introduction
Container health checks let Docker distinguish between containers that are merely running and those that can actually serve traffic. Without health checks, Docker considers a container running as long as its main process exists, regardless of whether the application inside functions correctly. Health checks solve this problem by defining commands that verify actual application health, so that Docker can report a container as unhealthy and orchestrators such as Docker Swarm can replace it or route traffic away from it.
Health checks matter most in production. Applications crash, hang, run out of resources, or enter degraded states where the process runs but serves errors. Health checks detect these conditions and trigger appropriate responses. Docker Swarm uses health status for load balancing decisions and removes unhealthy containers from service rotation automatically; Kubernetes applies the same idea through its own liveness and readiness probes.
This guide covers health check implementation in Dockerfiles, Docker Compose, and runtime configurations. You will learn to write effective health check commands, configure appropriate intervals and timeouts, handle edge cases, and integrate health status with monitoring systems. With proper health checks in place and an orchestrator or watchdog acting on them, your containers gain self-healing behavior that reduces operational burden.
Understanding Docker Health Checks
Health checks extend Docker’s awareness of container state beyond simple process existence.
When you define a health check, Docker executes the specified command inside the container at configured intervals. The command's exit code indicates health status: 0 means healthy and 1 means unhealthy (exit code 2 is reserved and should not be used). Docker tracks these results and updates the container's status to starting, healthy, or unhealthy accordingly.
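The exit-code contract can be sketched as a small wrapper. This is a minimal illustration, assuming a hypothetical helper named `run_check` (it is not part of Docker): it runs any probe command and collapses every failure to exit status 1, which is also what the `|| exit 1` suffix does in the examples in this guide, since tools such as curl return other non-zero codes.

```shell
#!/bin/sh
# run_check: hypothetical helper, not a Docker command.
# Runs the probe given as arguments and maps its result onto the
# health check contract: 0 = healthy, 1 = unhealthy (2 is reserved by Docker).
run_check() {
    if "$@" >/dev/null 2>&1; then
        return 0   # probe succeeded: healthy
    fi
    return 1       # any non-zero probe status: unhealthy
}
```

A health check script could then end with something like `run_check curl -f http://localhost/health`, where the URL is only an example.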
Health check results affect container lifecycle in several ways. Docker Compose reports container health in status output and can gate startup on it. Docker Swarm uses health status to determine task readiness and replaces unhealthy tasks. Kubernetes ignores the Dockerfile HEALTHCHECK instruction and relies on its own liveness, readiness, and startup probes defined in the pod spec. Monitoring systems can query health status through the Docker API.
Five parameters define health check behavior. The test command specifies what to execute. The interval defines how frequently checks run. The timeout determines how long to wait for command completion. Retries specify how many consecutive failures mark the container unhealthy. The start period gives the application time to initialize: checks still run during it, but failures do not count toward the retry limit.
Implementing Health Checks in Dockerfiles
Dockerfile HEALTHCHECK instructions create images with built-in health monitoring.
Basic Health Check
# Simple health check using curl for web services
FROM nginx:alpine
# Health check every 30 seconds, timeout after 5 seconds
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD curl -f http://localhost/ || exit 1
# Copy custom application files
COPY nginx.conf /etc/nginx/nginx.conf
COPY html/ /usr/share/nginx/html/
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Application-Specific Health Checks
Different applications require different health verification approaches:
# Node.js application
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY src/ ./src/
# Health check verifies API endpoint (connection errors also count as failures)
HEALTHCHECK --interval=15s --timeout=10s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"
EXPOSE 3000
CMD ["node", "src/index.js"]
# Python/Flask application
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
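# Health check against a /health route that the application must define
# (urllib raises on connection errors and on 4xx/5xx responses)
HEALTHCHECK --interval=20s --timeout=5s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health', timeout=3)" || exit 1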
EXPOSE 5000
CMD ["python", "app.py"]
# Database container (PostgreSQL)
FROM postgres:15-alpine
# Health check using pg_isready
HEALTHCHECK --interval=10s --timeout=5s --start-period=10s --retries=3 \
CMD pg_isready -U postgres -d postgres
VOLUME /var/lib/postgresql/data
EXPOSE 5432
# Redis container
FROM redis:7-alpine
# Health check using redis-cli ping
HEALTHCHECK --interval=5s --timeout=3s --start-period=5s --retries=3 \
CMD redis-cli ping | grep -q PONG
EXPOSE 6379
CMD ["redis-server"]
Shell vs Exec Form
Health check commands support shell and exec forms:
# Shell form (Docker runs the command through /bin/sh -c)
HEALTHCHECK CMD curl -f http://localhost/health || exit 1
# Exec form (runs the binary directly, no shell involved)
HEALTHCHECK CMD ["curl", "-f", "http://localhost/health"]
Exec form does not need a shell in the image and avoids quoting and variable-expansion surprises. Use shell form only when shell features like pipes, redirects, or || are necessary.
Complex Health Checks
# Multi-step health check: process running, port listening,
# disk usage below 90%, memory usage below 90%
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD pgrep -x myapp > /dev/null \
    && ss -tln | grep -q ":8080" \
    && [ "$(df /data | awk 'NR==2 {print $5}' | tr -d '%')" -lt 90 ] \
    && [ "$(free | awk 'NR==2 {printf "%.0f", $3/$2*100}')" -lt 90 ] \
    || exit 1
Docker Compose Health Checks
Compose files define health checks with additional orchestration features.
Compose File Configuration
version: '3.8'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    depends_on:
      db:
        condition: service_healthy
  api:
    build: ./api
    ports:
      - "8080:8080"
    environment:
      - DATABASE_URL=postgres://db:5432/app
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 5s
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: apppassword
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 2
volumes:
  postgres_data:
depends_on with Health Conditions
Compose file format 2.1 introduced health-dependent startup, and the Compose Specification used by Docker Compose v2 supports it as well:
services:
  web:
    build: ./web
    depends_on:
      api:
        condition: service_healthy
      db:
        condition: service_healthy
Services won’t start until dependencies report healthy status.
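Outside Compose, for example in a deployment script built on plain `docker run`, the same gating can be approximated by polling the health status. The following is a rough sketch, assuming a hypothetical helper named `wait_for_healthy`; the default wait of 60 seconds and poll interval of 2 seconds are arbitrary choices.

```shell
#!/bin/sh
# wait_for_healthy CONTAINER [MAX_WAIT] [POLL]
# Hypothetical helper, not part of Docker or Compose.
# Polls the container's health status until it is "healthy",
# fails early if it is "unhealthy", and gives up after MAX_WAIT seconds.
wait_for_healthy() {
    container=$1
    max_wait=${2:-60}   # seconds to wait before giving up
    poll=${3:-2}        # seconds between polls (must be at least 1)
    waited=0
    while [ "$waited" -le "$max_wait" ]; do
        status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
        case $status in
            healthy)   return 0 ;;
            unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
        esac
        sleep "$poll"
        waited=$((waited + poll))
    done
    echo "timed out waiting for $container" >&2
    return 1
}
```

A script might call `wait_for_healthy db 120 && docker run -d myapp:latest`, where `db` is whatever name the database container was started with.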
Runtime Health Check Configuration
Override or add health checks when running containers:
# Override health check at runtime
docker run \
--health-cmd="curl -f http://localhost:9000/health || exit 1" \
--health-interval=30s \
--health-timeout=10s \
--health-retries=3 \
--health-start-period=15s \
myapp:latest
Health Check Best Practices
Effective health checks follow specific principles for reliability.
Principles for Health Checks
Health checks should verify actual application functionality, not just process existence. A web server returning 500 errors still runs as a process but should fail the health check. Database connections should actually work. API endpoints should return valid responses.
Keep health checks lightweight to avoid performance impact. Complex checks with external dependencies or heavy operations slow container startup and add load. Consider async or cached results for expensive checks.
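One way to cache an expensive check is to let a background job run it and record the outcome, while the health check itself only reads that record. The sketch below assumes a hypothetical file format of `<status> <epoch-seconds>` and two hypothetical helpers, `record_status` and `check_cached_status`; neither is a Docker feature.

```shell
#!/bin/sh
# record_status STATUS FILE
# Called by the background job after the expensive probe finishes.
# Writes "<status> <epoch-seconds>" and swaps the file in atomically.
record_status() {
    printf '%s %s\n' "$1" "$(date +%s)" > "$2.tmp" && mv "$2.tmp" "$2"
}

# check_cached_status FILE MAX_AGE
# Cheap check for use as the health check command: passes only if the
# last recorded status is "ok" and it is at most MAX_AGE seconds old.
check_cached_status() {
    file=$1
    max_age=$2
    [ -r "$file" ] || return 1                      # nothing recorded yet
    read -r status stamp < "$file" || return 1
    case $stamp in ''|*[!0-9]*) return 1 ;; esac    # malformed timestamp
    [ "$status" = "ok" ] || return 1                # last probe failed
    now=$(date +%s)
    [ $((now - stamp)) -le "$max_age" ] || return 1 # result is stale
    return 0
}
```

Treating a stale record as unhealthy matters here: if the background job itself dies, the health check starts failing instead of reporting the last good result forever.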
Fail fast on critical conditions. Health checks that take too long delay failure detection. Configure appropriate timeouts that account for expected response times plus margin.
Appropriate Intervals and Timeouts
Configure timing based on recovery objectives:
# Fast detection for critical services
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/health"]
  interval: 5s        # Check every 5 seconds
  timeout: 2s         # Fail if check takes > 2 seconds
  retries: 2          # Mark unhealthy after 2 consecutive failures
  start_period: 10s   # Allow 10s for startup
# Slower checks for background services
healthcheck:
  interval: 60s
  timeout: 30s
  retries: 5
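To relate these settings to a recovery objective, it helps to estimate how long a failure can go unnoticed. As a rough upper bound, each of the failing checks may wait a full interval before it starts and then run until it times out. The helper name `detection_window` is hypothetical, and the formula is an approximation rather than a guarantee from Docker.

```shell
#!/bin/sh
# detection_window INTERVAL TIMEOUT RETRIES   (whole seconds)
# Approximate worst-case time between an application failing and Docker
# marking the container unhealthy: retries * (interval + timeout).
detection_window() {
    echo $(( $3 * ($1 + $2) ))
}
```

For the fast profile above, `detection_window 5 2 2` prints 14; for the slower background profile, `detection_window 60 30 5` prints 450, so a failure there can take several minutes to surface.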
Handling Edge Cases
Address common failure modes:
#!/bin/bash
# Robust health check script
# Check process exists
if ! pgrep -x myapp > /dev/null; then
  echo "Process not running"
  exit 1
fi
# Check port is listening
if ! ss -tlnp 2>/dev/null | grep -q ":8080"; then
  echo "Port not listening"
  exit 1
fi
# Check disk space (fail if usage exceeds 90%)
disk_pct=$(df /data | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$disk_pct" -gt 90 ]; then
  echo "Disk usage critically high: ${disk_pct}%"
  exit 1
fi
# Check memory (inside a container, free typically reports host memory, not the cgroup limit)
mem_pct=$(free | awk 'NR==2 {printf "%.0f", $3/$2*100}')
if [ "$mem_pct" -gt 95 ]; then
  echo "Memory usage critically high: ${mem_pct}%"
  exit 1
fi
# Check dependent service
if ! curl -sf http://localhost:9000/ > /dev/null 2>&1; then
  echo "Dependency unavailable"
  exit 1
fi
echo "Healthy"
exit 0
Integrating with Monitoring Systems
Connect health check data to monitoring infrastructure.
Querying Health Status
# View container health status (health appears in the Status column, e.g. "Up 2 minutes (healthy)")
docker ps --format "table {{.Names}}\t{{.Status}}"
# Current health status of one container
docker inspect --format='{{.State.Health.Status}}' container_name
# Full health check history
docker inspect --format='{{json .State.Health.Log}}' container_name | jq
# Check specific health check result
docker inspect --format='{{range .State.Health.Log}}{{.Output}}{{end}}' container_name
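When only `docker ps` output is available, for instance in a log or a remote status dump, the health state can be recovered from the Status text. The sketch below assumes a hypothetical helper named `health_from_status` and relies on the parenthesized suffixes Docker appends to that column.

```shell
#!/bin/sh
# health_from_status STATUS_TEXT
# Hypothetical helper: extracts the health state from the Status column
# of `docker ps`, e.g. "Up 5 minutes (healthy)".
health_from_status() {
    case $1 in
        *"(healthy)"*)          echo healthy ;;
        *"(unhealthy)"*)        echo unhealthy ;;
        *"(health: starting)"*) echo starting ;;
        *)                      echo none ;;   # no health check configured
    esac
}
```

For live containers, `docker inspect` remains the more direct source; this parser is mainly useful for text that has already been captured.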
Prometheus Scraping
Export health metrics for Prometheus:
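#!/bin/bash
# Export Docker health status as Prometheus metrics.
# Writes a file for node_exporter's textfile collector, which Prometheus then scrapes.
# The directory below is an assumption; match your --collector.textfile.directory.
OUT=/var/lib/node_exporter/textfile_collector/docker_health.prom
while true; do
  {
    echo '# HELP docker_container_health_status Health status (1=healthy, 0=unhealthy)'
    echo '# TYPE docker_container_health_status gauge'
    docker ps --format '{{.Names}}' | while read -r name; do
      status=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$name")
      [ "$status" = "none" ] && continue   # no health check configured
      value=1
      if [ "$status" != "healthy" ]; then
        value=0
      fi
      echo "docker_container_health_status{name=\"$name\"} $value"
    done
  } > "$OUT.tmp" && mv "$OUT.tmp" "$OUT"   # atomic replace
  sleep 15
done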
Alerting Integration
#!/bin/bash
# Alert on unhealthy containers
UNHEALTHY=$(docker ps --format '{{.Names}}' --filter "health=unhealthy")
if [ -n "$UNHEALTHY" ]; then
  for container in $UNHEALTHY; do
    # Get output of the most recent health check run
    details=$(docker inspect --format='{{json .State.Health.Log}}' "$container" | \
      jq -r '.[-1].Output')
    # Build the payload with jq so quotes and newlines in the output are escaped
    payload=$(jq -n --arg container "$container" --arg details "$details" \
      '{alert: "Container Unhealthy", container: $container, details: $details}')
    # Send alert (example using curl to webhook)
    curl -X POST "https://alerts.example.com/webhook" \
      -H "Content-Type: application/json" \
      -d "$payload"
  done
fi
Health Checks in Orchestrators
Health checks integrate deeply with container orchestrators.
Docker Swarm Services
# Create service with health check
docker service create \
--name api \
--health-cmd "curl -f http://localhost/health || exit 1" \
--health-interval 10s \
--health-retries 3 \
--health-timeout 5s \
--replicas 3 \
myapi:latest
# Update service with health check
docker service update --health-cmd "curl -f http://localhost/health || exit 1" api
Swarm uses health status to determine service readiness. A task that becomes unhealthy is removed from load balancing, stopped, and replaced with a new task.
Container Restart Policies
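# Restart policies react to the container exiting, not to its health status.
# A standalone Docker Engine marks the container unhealthy but does not restart it.
docker run \
  --restart=on-failure:3 \
  --health-cmd="curl -f http://localhost/health || exit 1" \
  myapp:latest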
# Restart policies:
# no - Don't restart (default)
# on-failure - Restart if container exits with error
# unless-stopped - Always restart unless manually stopped
# always - Always restart
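Because a restart policy only reacts to the container's main process exiting, a standalone Docker host needs something else to act on the unhealthy state. One common approach is a small watchdog run periodically, for example from cron. The sketch below assumes a hypothetical function named `restart_unhealthy`; it restarts every container Docker currently reports as unhealthy and does no rate limiting.

```shell
#!/bin/sh
# restart_unhealthy: hypothetical watchdog step, not a Docker feature.
# Lists containers whose health status is "unhealthy" and restarts each one.
restart_unhealthy() {
    docker ps --filter "health=unhealthy" --format '{{.Names}}' |
    while read -r name; do
        [ -n "$name" ] || continue
        echo "restarting $name"
        docker restart "$name" >/dev/null
    done
}
```

In practice a watchdog like this would usually also cap how often a given container is restarted, so that a permanently broken service does not loop indefinitely.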
Troubleshooting Health Checks
Common health check problems and solutions.
Health Checks Not Running
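# Verify health check is configured on the container
docker inspect --format='{{json .Config.Healthcheck}}' container_name
# Verify the image itself defines a health check
docker image inspect --format='{{json .Config.Healthcheck}}' myimage
# Verify the image contains the binary the health check calls (curl in this example)
docker run --rm --entrypoint sh myimage -c 'command -v curl'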
Intermittent Failures
# View health check logs
docker inspect --format='{{json .State.Health.Log}}' container_name | jq
# Monitor health check execution
watch -n 1 'docker inspect --format "{{.State.Health.Status}}" container_name'
# Check for resource constraints
docker stats container_name
Timeout Issues
# Increase timeout for slow applications
docker run \
--health-timeout=30s \
--health-cmd="wget --timeout=20 -q --spider http://localhost/ || exit 1" \
myapp:latest
# Profile health check duration
time docker exec container_name /path/to/healthcheck
Conclusion
Health checks turn containers from opaque processes into self-monitoring services. Implement health checks that verify actual application functionality, configure appropriate timing parameters, and integrate health status with your monitoring infrastructure. The investment in writing robust health checks pays dividends through automated recovery and improved reliability.
Update health checks as applications evolve. New features may require new checks, and changing dependencies may invalidate existing assumptions. Regular review ensures health checks remain effective at detecting real problems.