Health Checks Configuration
Health Checks Configuration
Configure health monitoring for your processes to ensure reliability and enable proper dependency management.
Overview
Health checks monitor process health and enable:
- ✅ Automatic restart on failure
- ✅ Dependency waiting (processes wait for healthy dependencies)
- ✅ Health status reporting via metrics and API
- ✅ Graceful degradation patterns
Basic Configuration
processes:
nginx:
command: ["nginx", "-g", "daemon off;"]
health_check:
type: http
address: "http://127.0.0.1:80/health"
interval: 10
timeout: 5
retries: 3
success_threshold: 2
Health Check Types
HTTP Health Check
health_check:
type: http
address: "http://127.0.0.1:80/health"
interval: 10
timeout: 5
retries: 3
success_threshold: 2
expected_status: 200
expected_body: "OK"
Settings:
address- Full HTTP URL to checkexpected_status- Expected HTTP status code (default:200)expected_body- Optional body content match
Best for:
- Web servers (Nginx, Apache)
- HTTP APIs
- Services with health endpoints
Example Endpoint:
// Simple health endpoint (any PHP framework)
<?php
http_response_code(200);
header('Content-Type: application/json');
echo json_encode(['status' => 'healthy']);
TCP Health Check
health_check:
type: tcp
address: "127.0.0.1:9000"
interval: 10
timeout: 3
retries: 3
Settings:
address- TCP address in formathost:port
Best for:
- PHP-FPM (port 9000)
- Redis (port 6379)
- MySQL (port 3306)
- Services that listen on TCP ports
Example:
processes:
php-fpm:
command: ["php-fpm", "-F", "-R"]
health_check:
type: tcp
address: "127.0.0.1:9000"
interval: 5
Exec Health Check
health_check:
type: exec
command: ["php", "artisan", "health:check"]
interval: 30
timeout: 10
retries: 2
Settings:
command- Command to execute (array format)- Process is healthy if exit code is
0
Best for:
- Custom health logic
- Database connectivity checks
- Application-specific validation
- Multi-service health aggregation
Example Health Check Script:
<?php
// health-check.php - works with any PHP framework/app
// Check database (PDO)
try {
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');
$pdo->query('SELECT 1');
} catch (PDOException $e) {
echo "Database connection failed\n";
exit(1);
}
// Check Redis (if using)
$redis = new Redis();
if (!$redis->connect('localhost')) {
echo "Redis connection failed\n";
exit(1);
}
echo "All systems healthy\n";
exit(0); // Success
Common Settings
interval
Type: integer (seconds)
Default: 10
Description: Time between health checks.
health_check:
interval: 30 # Check every 30 seconds
Recommendations:
- Critical services: 5-10 seconds
- Standard services: 10-30 seconds
- Heavy checks: 30-60 seconds
- Exec commands: 30-120 seconds (depending on execution time)
timeout
Type: integer (seconds)
Default: 5
Description: Maximum time to wait for health check response.
health_check:
timeout: 10 # Wait up to 10 seconds
Guidelines:
- Should be less than
interval - HTTP checks: 3-10 seconds
- TCP checks: 1-5 seconds
- Exec checks: 5-30 seconds (match command execution time)
retries
Type: integer
Default: 3
Description: Number of consecutive failures before marking unhealthy.
health_check:
retries: 5 # Allow 5 failures before marking unhealthy
Recommendations:
- Stable services: 2-3 retries
- Flaky services: 5-10 retries
- Critical services: 1-2 retries (fail fast)
success_threshold
Type: integer
Default: 1
Description: Number of consecutive successes before marking healthy.
health_check:
success_threshold: 3 # Require 3 successes to become healthy
Use cases:
- Services with slow startup: Require multiple successes (2-5)
- Fast services: Single success (1)
- Flaky services: Require sustained success (3-5)
Health Check Lifecycle
[Process Starts]
↓
[Starting] ← Health checks not yet running
↓
[Healthy] ← success_threshold consecutive successes
↓
[Unhealthy] ← retries consecutive failures
↓
[Restart] ← If restart policy allows
State Transitions:
- Process starts →
Startingstate - After
success_thresholdsuccesses →Healthy - After
retriesfailures →Unhealthy - Unhealthy process restarts (if
restart: always)
Advanced Patterns
Dependency Waiting
processes:
php-fpm:
health_check:
type: tcp
address: "127.0.0.1:9000"
nginx:
depends_on: [php-fpm] # Waits for PHP-FPM to be healthy
health_check:
type: http
address: "http://127.0.0.1:80/health"
Behavior:
- Nginx waits for PHP-FPM to reach
Healthystate - If PHP-FPM becomes unhealthy, Nginx continues running
- On restart, Nginx waits again for PHP-FPM
Multi-Layer Health Checks
processes:
app:
command: ["./my-app"]
health_check:
type: exec
command: ["/health-check.sh"]
interval: 30
health-check.sh:
#!/bin/bash
set -e
# Check HTTP endpoint
curl -f http://localhost:9180/health || exit 1
# Check database
psql -U user -d db -c "SELECT 1" || exit 1
# Check disk space
df -h / | awk 'NR==2 {if ($5+0 > 90) exit 1}'
echo "Health check passed"
exit 0
Graceful Degradation
processes:
primary-service:
command: ["./primary"]
health_check:
type: http
address: "http://127.0.0.1:8080/health"
retries: 10 # Tolerate more failures
success_threshold: 3 # Require sustained success
fallback-service:
enabled: false # Enable manually when primary fails
command: ["./fallback"]
Complete Examples
PHP Application
processes:
# PHP-FPM with TCP check
php-fpm:
command: ["php-fpm", "-F", "-R"]
health_check:
type: tcp
address: "127.0.0.1:9000"
interval: 5
timeout: 2
retries: 3
# Nginx with HTTP check
nginx:
command: ["nginx", "-g", "daemon off;"]
depends_on: [php-fpm]
health_check:
type: http
address: "http://127.0.0.1:80/health"
interval: 10
timeout: 5
retries: 3
expected_status: 200
# Queue worker with exec check (Laravel example)
queue-worker:
command: ["php", "artisan", "queue:work"]
health_check:
type: exec
command: ["pgrep", "-f", "queue:work"]
interval: 60
timeout: 10
retries: 2
Microservices Stack
processes:
database:
command: ["postgres"]
health_check:
type: tcp
address: "127.0.0.1:5432"
interval: 5
cache:
command: ["redis-server"]
health_check:
type: tcp
address: "127.0.0.1:6379"
interval: 5
api:
command: ["./api-server"]
depends_on: [database, cache]
health_check:
type: http
address: "http://127.0.0.1:8080/health"
interval: 10
expected_body: '{"status":"ok"}'
worker:
command: ["./worker"]
depends_on: [database, cache]
health_check:
type: exec
command: ["./worker-health-check"]
interval: 30
Troubleshooting
Health Check Always Failing
# Increase timeout
health_check:
timeout: 15 # Was 5, too short
# Allow more retries
health_check:
retries: 5 # Was 3
# Check less frequently
health_check:
interval: 30 # Was 10
Health Check Too Sensitive
# Require sustained success
health_check:
success_threshold: 5 # Was 1
# Tolerate transient failures
health_check:
retries: 10 # Was 3
Slow Startup Detection
# Wait longer for initial health
health_check:
success_threshold: 3 # Require 3 successes
interval: 10
timeout: 10
# First success after 10s, healthy after 30s (3 × 10s)
Monitoring Health Status
Via Metrics
# Prometheus metrics
curl http://localhost:9090/metrics | grep health
# Example output
cbox_init_process_health_status{process="nginx"} 1 # 1 = healthy
Via Management API
# Get process status
curl http://localhost:9180/api/v1/processes | jq '.[] | {name, health_status}'
# Example output
{
"name": "nginx",
"health_status": "healthy"
}
See Also
- Process Configuration - Process settings
- Lifecycle Hooks - Pre/post hooks
- Prometheus Metrics - Health metrics
- Management API - Health status API