How It Works
Overview
Laravel Queue Autoscale uses a hybrid predictive algorithm that combines three different scaling approaches to make intelligent decisions about worker counts:
- Little's Law - Steady-state calculation based on current workload
- Trend Prediction - Proactive scaling based on traffic forecasts
- Backlog Drain - Aggressive scaling to prevent SLA breaches
The autoscaler takes the maximum of these three calculations to ensure SLA compliance while being responsive to changing conditions.
The Evaluation Loop
1. Metrics Retrieval Phase
Every evaluation cycle (default: 5 seconds), the autoscaler:
1. Retrieves all queues and metrics from laravel-queue-metrics
└─ Single call: QueueMetrics::getAllQueuesWithMetrics()
2. Receives comprehensive queue data
├─ Queue connection and name
├─ Current worker count
├─ Processing rate (jobs/second)
├─ Pending job count (backlog depth)
├─ Oldest job age
├─ Trend data (historical rates and forecasts)
└─ Processing time statistics
3. Loads per-queue configuration
└─ SLA targets, min/max workers, cooldown periods
Package Separation:
- laravel-queue-metrics does: Queue discovery, connection scanning, metrics collection
- laravel-queue-autoscale does: Consumes metrics, applies algorithms, manages workers
2. Calculation Phase
For each queue received from the metrics package, the autoscaler calculates three target worker counts:
A. Little's Law (Steady State)
Workers_steady = Arrival_Rate × Average_Job_Time
Purpose: Baseline calculation for current workload
When it dominates: Stable traffic, no backlog
Example:
- Rate: 10 jobs/sec
- Avg time: 2 seconds/job
- Workers: 10 × 2 = 20 workers
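The steady-state formula can be sketched as a small PHP function. This is a minimal illustration, not the package's implementation: the name `steadyStateWorkers` is hypothetical, and rounding fractional results up is an assumption, since the text does not say how fractions are handled.

```php
<?php

declare(strict_types=1);

/**
 * Steady-state worker count from Little's Law: workers = rate × time.
 * Hypothetical helper; rounding up is an assumption.
 */
function steadyStateWorkers(float $arrivalRate, float $avgJobTime): int
{
    if ($arrivalRate <= 0.0 || $avgJobTime <= 0.0) {
        return 0;
    }

    // round() first so floating-point noise cannot push ceil() one too high.
    return (int) ceil(round($arrivalRate * $avgJobTime, 6));
}

echo steadyStateWorkers(10.0, 2.0), PHP_EOL; // 20
```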
B. Trend Prediction (Proactive)
Workers_predicted = Forecasted_Rate × Average_Job_Time
Purpose: Scale ahead of demand increases
When it dominates: Traffic trending upward
Example:
- Current rate: 10 jobs/sec
- Trend: +20% (forecasted: 12 jobs/sec)
- Avg time: 2 seconds/job
- Workers: 12 × 2 = 24 workers
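A sketch of the same calculation in PHP, assuming the trend arrives as a percentage change as in the example. In practice the forecasted rate comes from the metrics package's trend data, so the `predictedWorkers` helper and its percentage parameter are illustrative assumptions only.

```php
<?php

declare(strict_types=1);

/**
 * Worker count for the forecasted arrival rate.
 * Hypothetical helper; the percentage-based forecast mirrors the example.
 */
function predictedWorkers(float $currentRate, float $trendPercent, float $avgJobTime): int
{
    // A +20% trend turns 10 jobs/sec into 12 jobs/sec.
    $forecastedRate = $currentRate * (1.0 + $trendPercent / 100.0);

    if ($forecastedRate <= 0.0 || $avgJobTime <= 0.0) {
        return 0;
    }

    // round() first so floating-point noise cannot push ceil() one too high.
    return (int) ceil(round($forecastedRate * $avgJobTime, 6));
}

echo predictedWorkers(10.0, 20.0, 2.0), PHP_EOL; // 24
```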
C. Backlog Drain (SLA Protection)
Workers_drain = Backlog / (Time_Until_Breach / Avg_Job_Time)
Purpose: Prevent SLA violations
When it dominates: Old jobs approaching SLA target
Example:
- Backlog: 100 jobs
- Oldest job: 25 seconds old
- SLA target: 30 seconds
- Time remaining: 5 seconds
- Avg time: 2 seconds/job
- Jobs per worker: 5s / 2s = 2.5 jobs
- Workers: 100 / 2.5 = 40 workers
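The drain calculation can be sketched as follows. The helper name `backlogDrainWorkers` is hypothetical, and two behaviours are assumptions the text does not confirm: rounding up, and capping the result at one worker per pending job (including when the SLA has already been breached).

```php
<?php

declare(strict_types=1);

/**
 * Workers needed to drain the backlog before the oldest job breaches the SLA.
 * Hypothetical helper; the cap at one worker per job is an assumption.
 */
function backlogDrainWorkers(int $backlog, float $oldestJobAge, float $slaTarget, float $avgJobTime): int
{
    if ($backlog <= 0 || $avgJobTime <= 0.0) {
        return 0;
    }

    $timeUntilBreach = $slaTarget - $oldestJobAge;
    if ($timeUntilBreach <= 0.0) {
        // Already breached: assume one worker per pending job as the ceiling.
        return $backlog;
    }

    // How many jobs a single worker can finish before the breach.
    $jobsPerWorker = $timeUntilBreach / $avgJobTime;
    $workers = (int) ceil(round($backlog / $jobsPerWorker, 6));

    return min($workers, $backlog);
}

echo backlogDrainWorkers(100, 25.0, 30.0, 2.0), PHP_EOL; // 40
```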
3. Decision Phase
1. Take maximum of three calculations
target = max(steady, predicted, drain)
2. Apply constraints
├─ System capacity limits (CPU/memory from system-metrics)
├─ Configured min/max workers per queue
└─ Cooldown periods (prevent rapid scaling)
3. Create scaling decision
├─ Current worker count
├─ Target worker count
├─ Reason for decision
├─ Predicted pickup time
└─ SLA target
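Steps 1 and 2 of the decision phase amount to a maximum followed by a clamp. The sketch below leaves out cooldown handling, and the order in which the limits are applied (configured min/max first, system capacity last as a hard ceiling) is an assumption. `decideTarget` is a hypothetical name.

```php
<?php

declare(strict_types=1);

/**
 * Combine the three calculations and apply the worker limits.
 * Hypothetical helper; the precedence of the limits is an assumption.
 */
function decideTarget(
    int $steady,
    int $predicted,
    int $drain,
    int $minWorkers,
    int $maxWorkers,
    int $capacityLimit
): int {
    // Step 1: the most demanding calculation wins.
    $target = max($steady, $predicted, $drain);

    // Step 2a: keep the target inside the configured per-queue range.
    $target = max($minWorkers, min($target, $maxWorkers));

    // Step 2b: never exceed what the machine can run.
    return min($target, $capacityLimit);
}

echo decideTarget(20, 24, 40, 1, 10, 16), PHP_EOL; // 10
```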
4. Execution Phase
1. Execute "before" policies
├─ Validation hooks
├─ Logging
└─ External notifications
2. Scale workers
├─ If target > current: Spawn new workers
├─ If target < current: Terminate excess workers
└─ If target = current: No action
3. Execute "after" policies
├─ Metrics collection
├─ Notifications
└─ Cleanup
4. Broadcast events
├─ ScalingDecisionMade (every cycle)
├─ WorkersScaled (on changes)
└─ SlaBreachPredicted (on breach risk)
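The three branches of the "Scale workers" step reduce to a comparison between the current and target counts. A minimal sketch, assuming PHP 8 for `match`; the function name and the returned array shape are hypothetical:

```php
<?php

declare(strict_types=1);

/**
 * Translate a scaling decision into an action and a worker count.
 * Hypothetical helper; the array shape is illustrative only.
 */
function scalingAction(int $current, int $target): array
{
    // The spaceship operator yields 1, -1 or 0.
    return match ($target <=> $current) {
        1 => ['action' => 'spawn', 'count' => $target - $current],
        -1 => ['action' => 'terminate', 'count' => $current - $target],
        default => ['action' => 'none', 'count' => 0],
    };
}

print_r(scalingAction(5, 8));
```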
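Example Scenarios
The scenarios below assume an average job time of 2 seconds per job and, where an SLA applies, a 30-second pickup target, matching the calculation examples above.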
Scenario 1: Gradual Traffic Increase
Time: 09:00 - Morning traffic starts
├─ Rate: 5 jobs/sec → Workers: 10 (Little's Law)
│
Time: 09:15 - Traffic increasing
├─ Rate: 8 jobs/sec
├─ Trend: +20% forecast → 9.6 jobs/sec
└─ Workers: 20 (Trend prediction wins)
│
Time: 09:30 - Peak traffic
├─ Rate: 12 jobs/sec
└─ Workers: 24 (Steady state sufficient)
Result: Smooth scaling without SLA breaches
Scenario 2: Sudden Traffic Spike
Time: 10:00 - Normal traffic
├─ Rate: 10 jobs/sec
├─ Backlog: 0
└─ Workers: 20
│
Time: 10:01 - Marketing campaign starts
├─ Rate: 50 jobs/sec (5x increase!)
├─ Backlog: 200 jobs accumulating
├─ Oldest job: 15 seconds old
│
Time: 10:02 - Autoscaler responds
├─ Steady: 50 × 2 = 100 workers
├─ Predicted: 60 × 2 = 120 workers (trend up)
├─ Backlog drain: 200 / ((30-15)/2) = 27 workers
└─ Workers: 120 (Predicted wins)
│
Time: 10:03 - Jobs aging, SLA at risk
├─ Oldest job: 28 seconds (2s from breach!)
├─ Backlog drain: 200 / ((30-28)/2) = 200 workers
└─ Workers: 200 (SLA protection kicks in!)
Result: Aggressive scaling prevents SLA breach
Scenario 3: Traffic Decrease
Time: 17:00 - Peak traffic ending
├─ Rate: 20 jobs/sec
└─ Workers: 40
│
Time: 17:15 - Traffic declining
├─ Rate: 15 jobs/sec
├─ Trend: -20% forecast → 12 jobs/sec
├─ Workers: 30 (Little's Law)
└─ Cooldown prevents immediate scale-down
│
Time: 17:20 - Cooldown expires
├─ Rate: 10 jobs/sec
└─ Workers: 20 (gradual scale-down)
│
Time: 18:00 - Minimal traffic
├─ Rate: 2 jobs/sec
└─ Workers: 4, falling to 1 (min_workers) once traffic stops
Result: Gradual, cost-effective scale-down
SLA Target Behavior
How SLA Targets Work
Instead of saying "I want 10 workers", you say:
'max_pickup_time_seconds' => 30
This means: "Jobs should start processing within 30 seconds of being queued"
The autoscaler calculates how many workers are needed to meet this target.
Breach Prevention
The autoscaler is proactive about SLA targets:
SLA Target: 30 seconds
Breach Threshold: 80% (24 seconds) - configurable
┌─────────────────────────────────────┐
│ 0s    12s    24s         30s        │
│ ├──────┴──────┼───────────┤         │
│ Safe       Action      Breach       │
│           Threshold                 │
└─────────────────────────────────────┘
When oldest job reaches 24s:
→ Backlog drain algorithm activates
→ Aggressive scaling to prevent breach
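The threshold check itself is a one-line comparison. In this sketch the threshold is an integer percentage so that the boundary case (24s of 30s at 80%) compares exactly. The name `isBreachRisk` and its parameters are hypothetical, not the package's configuration keys.

```php
<?php

declare(strict_types=1);

/**
 * True once the oldest job's age reaches the breach threshold.
 * Hypothetical helper; parameter names are illustrative.
 */
function isBreachRisk(float $oldestJobAge, float $slaTarget, int $thresholdPercent = 80): bool
{
    // Equivalent to age >= sla × (percent / 100), without fractional factors.
    return $oldestJobAge * 100 >= $slaTarget * $thresholdPercent;
}

var_dump(isBreachRisk(23.0, 30.0)); // bool(false)
var_dump(isBreachRisk(24.0, 30.0)); // bool(true)
```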
Multiple SLA Tiers
You can configure different SLAs per queue:
'sla_defaults' => [
    'max_pickup_time_seconds' => 60, // Default: 1 minute
],
'queue_overrides' => [
    'critical' => [
        'max_pickup_time_seconds' => 10, // 10 seconds
    ],
    'emails' => [
        'max_pickup_time_seconds' => 300, // 5 minutes
    ],
],
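One way to read this configuration is that a queue's override is laid over the defaults. The sketch below assumes that merge behaviour, which the text does not spell out. `maxPickupTimeFor` is a hypothetical helper, not a package function.

```php
<?php

declare(strict_types=1);

/**
 * Resolve the effective SLA target for a queue.
 * Hypothetical helper; override-over-default merging is an assumption.
 */
function maxPickupTimeFor(array $config, string $queue): int
{
    $settings = array_merge(
        $config['sla_defaults'] ?? [],
        $config['queue_overrides'][$queue] ?? [],
    );

    if (!isset($settings['max_pickup_time_seconds'])) {
        throw new InvalidArgumentException("No SLA target configured for queue '{$queue}'.");
    }

    return (int) $settings['max_pickup_time_seconds'];
}

$config = [
    'sla_defaults' => ['max_pickup_time_seconds' => 60],
    'queue_overrides' => [
        'critical' => ['max_pickup_time_seconds' => 10],
        'emails' => ['max_pickup_time_seconds' => 300],
    ],
];

echo maxPickupTimeFor($config, 'critical'), PHP_EOL; // 10
echo maxPickupTimeFor($config, 'default'), PHP_EOL;  // 60
```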
Worker Lifecycle
Spawning Workers
1. Autoscaler determines need for new workers
2. WorkerSpawner creates a Symfony Process:
php artisan queue:work {connection} --queue={queue}
3. Process starts in background
4. WorkerPool tracks process metadata:
├─ PID
├─ Connection/queue
├─ Spawn time
└─ Health status
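The command in step 2 can be built as an argument array rather than a shell string, which avoids quoting problems with unusual queue names. `buildWorkerCommand` is a hypothetical helper, and how WorkerSpawner actually assembles its command is not shown in this document.

```php
<?php

declare(strict_types=1);

/**
 * Build the queue:work command as an argument list.
 * Hypothetical helper; it only mirrors the command shown above.
 */
function buildWorkerCommand(string $connection, string $queue): array
{
    return ['php', 'artisan', 'queue:work', $connection, '--queue=' . $queue];
}

// With symfony/process installed, this array could be handed to
// new Symfony\Component\Process\Process($command) and started in the
// background with start(); that wiring is omitted here.
echo implode(' ', buildWorkerCommand('redis', 'default')), PHP_EOL;
// php artisan queue:work redis --queue=default
```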
Monitoring Workers
Every evaluation cycle:
1. ProcessHealthCheck verifies worker health
├─ Process still running?
├─ Process responding?
└─ Process memory/CPU within limits?
2. Dead workers removed from pool
3. Health data used for scaling decisions
Terminating Workers
When scaling down:
1. Select workers to terminate (oldest first)
2. Send SIGTERM (graceful shutdown)
3. Wait 10 seconds for graceful exit
4. Send SIGKILL if still running (force)
5. Remove from worker pool
Why graceful shutdown matters:
- Allows jobs to complete
- Prevents job failures
- Maintains data integrity
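Step 1 of the shutdown sequence, picking the oldest workers first, can be sketched as a sort on spawn time. The worker array shape (`pid`, `spawnedAt`) and the function name are assumptions made for illustration. The WorkerPool's real data structure is not described here.

```php
<?php

declare(strict_types=1);

/**
 * Return the PIDs of the $count oldest workers.
 * Hypothetical helper; the worker array shape is an assumption.
 */
function selectWorkersToTerminate(array $workers, int $count): array
{
    // Oldest first: sort ascending by spawn timestamp (local copy only).
    usort($workers, fn (array $a, array $b): int => $a['spawnedAt'] <=> $b['spawnedAt']);

    return array_column(array_slice($workers, 0, max(0, $count)), 'pid');
}

$workers = [
    ['pid' => 101, 'spawnedAt' => 300],
    ['pid' => 102, 'spawnedAt' => 100],
    ['pid' => 103, 'spawnedAt' => 200],
];

print_r(selectWorkersToTerminate($workers, 2)); // PIDs 102 and 103
```

If the workers are held as Symfony Process objects, `Process::stop()` performs a similar escalation: it waits up to a timeout (10 seconds by default) and then sends a kill signal. Whether the package relies on it is not stated here.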
Resource Constraints
System Capacity
The CapacityCalculator uses the system-metrics package to determine available resources:
Available CPU cores: 8
Available memory: 16 GB
Current worker cost: ~100 MB RAM per worker
Max workers by RAM: 16000 MB / 100 MB = 160 workers
Max workers by CPU: 8 cores × 2 = 16 workers (conservative)
Capacity limit: min(160, 16) = 16 workers
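The arithmetic above can be reproduced with a short function. The defaults of 100 MB per worker and 2 workers per core are taken from the example. The function name and signature are hypothetical and do not describe CapacityCalculator's real interface.

```php
<?php

declare(strict_types=1);

/**
 * Worker ceiling imposed by memory and CPU, whichever is lower.
 * Hypothetical helper; defaults come from the worked example.
 */
function capacityLimit(
    int $cpuCores,
    int $availableMemoryMb,
    int $memoryPerWorkerMb = 100,
    int $workersPerCore = 2
): int {
    if ($memoryPerWorkerMb <= 0) {
        throw new InvalidArgumentException('memoryPerWorkerMb must be positive.');
    }

    $byRam = intdiv($availableMemoryMb, $memoryPerWorkerMb);
    $byCpu = $cpuCores * $workersPerCore;

    return min($byRam, $byCpu);
}

echo capacityLimit(8, 16000), PHP_EOL; // 16
```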
Configuration Limits
'min_workers' => 1, // Always maintain at least 1
'max_workers' => 10, // Never exceed 10
Cooldown Periods
'scale_cooldown_seconds' => 60,
Purpose: Prevent rapid scaling oscillations
Scale up at 10:00 → Workers: 5 → 10
Cooldown until 10:01
Can scale again at 10:01
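A cooldown check only needs the timestamp of the last scaling action. This is a minimal sketch using Unix timestamps; `cooldownExpired` is a hypothetical name, and treating "never scaled" as expired is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * True when enough time has passed since the last scaling action.
 * Hypothetical helper; a null timestamp means no scaling has happened yet.
 */
function cooldownExpired(?int $lastScaledAt, int $now, int $cooldownSeconds = 60): bool
{
    return $lastScaledAt === null || ($now - $lastScaledAt) >= $cooldownSeconds;
}

$scaledAt = 1_700_000_000;                          // scale-up at "10:00"
var_dump(cooldownExpired($scaledAt, $scaledAt + 30)); // bool(false)
var_dump(cooldownExpired($scaledAt, $scaledAt + 60)); // bool(true)
```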
Metrics and Visibility
What Gets Logged
Every evaluation cycle logs:
[autoscale] Queue: redis/default
Current: 5 workers
Target: 8 workers
Reason: "trend predicts rate increase: 10.00/s → 12.00/s"
Action: Spawning 3 workers
What Events Fire
// Every cycle
ScalingDecisionMade::class
// On worker changes
WorkersScaled::class
->from(5)
->to(8)
->change(+3)
// On SLA risk
SlaBreachPredicted::class
->queue('default')
->predictedPickupTime(28.5)
->slaTarget(30)
What Metrics Are Tracked
From the laravel-queue-metrics package:
- Processing rate (jobs/second)
- Active worker count
- Pending job count
- Oldest job age
- Trend data (historical rates)
Metrics Package Setup
All metrics are collected by the laravel-queue-metrics package. Ensure it's properly configured:
Storage Setup:
# Redis (recommended for autoscaling)
QUEUE_METRICS_STORAGE=redis
QUEUE_METRICS_CONNECTION=default
# OR Database (for persistent metrics)
QUEUE_METRICS_STORAGE=database
Installation:
composer require cboxdk/laravel-queue-metrics
php artisan vendor:publish --tag=queue-metrics-config
Learn more: Metrics Package Documentation
Common Questions
Q: Why did workers scale up when the queue was empty?
A: Trend prediction detected a traffic increase before the jobs arrived. This is proactive scaling.
Q: Why didn't workers scale down immediately?
A: The cooldown period blocks rapid scaling changes. Scale-down proceeds once the cooldown expires.
Q: Why are there more workers than jobs?
A: The steady-state and trend calculations size workers for arrival rate, not backlog depth. A high job rate needs many workers even when the backlog is small.
Q: Can I force immediate scaling?
A: Reduce scale_cooldown_seconds, but watch for oscillations.
Q: What happens if the system runs out of resources?
A: CapacityCalculator automatically limits workers to what the available CPU and memory can support.
Next Steps
- Configuration Guide - Configure SLA targets and limits
- Custom Strategies - Write your own scaling logic
- Monitoring Guide - Track autoscaler performance
- Algorithm Details - Deep dive into math