How It Works
Queue Autoscale for Laravel uses a hybrid predictive algorithm to make intelligent scaling decisions.
Overview
Queue Autoscale for Laravel uses a hybrid predictive algorithm that combines three different scaling approaches to make intelligent decisions about worker counts:
- Little's Law - Steady-state calculation based on current workload
- Trend Prediction - Proactive scaling based on traffic forecasts
- Backlog Drain - Aggressive scaling to prevent SLA breaches
The autoscaler takes the maximum of these three calculations to ensure SLA compliance while being responsive to changing conditions.
The Evaluation Loop
1. Metrics Retrieval Phase
Every evaluation cycle (default: 5 seconds), the autoscaler:
1. Retrieves all queues and metrics from laravel-queue-metrics
└─ Single call: QueueMetrics::getAllQueuesWithMetrics()
2. Receives comprehensive queue data
├─ Queue connection and name
├─ Current worker count
├─ Processing rate (jobs/second)
├─ Pending job count (backlog depth)
├─ Oldest job age
├─ Trend data (historical rates and forecasts)
└─ Processing time statistics
3. Loads per-queue configuration
└─ SLA targets, min/max workers, cooldown periods
Package Separation:
- laravel-queue-metrics does: Queue discovery, connection scanning, metrics collection
- laravel-queue-autoscale does: Consumes metrics, applies algorithms, manages workers
2. Calculation Phase
For each queue received from the metrics package, the autoscaler calculates three target worker counts:
A. Little's Law (Steady State)
Workers_steady = Arrival_Rate × Average_Job_Time
Purpose: Baseline calculation for current workload
When it dominates: Stable traffic, no backlog
Example:
- Rate: 10 jobs/sec
- Avg time: 2 seconds/job
- Workers: 10 × 2 = 20 workers
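As a rough sketch (not the package's actual code), the steady-state calculation is a single multiplication rounded up to whole workers:

```php
// Little's Law sketch: workers needed to keep pace with the current arrival rate.
// $arrivalRate is jobs/second, $avgJobTime is seconds per job.
function steadyStateWorkers(float $arrivalRate, float $avgJobTime): int
{
    return (int) ceil($arrivalRate * $avgJobTime);
}

steadyStateWorkers(10, 2); // 20 workers
```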
B. Trend Prediction (Proactive)
Workers_predicted = Forecasted_Rate × Average_Job_Time
Purpose: Scale ahead of demand increases
When it dominates: Traffic trending upward
Example:
- Current rate: 10 jobs/sec
- Trend: +20% (forecasted: 12 jobs/sec)
- Avg time: 2 seconds/job
- Workers: 12 × 2 = 24 workers
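The same sketch applied to the forecasted rate; the forecast itself comes from laravel-queue-metrics, and the 20% growth factor below is simply the example above:

```php
// Trend prediction sketch: size the pool for the forecasted rate, not the current one.
function predictedWorkers(float $currentRate, float $trendFactor, float $avgJobTime): int
{
    $forecastedRate = $currentRate * $trendFactor; // e.g. 10 jobs/s * 1.2 = 12 jobs/s
    return (int) ceil($forecastedRate * $avgJobTime);
}

predictedWorkers(10, 1.2, 2); // 24 workers
```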
C. Backlog Drain (SLA Protection)
Workers_drain = Backlog / (Time_Until_Breach / Avg_Job_Time)
Purpose: Prevent SLA violations
When it dominates: Old jobs approaching the SLA target
Example:
- Backlog: 100 jobs
- Oldest job: 25 seconds old
- SLA target: 30 seconds
- Time remaining: 5 seconds
- Avg time: 2 seconds/job
- Jobs per worker: 5s / 2s = 2.5 jobs
- Workers: 100 / 2.5 = 40 workers
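A sketch of the drain calculation; the lower bound on the remaining time is an assumption of this sketch to avoid division by zero, not documented package behaviour:

```php
// Backlog drain sketch: workers needed to clear the backlog before the
// oldest job crosses the SLA target.
function drainWorkers(int $backlog, float $oldestJobAge, float $slaTarget, float $avgJobTime): int
{
    $timeRemaining = max($slaTarget - $oldestJobAge, $avgJobTime); // clamp to at least one job's duration
    $jobsPerWorker = $timeRemaining / $avgJobTime;                 // jobs one worker can finish in time
    return (int) ceil($backlog / $jobsPerWorker);
}

drainWorkers(100, 25.0, 30.0, 2.0); // 40 workers
```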
3. Decision Phase
1. Take maximum of three calculations
target = max(steady, predicted, drain)
2. Apply constraints
├─ System capacity limits (CPU/memory from system-metrics)
├─ Configured min/max workers per queue
└─ Cooldown periods (prevent rapid scaling)
3. Create scaling decision
├─ Current worker count
├─ Target worker count
├─ Reason for decision
├─ Predicted pickup time
└─ SLA target
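Putting the three calculations and the constraints together, the decision step can be pictured roughly as follows (a simplified sketch with illustrative variable names; the real decision object carries more context):

```php
// Take the most aggressive of the three targets, then clamp to limits.
$target = max($steady, $predicted, $drain);
$target = min($target, $systemCapacity);                                            // CPU/memory cap from system-metrics
$target = max($config['workers']['min'], min($target, $config['workers']['max']));  // per-queue bounds

// Hold position while a cooldown is active to avoid oscillation.
if (now()->diffInSeconds($lastScaledAt) < $config['scaling']['cooldown_seconds']) {
    $target = $currentWorkers;
}
```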
4. Execution Phase
1. Execute "before" policies
├─ Validation hooks
├─ Logging
└─ External notifications
2. Scale workers
├─ If target > current: Spawn new workers
├─ If target < current: Terminate excess workers
└─ If target = current: No action
3. Execute "after" policies
├─ Metrics collection
├─ Notifications
└─ Cleanup
4. Broadcast events
├─ ScalingDecisionMade (every cycle)
├─ WorkersScaled (on changes)
└─ SlaBreachPredicted (on breach risk)
Example Scenarios
Scenario 1: Gradual Traffic Increase
Time: 09:00 - Morning traffic starts
├─ Rate: 5 jobs/sec → Workers: 10 (Little's Law)
│
Time: 09:15 - Traffic increasing
├─ Rate: 8 jobs/sec
├─ Trend: +20% forecast → 9.6 jobs/sec
└─ Workers: 20 (Trend prediction wins)
│
Time: 09:30 - Peak traffic
├─ Rate: 12 jobs/sec
└─ Workers: 24 (Steady state sufficient)
Result: Smooth scaling without SLA breaches
Scenario 2: Sudden Traffic Spike
Time: 10:00 - Normal traffic
├─ Rate: 10 jobs/sec
├─ Backlog: 0
└─ Workers: 20
│
Time: 10:01 - Marketing campaign starts
├─ Rate: 50 jobs/sec (5x increase!)
├─ Backlog: 200 jobs accumulating
└─ Oldest job: 15 seconds old
│
Time: 10:02 - Autoscaler responds
├─ Steady: 50 × 2 = 100 workers
├─ Predicted: 60 × 2 = 120 workers (trend up)
├─ Backlog drain: 200 / ((30-15)/2) = 27 workers
└─ Workers: 120 (Predicted wins)
│
Time: 10:03 - Jobs aging, SLA at risk
├─ Oldest job: 28 seconds (2s from breach!)
├─ Backlog drain: 200 / ((30-28)/2) = 200 workers
└─ Workers: 200 (SLA protection kicks in!)
Result: Aggressive scaling prevents SLA breach
Scenario 3: Traffic Decrease
Time: 17:00 - Peak traffic ending
├─ Rate: 20 jobs/sec
└─ Workers: 40
│
Time: 17:15 - Traffic declining
├─ Rate: 15 jobs/sec
├─ Trend: -20% forecast → 12 jobs/sec
├─ Workers: 30 (Little's Law)
└─ Cooldown prevents immediate scale-down
│
Time: 17:20 - Cooldown expires
├─ Rate: 10 jobs/sec
└─ Workers: 20 (gradual scale-down)
│
Time: 18:00 - Minimal traffic
├─ Rate: 2 jobs/sec
└─ Workers: 4, then 1 (workers.min) as traffic tails off
Result: Gradual, cost-effective scale-down
SLA Target Behavior
How SLA Targets Work
Instead of saying "I want 10 workers", you say:
'sla' => ['target_seconds' => 30]
This means: "Jobs should start processing within 30 seconds of being queued"
The autoscaler calculates how many workers are needed to meet this target.
Breach Prevention
The autoscaler is proactive about SLA targets:
SLA Target: 30 seconds
Breach Threshold: 80% (24 seconds) - configurable
0s            12s             24s     30s
├──────────────┴───────────────┼───────┤
              Safe          Action Breach
                          Threshold
When oldest job reaches 24s:
→ Backlog drain algorithm activates
→ Aggressive scaling to prevent breach
Multiple SLA Tiers
You can configure different SLAs per queue:
use Cbox\LaravelQueueAutoscale\Configuration\Profiles\BalancedProfile;
use Cbox\LaravelQueueAutoscale\Configuration\Profiles\CriticalProfile;
use Cbox\LaravelQueueAutoscale\Configuration\Profiles\BackgroundProfile;
'sla_defaults' => BalancedProfile::class, // 30s SLA default
'queues' => [
'critical' => CriticalProfile::class, // 10s SLA
'emails' => ['sla' => ['target_seconds' => 300]], // 5 min override
],
Understanding SLA Timing
SLA targets define the maximum acceptable pickup time: the time between a job being dispatched and a worker starting to process it. In practice, most jobs are picked up far faster than the SLA target. A 30-second SLA does not mean jobs take 30 seconds; it means the autoscaler sizes the worker pool so jobs start within 30 seconds, with the vast majority picked up near-instantly.
However, there are hard timing floors imposed by Laravel's queue worker internals that every operator should understand.
Floor 1: Worker Poll Loop (~3-5 seconds)
Even with a running, idle worker, job pickup is not instant. Laravel's queue:work command operates on a sleep/poll cycle:
Worker idle loop:
├─ Poll queue for next job
├─ No job found
├─ Sleep for sleep_seconds (default: 3s)
├─ Poll again
└─ Job found → start processing
The worst-case pickup time for an idle worker is roughly sleep_seconds plus a small overhead for the poll itself. With the default sleep_seconds: 3, this means ~3-5 seconds in the worst case.
This means SLA targets below 5 seconds will always produce flaky breach events, regardless of how many workers are running. This is expected behaviour — it reflects the fundamental polling model of Laravel's queue worker, not a limitation of the autoscaler.
Tip:
CriticalProfile sets sleep_seconds: 1 to minimize this floor, but even then sub-5s SLA targets are unreliable due to poll overhead and job deserialization time.
Floor 2: Scale-from-Zero Latency (~8-12 seconds)
Profiles with workers.min = 0 (BurstyProfile, BackgroundProfile) can scale the queue to zero workers during idle periods. When a new job arrives, the following sequence plays out:
Scale-from-zero timeline:
├─ Job dispatched to empty queue
├─ Wait for next evaluation cycle (up to evaluation_interval: 5s)
├─ Autoscaler detects pending job
├─ Spawn worker process (1-2s startup)
├─ Worker enters poll loop
├─ Worker picks up job (up to sleep_seconds: 3s)
└─ Total: ~8-12 seconds typical
This is a conscious trade-off: zero idle cost in exchange for slower first-job pickup after an idle period. If this latency is unacceptable for a queue, set workers.min >= 1.
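For example, a per-queue override that keeps one warm worker; the queue name is illustrative, and the keys match the configuration examples later on this page:

```php
'queues' => [
    'payments' => [                                // hypothetical queue name
        'sla' => ['target_seconds' => 30],
        'workers' => ['min' => 1, 'max' => 10],    // never scale to zero
    ],
],
```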
Practical Guidelines
| SLA Target | Recommendation |
|---|---|
| < 5 seconds | Not recommended. Will produce flaky breaches regardless of configuration. Requires infrastructure outside this package's scope (e.g. synchronous processing, always-on consumers). |
| 5-10 seconds | Requires workers.min >= 1 and low sleep_seconds (1-2). Use CriticalProfile or a custom profile. Scale-from-zero is not viable at this SLA. |
| 10-30 seconds | The sweet spot for most user-facing queues. workers.min >= 1 recommended. Outliers may approach the SLA target; the vast majority of jobs process near-instantly. |
| 30-300 seconds | Comfortable range. Scale-from-zero (workers.min = 0) is viable. The occasional 8-12s cold start is well within budget. |
Why This Matters for Profiles
The shipped profiles are designed with these floors in mind:
- CriticalProfile (10s SLA, min=5): sleep_seconds: 1 minimizes poll latency, and five always-on workers eliminate scale-from-zero entirely.
- BurstyProfile (60s SLA, min=0): the 60-second SLA comfortably absorbs the ~8-12s scale-from-zero floor.
- BackgroundProfile (300s SLA, min=0): the 5-minute SLA makes the cold start negligible.
If you create a custom profile with both a tight SLA (< 10s) and workers.min = 0, the autoscaler will honour it — but expect frequent breach events during scale-from-zero transitions.
Worker Lifecycle
Spawning Workers
1. Autoscaler determines need for new workers
2. WorkerSpawner creates Symfony Process:
php artisan queue:work {connection} --queue={queue}
3. Process starts in background
4. WorkerPool tracks process metadata:
├─ PID
├─ Connection/queue
├─ Spawn time
└─ Health status
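In rough terms, the spawn step looks like this (a simplified sketch; the actual WorkerSpawner may pass additional options derived from the queue's profile):

```php
use Symfony\Component\Process\Process;

// Start a background worker for one connection/queue pair.
$process = new Process([
    PHP_BINARY, 'artisan', 'queue:work', $connection, '--queue=' . $queue,
]);
$process->start();           // runs asynchronously in the background
$pid = $process->getPid();   // recorded by the WorkerPool
```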
Monitoring Workers
Every evaluation cycle:
1. ProcessHealthCheck verifies worker health
├─ Process still running?
├─ Process responding?
└─ Process memory/CPU within limits?
2. Dead workers removed from pool
3. Health data used for scaling decisions
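The liveness part of that check can be pictured as follows (a sketch, not the package's ProcessHealthCheck):

```php
// Prune dead workers from the pool each cycle.
foreach ($pool as $key => $process) {
    if (! $process->isRunning()) {   // Symfony Process reports whether the child is alive
        unset($pool[$key]);
    }
}
```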
Terminating Workers
When scaling down:
1. Select workers to terminate (oldest first)
2. Send SIGTERM (graceful shutdown)
3. Wait 10 seconds for graceful exit
4. Send SIGKILL if still running (force)
5. Remove from worker pool
Why graceful shutdown matters:
- Allows jobs to complete
- Prevents job failures
- Maintains data integrity
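With Symfony Process, this SIGTERM-then-SIGKILL sequence corresponds roughly to a single stop() call with a grace period (a sketch of the behaviour described above, not the package's exact code):

```php
// stop() sends SIGTERM, waits up to the timeout, then falls back to SIGKILL.
$process->stop(10); // 10-second grace period, matching the sequence above
```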
Resource Constraints
System Capacity
The CapacityCalculator uses the system-metrics package to determine available resources:
Available CPU cores: 8
Available memory: 16 GB
Current worker cost: ~100 MB RAM per worker
Max workers by RAM: 16000 MB / 100 MB = 160 workers
Max workers by CPU: 8 cores × 2 = 16 workers (conservative)
Capacity limit: min(160, 16) = 16 workers
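The same arithmetic as a sketch; the per-worker memory cost and the two-workers-per-core rule are the assumptions from the example above:

```php
// Bound the worker count by both memory and CPU headroom.
$maxByMemory = intdiv($availableMemoryMb, $memoryPerWorkerMb); // 16000 / 100 = 160
$maxByCpu    = $cpuCores * 2;                                  // conservative: 2 workers per core
$capacity    = min($maxByMemory, $maxByCpu);                   // min(160, 16) = 16
```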
Configuration Limits
'workers' => [
'min' => 1, // Always maintain at least 1
'max' => 10, // Never exceed 10
],
Cooldown Periods
'scaling' => ['cooldown_seconds' => 60], // global, top-level
Purpose: Prevent rapid scaling oscillations
Scale up at 10:00 → Workers: 5 → 10
Cooldown until 10:01
Can scale again at 10:01
Metrics and Visibility
What Gets Logged
Every evaluation cycle logs:
[autoscale] Queue: redis/default
Current: 5 workers
Target: 8 workers
Reason: "trend predicts rate increase: 10.00/s → 12.00/s"
Action: Spawning 3 workers
What Events Fire
// Every cycle
ScalingDecisionMade::class
// On worker changes
WorkersScaled::class
->from(5)
->to(8)
->change(+3)
// On SLA risk
SlaBreachPredicted::class
->queue('default')
->predictedPickupTime(28.5)
->slaTarget(30)
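Because these are ordinary Laravel events, you can subscribe to them with standard listeners. A minimal sketch; the event namespace and the property read here are assumptions for illustration, not taken from the package's source:

```php
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

// Namespace assumed for illustration.
Event::listen(\Cbox\LaravelQueueAutoscale\Events\SlaBreachPredicted::class, function ($event) {
    Log::warning('SLA breach predicted', [
        'queue' => $event->queue ?? null, // property name assumed
    ]);
});
```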
What Metrics Are Tracked
From laravel-queue-metrics package:
- Processing rate (jobs/second)
- Active worker count
- Pending job count
- Oldest job age
- Trend data (historical rates)
Metrics Package Setup
All metrics are collected by the laravel-queue-metrics package. Ensure it's properly configured:
Storage Setup:
# Redis (recommended for autoscaling)
QUEUE_METRICS_STORAGE=redis
QUEUE_METRICS_CONNECTION=default
# OR Database (for persistent metrics)
QUEUE_METRICS_STORAGE=database
Installation:
composer require cboxdk/laravel-queue-metrics
php artisan vendor:publish --tag=queue-metrics-config
Learn more: Metrics Package Documentation
Common Questions
Q: Why did workers scale up when queue was empty?
A: Trend prediction detected a traffic increase before jobs arrived. This is proactive scaling.
Q: Why didn't workers scale down immediately?
A: Cooldown period prevents rapid scaling. Wait for cooldown to expire.
Q: Why are there more workers than jobs?
A: Worker counts are driven by throughput (rate × average job time), not just backlog depth. A high job rate needs many workers even when the backlog is small.
Q: Can I force immediate scaling?
A: Reduce scaling.cooldown_seconds but be cautious of oscillations.
Q: What happens if system runs out of resources?
A: CapacityCalculator limits workers to available CPU/memory automatically.
Next Steps
- Configuration Guide - Configure SLA targets and limits
- Custom Strategies - Write your own scaling logic
- Monitoring Guide - Track autoscaler performance
- Algorithm Details - Deep dive into math