Architecture

This document provides a deep dive into the Laravel Queue Autoscale architecture, scaling algorithm, and design decisions.

Overview
Theoretical Foundation
Hybrid Algorithm
System Architecture
Data Flow
Scaling Decision Process
Resource Management
Extension Points
Performance Considerations

Overview

Laravel Queue Autoscale uses a hybrid predictive algorithm that combines three complementary approaches:

Rate-Based Scaling (Little's Law) - Steady-state calculation
Trend-Based Scaling (Predictive) - Proactive forecasting
Backlog-Based Scaling (SLA Protection) - Breach prevention

The system takes the maximum of the three results to protect the SLA, then bounds it by the configured worker limits and the available system capacity.

Theoretical Foundation

Little's Law (L = λW)

The foundation of our rate-based scaling is Little's Law, a fundamental theorem in queueing theory:

L = λ × W

Where:
- L = Average number of items in system (workers needed)
- λ = Arrival rate (jobs/second)
- W = Average time in system (seconds/job)

Why Little's Law?

Mathematically proven relationship between queue length, arrival rate, and processing time
Works for any stable queueing system regardless of arrival distribution
Provides theoretical minimum workers needed for steady-state operation

Our Implementation:

public function calculate(float $arrivalRate, float $avgProcessingTime): float
{
    if ($arrivalRate <= 0 || $avgProcessingTime <= 0) {
        return 0.0;
    }

    return $arrivalRate * $avgProcessingTime;
}

Example:

Arrival rate: 10 jobs/sec
Average job time: 2 seconds
Workers needed: 10 × 2 = 20 workers
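The calculation returns a float, but workers are whole processes. The sketch below is a stand-alone copy of the formula plus a rounding step; the function names and the round-up policy are assumptions for illustration, not the package's API.

```php
<?php
declare(strict_types=1);

// Stand-alone copy of the L = λW calculation (illustrative only).
function littlesLawWorkers(float $arrivalRate, float $avgProcessingTime): float
{
    if ($arrivalRate <= 0 || $avgProcessingTime <= 0) {
        return 0.0;
    }

    return $arrivalRate * $avgProcessingTime;
}

// Assumed policy: round a fractional worker count up to the next whole process.
function wholeWorkers(float $workers): int
{
    return (int) ceil($workers);
}

echo wholeWorkers(littlesLawWorkers(10.0, 2.0)), PHP_EOL; // 20
echo wholeWorkers(littlesLawWorkers(2.5, 0.5)), PHP_EOL;  // 2 (1.25 rounded up)
echo wholeWorkers(littlesLawWorkers(0.0, 2.0)), PHP_EOL;  // 0
```

Rounding up rather than to nearest keeps the result at or above the steady-state minimum.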

SLA/SLO-Based Approach

Instead of targeting worker counts, we target service level objectives:

SLA: Jobs must be picked up within max_pickup_time_seconds

This transforms the scaling problem from:

❌ "How many workers should we have?" (infrastructure-focused)
✅ "How can we meet our SLA?" (business-focused)

Benefits:

Business-aligned metrics instead of technical ones
Easier to reason about and communicate
Natural scaling boundaries (SLA compliance vs violation)
Predictable system behavior

Hybrid Algorithm

Algorithm Overview

target_workers = max(
    steady_state_workers,    // Little's Law with current rate
    predictive_workers,      // Little's Law with predicted rate
    backlog_drain_workers    // SLA breach prevention
)

final_workers = constrain(
    target_workers,
    min: config.min_workers,
    max: min(config.max_workers, system_capacity)
)
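As a rough PHP sketch of the pseudocode above: take the largest of the three estimates, then clamp it. The function and parameter names are hypothetical. The choice to let the upper bound win when system capacity falls below the configured minimum is also an assumption; the package may resolve that conflict differently.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: combine the three estimates, then clamp to the allowed range.
 * Names and the min/max conflict rule are assumptions, not the package's API.
 */
function finalWorkers(
    float $steadyState,
    float $predictive,
    float $backlogDrain,
    int $minWorkers,
    int $maxWorkers,
    int $systemCapacity
): int {
    // Largest estimate wins; round up because workers are whole processes.
    $target = (int) ceil(max($steadyState, $predictive, $backlogDrain));

    // The effective ceiling is whichever is lower: config or system capacity.
    $upperBound = min($maxWorkers, $systemCapacity);

    // Assumed rule: if the ceiling is below the configured minimum, the ceiling wins.
    $lowerBound = min($minWorkers, $upperBound);

    return max(min($target, $upperBound), $lowerBound);
}

echo finalWorkers(20.0, 30.0, 0.0, 1, 25, 100), PHP_EOL; // 25 (capped by max_workers)
echo finalWorkers(0.0, 0.0, 0.0, 2, 10, 100), PHP_EOL;   // 2 (raised to min_workers)
```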

1. Rate-Based Scaling (Steady State)

Purpose: Calculate workers needed for current load.

Algorithm:

steady_state_workers = processing_rate × avg_job_time

When it dominates:

Stable workload with no trend
Normal operating conditions
Queue is near equilibrium

Example:

Processing rate: 5 jobs/sec
Avg job time: 2 sec
Workers: 5 × 2 = 10 workers

2. Trend-Based Scaling (Predictive)

Purpose: Scale proactively based on predicted demand.

Algorithm:

predicted_rate = current_rate × trend_adjustment

Where trend_adjustment:
- trend='up' with forecast: use forecast directly
- trend='up' without forecast: multiply by 1.2 (20% increase)
- trend='down': multiply by 0.8 (20% decrease)
- trend='stable' or null: use current rate (no adjustment)

predictive_workers = predicted_rate × avg_job_time

When it dominates:

Upward trending workload
Predictable traffic patterns (time of day, day of week)
Before load spikes occur

Example:

Current rate: 10 jobs/sec
Trend: up, forecast: 15 jobs/sec
Avg job time: 2 sec
Workers: 15 × 2 = 30 workers (vs 20 for steady state)

Benefit: Scales up before queue depth increases, preventing SLA violations.
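The trend adjustment rules can be sketched as one small function. This is a minimal illustration assuming the trend arrives as a plain string and the forecast as a nullable float; `predictedRate` is a hypothetical name, and the 1.2 and 0.8 multipliers are the ones listed above.

```php
<?php
declare(strict_types=1);

// Illustrative only: apply the trend adjustment rules to the current rate.
function predictedRate(float $currentRate, ?string $trend, ?float $forecast = null): float
{
    return match ($trend) {
        // Use the forecast when one exists, otherwise assume a 20% increase.
        'up' => $forecast ?? $currentRate * 1.2,
        // Assume a 20% decrease.
        'down' => $currentRate * 0.8,
        // 'stable', null, or anything unrecognized: no adjustment.
        default => $currentRate,
    };
}

$avgJobTime = 2.0;
echo predictedRate(10.0, 'up', 15.0) * $avgJobTime, PHP_EOL; // 30
```

Note that a downward trend lowers only the predictive estimate. Because the final target is the maximum of the three calculations, the steady-state value still holds the worker count up.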

3. Backlog-Based Scaling (SLA Protection)

Purpose: Aggressively prevent SLA breaches when backlog exists.

Algorithm:

time_until_breach = sla_target - oldest_job_age
action_threshold = sla_target × breach_threshold (default 0.8)

if oldest_job_age < action_threshold:
    return 0  // No urgent action

if time_until_breach <= 0:
    // Already breached - aggressive scaling
    return ceil(backlog / max(avg_job_time, 0.1))

// Calculate workers to drain backlog before breach
jobs_per_worker = max(time_until_breach / avg_job_time, 1.0)
return backlog / jobs_per_worker

When it dominates:

Backlog exists and oldest job approaching SLA
Recovery from downtime or scaling lag
Burst traffic exceeding predictions

Example 1: Approaching Breach

SLA target: 30 sec
Oldest job: 25 sec (exceeds 80% threshold of 24 sec)
Time until breach: 5 sec
Backlog: 100 jobs
Avg job time: 2 sec
Jobs per worker: 5 / 2 = 2.5 jobs
Workers: 100 / 2.5 = 40 workers (aggressively scales)

Example 2: Already Breached

SLA target: 30 sec
Oldest job: 35 sec (breached!)
Backlog: 100 jobs
Avg job time: 2 sec
Workers: ceil(100 / 2) = 50 workers (maximum aggression)

Protection: Prevents cascade failures where SLA breach → more backlog → worse breach.
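The backlog drain pseudocode translates fairly directly to PHP. The sketch below reproduces both examples. The function name and signature are hypothetical, and the `max($avgJobTime, 0.1)` guard in the second branch is an added assumption to avoid division by zero.

```php
<?php
declare(strict_types=1);

// Illustrative only: workers needed to drain the backlog before the SLA is breached.
function backlogDrainWorkers(
    int $backlog,
    float $oldestJobAge,
    float $slaTarget,
    float $avgJobTime,
    float $breachThreshold = 0.8
): float {
    // Below the action threshold (or nothing queued): no urgent action.
    if ($backlog <= 0 || $oldestJobAge < $slaTarget * $breachThreshold) {
        return 0.0;
    }

    $timeUntilBreach = $slaTarget - $oldestJobAge;

    // Already breached: size the pool from the backlog and job time alone.
    if ($timeUntilBreach <= 0) {
        return ceil($backlog / max($avgJobTime, 0.1));
    }

    // Each worker can finish this many jobs before the breach (at least one).
    // Assumption: guard the divisor the same way as in the breached branch.
    $jobsPerWorker = max($timeUntilBreach / max($avgJobTime, 0.1), 1.0);

    return $backlog / $jobsPerWorker;
}

echo backlogDrainWorkers(100, 25.0, 30.0, 2.0), PHP_EOL; // 40 (Example 1)
echo backlogDrainWorkers(100, 35.0, 30.0, 2.0), PHP_EOL; // 50 (Example 2)
echo backlogDrainWorkers(100, 10.0, 30.0, 2.0), PHP_EOL; // 0 (below the 24 sec threshold)
```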

Why Maximum?

We take the maximum of all three approaches:

$targetWorkers = max(
    $steadyStateWorkers,
    $predictiveWorkers,
    $backlogDrainWorkers,
);

Reasoning:

Conservative approach - Better to slightly over-scale than violate SLA
Covers different scenarios - Each calculator handles specific conditions
Graceful degradation - If one approach fails/misses, others provide backup
SLA compliance prioritized - Backlog drain scales aggressively as the oldest job approaches the SLA target

Trade-off: May occasionally over-scale, but:

Resource constraints prevent waste
Cooldown periods prevent thrashing
Extra capacity quickly absorbed by variability in job processing

System Architecture

Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                    AutoscaleManager                         │
│  ┌───────────────────────────────────────────────────┐     │
│  │ Main Control Loop (every 5 seconds)               │     │
│  │ 1. Get all queues from laravel-queue-metrics      │     │
│  │ 2. For each queue: evaluate & scale               │     │
│  │ 3. Cleanup dead workers                           │     │
│  │ 4. Check for SIGTERM/SIGINT                      │     │
│  └───────────────────────────────────────────────────┘     │
└──────────────────┬──────────────────────────────────────────┘
                   │
         ┌─────────┴──────────┐
         │                    │
    ┌────▼─────┐         ┌────▼────────┐
    │ Scaling  │         │   Worker    │
    │  Engine  │         │ Management  │
    └────┬─────┘         └────┬────────┘
         │                    │
    ┌────▼──────────────┐     │
    │ ScalingStrategy   │     │
    │ (PredictiveStrat) │     │
    └────┬──────────────┘     │
         │                    │
    ┌────▼──────────────┐     │
    │ Calculators:      │     │
    │ • LittlesLaw      │     │
    │ • ArrivalRate     │     │
    │ • BacklogDrain    │     │
    │ • Capacity        │     │
    └───────────────────┘     │
                              │
                  ┌───────────▼────────────┐
                  │ Worker Components:     │
                  │ • WorkerSpawner        │
                  │ • WorkerTerminator     │
                  │ • WorkerPool           │
                  │ • WorkerProcess        │
                  └────────────────────────┘

Class Responsibilities

AutoscaleManager

Main daemon process
Coordinates entire scaling lifecycle
Manages control loop timing
Handles signals (SIGTERM/SIGINT)
Orchestrates worker pool

ScalingEngine

Evaluates scaling decisions
Applies constraints (capacity, config)
Creates ScalingDecision DTOs
Delegates to strategy

PredictiveStrategy

Implements hybrid algorithm
Calls all three calculators
Takes maximum of results
Provides human-readable reasons
Estimates pickup time predictions

Calculators

LittlesLawCalculator: Pure L = λW implementation
ArrivalRateEstimator: Sliding window arrival rate estimation from backlog changes
BacklogDrainCalculator: SLA breach prevention math
CapacityCalculator: System resource limits

Worker Management

WorkerSpawner: Creates queue:work processes
WorkerTerminator: Graceful SIGTERM → SIGKILL shutdown
WorkerPool: Tracks running workers
WorkerProcess: Wraps Symfony Process with metadata

Data Flow

Package Boundary and Data Flow

┌────────────────────────────────────────────────────────────┐
│        laravel-queue-metrics (Dependency Package)          │
│                                                             │
│  • Scans all queue connections (redis, database, sqs)      │
│  • Discovers active queues automatically                   │
│  • Collects queue depth and age metrics                    │
│  • Calculates processing rates                             │
│  • Analyzes trends and creates forecasts                   │
│  • Aggregates all data into QueueMetricsData objects       │
│                                                             │
│  Public API:                                                │
│  QueueMetrics::getAllQueuesWithMetrics()                   │
│         ↓                                                   │
└─────────┼──────────────────────────────────────────────────┘
          │
          │ Returns: Collection<QueueMetricsData>
          │
          ↓
┌─────────┴──────────────────────────────────────────────────┐
│      laravel-queue-autoscale (This Package)                │
│                                                             │
│  • Receives pre-calculated metrics from facade             │
│  • Applies scaling algorithms (Little's Law, Trend,        │
│    Backlog Drain)                                          │
│  • Makes SLA-based scaling decisions                       │
│  • Manages worker pool lifecycle (spawn/terminate)         │
│  • Enforces resource constraints (CPU/memory limits)       │
│  • Executes scaling policies and broadcasts events         │
│                                                             │
│  DOES NOT:                                                  │
│  ✗ Scan queue connections                                  │
│  ✗ Discover queues                                         │
│  ✗ Collect queue metrics                                   │
│  ✗ Calculate processing rates or trends                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Key Principle: Single Responsibility

laravel-queue-metrics: Queue discovery and metrics collection
laravel-queue-autoscale: Scaling decisions and worker management

1. Metrics Collection (External Package)

laravel-queue-metrics (external package)
    ↓
QueueMetrics::getAllQueuesWithMetrics()
    ↓
Returns: Collection<QueueMetricsData>
    {
        connection: 'redis',
        queue: 'default',
        processingRate: 10.5,  // jobs/sec (pre-calculated)
        activeWorkerCount: 20,
        depth: {
            pending: 150,
            oldestJobAgeSeconds: 25,
        },
        trend: {
            direction: 'up',
            forecast: 15.0,  // (pre-calculated)
        },
    }

2. Scaling Evaluation

AutoscaleManager
    ↓
For each queue metrics:
    ↓
ScalingEngine::evaluate(metrics, config, currentWorkers)
    ↓
PredictiveStrategy::calculateTargetWorkers(metrics, config)
    ↓
┌──────────────────────────────┐
│ Run 3 Calculators:           │
│ 1. LittlesLaw(rate, time)    │
│ 2. TrendPolicy(rate, policy) │
│ 3. BacklogDrain(backlog,sla) │
└──────────────────────────────┘
    ↓
Take max(steady, predictive, backlog)
    ↓
Apply capacity constraints from system-metrics
    ↓
Apply config bounds (min/max workers)
    ↓
Return ScalingDecision

3. Scaling Execution

ScalingDecision
    ↓
If shouldScaleUp():
    workers_to_add = targetWorkers - currentWorkers
    ↓
    WorkerSpawner::spawn(connection, queue, workers_to_add)
    ↓
    WorkerPool::addMany(newWorkers)
    ↓
    Broadcast WorkersScaled event

If shouldScaleDown():
    workers_to_remove = currentWorkers - targetWorkers
    ↓
    WorkerPool::remove(connection, queue, workers_to_remove)
    ↓
    WorkerTerminator::terminate(each removed worker)
    ↓
    Broadcast WorkersScaled event
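Both branches of the execution step reduce to the difference between the target and current worker counts. A minimal sketch, assuming a hypothetical `scalingAction` helper that returns a plain array rather than the package's ScalingDecision object:

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: derive the action and worker count from a target.
 *
 * @return array{action: string, count: int}
 */
function scalingAction(int $currentWorkers, int $targetWorkers): array
{
    $delta = $targetWorkers - $currentWorkers;

    return match (true) {
        $delta > 0 => ['action' => 'scale_up', 'count' => $delta],
        $delta < 0 => ['action' => 'scale_down', 'count' => -$delta],
        default => ['action' => 'none', 'count' => 0],
    };
}

print_r(scalingAction(20, 30)); // scale_up by 10
print_r(scalingAction(20, 15)); // scale_down by 5
```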

4. Worker Lifecycle

WorkerSpawner::spawn()
    ↓
Creates Symfony Process:
    [PHP_BINARY, artisan, queue:work, connection,
     --queue=name, --tries=3, --max-time=3600, --sleep=3]
    ↓
Process::start() → Background process
    ↓
Wrapped in WorkerProcess(process, connection, queue, spawnedAt)
    ↓
Added to WorkerPool
    ↓
... Worker processes jobs ...
    ↓
When scaling down:
    ↓
WorkerTerminator::terminate(worker)
    ↓
1. posix_kill(pid, SIGTERM)
2. Wait shutdown_timeout_seconds (default 30s)
3. If still running: posix_kill(pid, SIGKILL)
    ↓
Worker terminated
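The spawn step passes the command to Symfony Process as an argument array, which avoids shell quoting issues for queue and connection names. The sketch below builds that array with the flags shown above. `buildWorkerCommand` and its `$artisanPath` parameter are hypothetical names, and the real spawner may make these flags configurable.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: build the queue:work argument array for one worker.
 *
 * @return list<string>
 */
function buildWorkerCommand(string $connection, string $queue, string $artisanPath = 'artisan'): array
{
    return [
        PHP_BINARY,          // The PHP executable running the autoscaler
        $artisanPath,        // Assumed to be resolvable from the working directory
        'queue:work',
        $connection,
        "--queue={$queue}",
        '--tries=3',
        '--max-time=3600',   // Worker exits after an hour
        '--sleep=3',         // Seconds to sleep when the queue is empty
    ];
}

echo implode(' ', array_slice(buildWorkerCommand('redis', 'emails'), 1)), PHP_EOL;
// artisan queue:work redis --queue=emails --tries=3 --max-time=3600 --sleep=3
```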

Scaling Decision Process

Decision Flow

1. Get Metrics
   ├─ processingRate
   ├─ activeWorkerCount
   ├─ backlog (pending jobs)
   ├─ oldestJobAge
   └─ trend

2. Estimate Avg Job Time
   If activeWorkers > 0 && processingRate > 0:
       avgJobTime = activeWorkers / processingRate
   Else:
       avgJobTime = 1.0 (fallback)

3. Calculate All Three Approaches
   ├─ steadyState = processingRate × avgJobTime
   ├─ predictive = predictedRate × avgJobTime
   └─ backlogDrain = calculate based on SLA proximity

4. Take Maximum
   targetWorkers = max(steadyState, predictive, backlogDrain)

5. Apply System Capacity
   maxPossible = CapacityCalculator::calculateMaxWorkers()
   targetWorkers = min(targetWorkers, maxPossible)

6. Apply Config Bounds
   targetWorkers = max(targetWorkers, config.minWorkers)
   targetWorkers = min(targetWorkers, config.maxWorkers)

7. Check Cooldown
   If lastScaledAt + cooldown > now:
       Skip scaling (wait for cooldown)

8. Execute Scaling
   If targetWorkers > currentWorkers:
       Scale Up
   Else if targetWorkers < currentWorkers:
       Scale Down
   Else:
       No Change
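Step 2 can be read as Little's Law rearranged (W = L / λ), under the assumption that every active worker is busy. A minimal sketch of that estimate with its fallback; `estimateAvgJobTime` is a hypothetical name:

```php
<?php
declare(strict_types=1);

// Illustrative only: estimate seconds per job from worker count and throughput.
function estimateAvgJobTime(int $activeWorkers, float $processingRate): float
{
    if ($activeWorkers > 0 && $processingRate > 0) {
        // W = L / λ: workers in use divided by jobs completed per second.
        return $activeWorkers / $processingRate;
    }

    // No workers or no throughput yet: fall back to one second per job.
    return 1.0;
}

echo estimateAvgJobTime(20, 10.0), PHP_EOL; // 2
echo estimateAvgJobTime(0, 10.0), PHP_EOL;  // 1 (fallback)
```

If some workers are idle, this estimate overstates the true job time, which pushes the target upward rather than downward.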

Cooldown Logic

Prevents scaling thrash:

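$lastScaledAt = $this->lastScaled[$key] ?? null;

// A queue that has never been scaled has no cooldown to wait out.
if ($lastScaledAt !== null && $lastScaledAt->diffInSeconds(now()) < $config->scaleCooldownSeconds) {
    continue; // Skip this queue, still in cooldown
}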

Why cooldowns?

Workers need time to start and begin processing
Metrics need time to reflect scaling changes
Prevents oscillation (scale up → scale down → scale up...)

Default: 60 seconds between scaling operations per queue.
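The cooldown rule is easy to check in isolation using plain Unix timestamps. This is a sketch under the assumption that a queue with no recorded scaling operation is never in cooldown; `inCooldown` is a hypothetical helper, not part of the package.

```php
<?php
declare(strict_types=1);

// Illustrative only: true while the queue must wait before scaling again.
function inCooldown(?int $lastScaledAt, int $now, int $cooldownSeconds = 60): bool
{
    // Never scaled before: nothing to wait for.
    if ($lastScaledAt === null) {
        return false;
    }

    return ($now - $lastScaledAt) < $cooldownSeconds;
}

$now = 1_700_000_100;
var_dump(inCooldown(null, $now));          // bool(false)
var_dump(inCooldown($now - 30, $now));     // bool(true): only 30s elapsed
var_dump(inCooldown($now - 60, $now));     // bool(false): cooldown has elapsed
```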

Resource Management

CPU Constraints

$maxCpuPercent = config('queue-autoscale.resource_limits.max_cpu_percent'); // 90%
$cpuUsage = SystemMetrics::cpuUsage(1.0)->usagePercentage(); // e.g., 60%

$availableCpuPercent = max($maxCpuPercent - $cpuUsage, 0); // 30%

$reserveCores = config('queue-autoscale.resource_limits.reserve_cpu_cores'); // 0.5
$usableCores = max($limits->availableCpuCores() - $reserveCores, 1);

$maxWorkersByCpu = floor($usableCores * ($availableCpuPercent / 100));

Memory Constraints

$maxMemoryPercent = config('queue-autoscale.resource_limits.max_memory_percent'); // 85%
$memoryUsage = SystemMetrics::memory()->usedPercentage(); // e.g., 50%

$availableMemoryPercent = max($maxMemoryPercent - $memoryUsage, 0); // 35%

$workerMemoryMb = config('queue-autoscale.resource_limits.worker_memory_mb_estimate'); // 128 MB
$totalMemoryMb = $limits->availableMemoryBytes() / (1024 * 1024);

$maxWorkersByMemory = floor(
    ($totalMemoryMb * ($availableMemoryPercent / 100)) / $workerMemoryMb
);

Most Restrictive Wins

return max(min($maxWorkersByCpu, $maxWorkersByMemory), 0);

Ensures we never exceed either CPU or memory limits.
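To make the arithmetic concrete, here is a stand-alone sketch of the CPU and memory limits as pure functions. The defaults are the values from the comments above. The function names, and the 8-core, 16 GB host in the usage lines, are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Illustrative only: worker ceiling implied by CPU headroom.
function maxWorkersByCpu(
    float $cores,
    float $cpuUsagePercent,
    float $maxCpuPercent = 90.0,
    float $reserveCores = 0.5
): int {
    $availablePercent = max($maxCpuPercent - $cpuUsagePercent, 0.0);
    $usableCores = max($cores - $reserveCores, 1.0);

    return (int) floor($usableCores * ($availablePercent / 100));
}

// Illustrative only: worker ceiling implied by memory headroom.
function maxWorkersByMemory(
    float $totalMemoryMb,
    float $memoryUsagePercent,
    float $maxMemoryPercent = 85.0,
    float $workerMemoryMb = 128.0
): int {
    $availablePercent = max($maxMemoryPercent - $memoryUsagePercent, 0.0);

    return (int) floor(($totalMemoryMb * ($availablePercent / 100)) / $workerMemoryMb);
}

// The most restrictive limit wins, never below zero.
function maxWorkers(int $byCpu, int $byMemory): int
{
    return max(min($byCpu, $byMemory), 0);
}

$byCpu = maxWorkersByCpu(8.0, 60.0);          // floor(7.5 × 0.30) = 2
$byMemory = maxWorkersByMemory(16384.0, 50.0); // floor(5734.4 / 128) = 44
echo maxWorkers($byCpu, $byMemory), PHP_EOL;   // 2
```

On this hypothetical host CPU is the binding constraint by a wide margin, which shows how sensitive the CPU formula is to current utilization.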

Extension Points

Custom Scaling Strategies

Implement ScalingStrategyContract:

interface ScalingStrategyContract
{
    public function calculateTargetWorkers(object $metrics, QueueConfiguration $config): int;
    public function getLastReason(): string;
    public function getLastPrediction(): ?float;
}

Examples:

TimeOfDayStrategy: Scale based on time patterns
BudgetAwareStrategy: Cap workers based on cost constraints
MLPredictiveStrategy: Use machine learning for forecasting
ConservativeStrategy: Always maintain buffer capacity
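As a sketch of what a custom strategy might look like, the example below sizes the pool from pending jobs alone. To stay runnable on its own it declares stand-ins for `ScalingStrategyContract` and `QueueConfiguration`; in a real project these come from the package, and the `minWorkers`/`maxWorkers` properties and the `depth->pending` metrics field are assumed from the examples in this document. `BacklogRatioStrategy` is hypothetical.

```php
<?php
declare(strict_types=1);

// Stand-in declarations (assumptions); the package's real classes may differ.
final class QueueConfiguration
{
    public function __construct(public int $minWorkers = 1, public int $maxWorkers = 10) {}
}

interface ScalingStrategyContract
{
    public function calculateTargetWorkers(object $metrics, QueueConfiguration $config): int;
    public function getLastReason(): string;
    public function getLastPrediction(): ?float;
}

// Hypothetical strategy: one worker per N pending jobs, within config bounds.
final class BacklogRatioStrategy implements ScalingStrategyContract
{
    private string $lastReason = 'No calculation performed yet';

    public function __construct(private int $jobsPerWorker = 25)
    {
        if ($jobsPerWorker < 1) {
            throw new InvalidArgumentException('jobsPerWorker must be at least 1');
        }
    }

    public function calculateTargetWorkers(object $metrics, QueueConfiguration $config): int
    {
        // Missing fields are treated as an empty queue.
        $pending = (int) ($metrics->depth->pending ?? 0);
        $raw = (int) ceil($pending / $this->jobsPerWorker);
        $target = max($config->minWorkers, min($raw, $config->maxWorkers));

        $this->lastReason = sprintf('pending=%d, raw=%d, bounded=%d', $pending, $raw, $target);

        return $target;
    }

    public function getLastReason(): string
    {
        return $this->lastReason;
    }

    public function getLastPrediction(): ?float
    {
        return null; // This sketch makes no pickup-time prediction.
    }
}

$strategy = new BacklogRatioStrategy();
$metrics = (object) ['depth' => (object) ['pending' => 150]];
echo $strategy->calculateTargetWorkers($metrics, new QueueConfiguration()), PHP_EOL; // 6
echo $strategy->getLastReason(), PHP_EOL; // pending=150, raw=6, bounded=6
```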

Scaling Policies

Implement ScalingPolicy for hooks:

interface ScalingPolicy
{
    public function beforeScaling(ScalingDecision $decision): void;
    public function afterScaling(ScalingDecision $decision, bool $success): void;
}

Use Cases:

Pre-warming caches before scale-up
Notifying monitoring systems
Rate limiting scale operations
Cost tracking and budgets
Compliance logging
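For the compliance logging use case, a policy can be as small as the sketch below. It declares stand-ins for `ScalingPolicy` and `ScalingDecision` so it runs on its own. The decision's `queue`, `currentWorkers`, and `targetWorkers` properties are assumptions, and `AuditLogPolicy` is a hypothetical name that keeps entries in memory rather than writing to a real log.

```php
<?php
declare(strict_types=1);

// Stand-in declarations (assumptions); the package's real classes may differ.
final class ScalingDecision
{
    public function __construct(
        public string $queue,
        public int $currentWorkers,
        public int $targetWorkers
    ) {}
}

interface ScalingPolicy
{
    public function beforeScaling(ScalingDecision $decision): void;
    public function afterScaling(ScalingDecision $decision, bool $success): void;
}

// Hypothetical policy: record every planned and completed scaling operation.
final class AuditLogPolicy implements ScalingPolicy
{
    /** @var list<string> */
    private array $entries = [];

    public function beforeScaling(ScalingDecision $decision): void
    {
        $this->entries[] = sprintf(
            'planned %s: %d -> %d',
            $decision->queue,
            $decision->currentWorkers,
            $decision->targetWorkers
        );
    }

    public function afterScaling(ScalingDecision $decision, bool $success): void
    {
        $this->entries[] = sprintf('%s %s', $success ? 'applied' : 'failed', $decision->queue);
    }

    /** @return list<string> */
    public function entries(): array
    {
        return $this->entries;
    }
}

$policy = new AuditLogPolicy();
$decision = new ScalingDecision('default', 20, 30);
$policy->beforeScaling($decision);
$policy->afterScaling($decision, true);
print_r($policy->entries()); // ['planned default: 20 -> 30', 'applied default']
```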

Event Subscribers

React to scaling events:

Event::listen(ScalingDecisionMade::class, function ($event) {
    // Log, metrics, external systems
});

Event::listen(WorkersScaled::class, function ($event) {
    // Track worker count metrics
});

Event::listen(SlaBreachPredicted::class, function ($event) {
    // Alert on-call engineers
});

Performance Considerations

Evaluation Frequency

Default: Every 5 seconds

Trade-offs:

Faster (1-2s): More responsive, higher CPU usage, more scaling decisions
Slower (10-30s): Lower overhead, may miss short spikes, delayed reactions

Recommendation: 5-10 seconds for most workloads.

Metrics Overhead

Metrics collection happens in laravel-queue-metrics (external):

Runs in background
Minimal impact on queue processing
Pre-aggregated before autoscaler sees them

Our overhead:

Simple calculations (3 multiplications, 1 max)
O(1) complexity for each queue
Total: <10ms for dozens of queues

Worker Spawn Time

Process creation: ~50-200ms
Laravel bootstrap: ~100-500ms
Queue worker ready: ~150-700ms total

Implication: Autoscaler compensates by:

Predictive scaling (scale before demand)
Minimum workers (always-ready capacity)
Cooldown periods (wait for workers to start)

Memory Footprint

Per worker process:

Laravel app: ~50-100 MB
Queue jobs: Variable (10-100+ MB)
Default estimate: 128 MB

Total system:

Autoscaler daemon: ~50 MB
Workers: N × 128 MB

For 50 workers: 50 × 128 MB = 6,400 MB (~6.4 GB), plus ~50 MB for the autoscaler daemon

Design Decisions

Why Not Reactive-Only?

Reactive scaling (respond after queue depth grows) always lags:

Load spike → Queue grows → Detect → Scale → Workers start → Begin processing
Total lag: 30-60 seconds

Predictive scaling reduces lag by anticipating demand.

Why Three Approaches?

Each handles different scenarios:

Scenario             Dominant Approach
-------------------  ---------------------------
Stable load          Steady state (Little's Law)
Predictable growth   Trend-based
Burst traffic        Backlog drain
Mixed patterns       Maximum of all three

Why SLA-Based?

Business Alignment:

"Jobs picked up within 30s" is business requirement
Worker counts are implementation detail
Easier to communicate with stakeholders

Natural Bounds:

SLA compliance = success
SLA violation = failure
Clear objective function

Why Process-Based Workers?

Isolation:

Each worker is separate process
Memory leaks contained
Crashes don't affect others

Control:

Can SIGTERM/SIGKILL individual workers
Easy monitoring (process table)
Standard Unix tooling works

Simplicity:

No threading complexity
No shared state issues
Matches Laravel queue:work model

Future Enhancements

Potential Improvements

ML-Based Prediction
- Train on historical patterns
- Better forecasting accuracy
- Seasonal adjustment
Cost Optimization
- Factor in compute costs
- Balance SLA vs budget
- Spot instance awareness
Multi-Dimensional Scaling
- Scale by job type, not just queue
- Priority-based worker allocation
- Resource quotas per tenant
Advanced Metrics
- Job failure rates
- Retry patterns
- Dependency graphs
Auto-Tuning
- Learn optimal min/max workers
- Adjust cooldown periods
- Calibrate breach thresholds

Extensibility by Design

The architecture supports these enhancements through:

Strategy pattern for algorithms
Policy hooks for behavior
Event system for integration
Dependency injection for swapping components

Conclusion

Laravel Queue Autoscale combines queueing theory, predictive analysis, and SLA-based optimization to provide intelligent, automatic worker scaling. The hybrid algorithm ensures SLA compliance while being resource-aware and extensible.

For usage examples, see Quick Start.

For implementation details, review the source code with this architecture in mind.
