# Troubleshooting
Find your symptom in the list below, then follow the diagnosis steps in order. Every command on this page is real and safe to run in production.
## Symptom index
- Jobs are piling up but no workers are spawning
- Workers spawn but die within seconds
- Workers keep spawning and terminating (flapping)
- Logs show the same SLA breach line every few seconds
- Manager starts but produces no output
- Manager crashes on startup
- An exclusive queue keeps respawning its worker
- A group never scales up even though its members have jobs
- Deploy finishes but new config is not applied
- Two autoscalers are fighting each other
## First diagnostic command
Before anything else:
```shell
php artisan queue:autoscale:debug --queue=<your-queue> --connection=<your-connection>
```
This dumps what both the metrics package and the autoscaler see for that queue. If the numbers are empty or wrong, the problem is upstream of this package — see Manager starts but produces no output.
## Jobs are piling up but no workers are spawning
Most common root causes, in order:
- The manager isn't running. Check with `ps aux | grep queue:autoscale`. If it's missing, start it (see your platform deployment guide).
- The queue is excluded. Check `config/queue-autoscale.php` for a pattern matching your queue name in the `excluded` key, or check the log for `Queue excluded from autoscaling`.
- The metrics package isn't collecting data. Run `php artisan queue:autoscale:debug --queue=default`. If `QueueMetrics::getQueueDepth()` returns zeros when you know there are pending jobs, `laravel-queue-metrics` isn't wired up — check its config and storage backend (Redis vs database).
- The manager is at the config-driven worker cap. Check `workers.max` for the queue. In `-vv` mode the manager says `Constrained by workers.max config limit` when this happens.
- System capacity is maxed out. `-vvv` mode shows the CPU/memory ceiling. If `limits.max_cpu_percent` or `limits.max_memory_percent` is reached, spawning is blocked to protect the host.
Quick check:
```shell
# Run the manager in very-verbose mode and look at one cycle:
php artisan queue:autoscale -vvv --interval=5
# Watch one full cycle, then Ctrl-C.
```
The output names the limiting factor for every queue.
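If that cycle points at nothing obvious, two of the causes above can be triaged directly from a shell. A minimal sketch, assuming a Linux host with `pgrep` and `free` available:

```shell
# Is a manager process alive at all, and how much memory headroom is left?
managers=$(pgrep -fc 'queue:autoscale' || true)
echo "manager processes: ${managers:-0}"

# Percentage of physical memory in use; compare this against your configured
# limits.max_memory_percent ceiling.
mem_used=$(free 2>/dev/null | awk '/^Mem:/ {printf "%.0f", $3/$2*100}')
echo "memory used: ${mem_used:-unknown}%"
```

A count of zero means the manager itself is down; a memory figure near your `limits.max_memory_percent` means spawning is being blocked deliberately.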
## Workers spawn but die within seconds
Symptoms: the log shows `Worker spawned PID=X`, then `Worker did not stop gracefully` or `Removed dead worker` a moment later.
Most common causes:
- The `queue:work` invocation fails. Run the exact command the manager uses, manually, as the same user: `sudo -u forge php /home/forge/your-app/current/artisan queue:work redis --queue=default`. The error will be immediate — almost always `.env` permissions, a missing Redis connection, or a wrong PHP path.
- PHP is running out of memory on each job. Check `storage/logs/laravel.log` for `Allowed memory size` errors in the spawned workers. Raise `memory_limit` in php.ini or pass `--memory` on the worker.
- A long-running job is hitting the `workers.timeout_seconds` limit. The default is 3600s — workers are recycled after that. This is expected behaviour unless you see it every few seconds.
## Workers keep spawning and terminating (flapping)
Symptoms: logs show continuous `scaled UP` followed by `scaled DOWN` every few evaluation cycles.
Causes and fixes:
- Arrival rate is genuinely oscillating. The autoscaler is responding correctly, but the signal is noisy. Raise `scaling.cooldown_seconds` (default 60) to dampen it. Anti-flapping already suppresses direction reversals within the cooldown unless there's an active SLA breach.
- Your `workers.min` and the strategy's target are close. The strategy drops below min, gets clamped back up, and next cycle drops again. Lower `workers.min` to 0 or widen the gap between expected demand and min.
- A flaky metric source. If `laravel-queue-metrics` reports wildly different throughput values from one cycle to the next, the strategy will over-react. Running `queue:autoscale:debug` several times in a row should show stable numbers when there's no real traffic.
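The anti-flapping behaviour described above can be pictured as a timestamp comparison. This is a sketch of the assumed semantics, not the package's actual code; the values are illustrative:

```shell
# A direction reversal within scaling.cooldown_seconds of the previous
# scaling action is suppressed, unless an SLA breach is active.
cooldown=60            # scaling.cooldown_seconds (default)
last_action_age=25     # seconds since the last scale event (example value)
sla_breach=false       # an active breach overrides the cooldown

if [ "$sla_breach" = true ] || [ "$last_action_age" -ge "$cooldown" ]; then
  decision="allow"
else
  decision="suppress"
fi
echo "$decision"
```

With these example values the reversal is suppressed; raising `scaling.cooldown_seconds` widens that suppression window.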
## Logs show the same SLA breach line every few seconds
Fixed in v2: `BreachNotificationPolicy` now rate-limits its log output via `AlertRateLimiter` (default 300s cooldown).
If you're still seeing the spam:
- Verify you're on v2 and the policy is registered in `config/queue-autoscale.php → policies`.
- Check `queue-autoscale.alerting.cooldown_seconds` — it may be set to a very low value.
- If your app has custom listeners on `SlaBreachPredicted`, they need their own rate-limiting. Use `AlertRateLimiter` in them — the pattern is shown in the cookbook.
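For a custom listener, the cooldown idea is easy to reproduce even without `AlertRateLimiter`. A generic file-based sketch (the marker file and window are illustrative, not the package's implementation):

```shell
# Emit at most one alert per cooldown window, tracked via a timestamp file.
window=300                  # mirror alerting.cooldown_seconds
stamp=$(mktemp -u)          # hypothetical marker file; use a fixed path in real use
now=$(date +%s)
last=$(cat "$stamp" 2>/dev/null || echo 0)

if [ $((now - last)) -ge "$window" ]; then
  echo "$now" > "$stamp"
  result="alert emitted"
else
  result="suppressed (within cooldown)"
fi
echo "$result"
```

The first run always emits; subsequent runs within the window are suppressed.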
## Manager starts but produces no output
The manager is quiet by design unless something changes. Default output shows scaling events and per-cycle stats only when verbose.
```shell
# See what the manager is actually doing:
php artisan queue:autoscale -vv
```
If `-vv` still shows nothing after a full evaluation interval (5s default), something is broken:
- Stale config-cached values. Run `php artisan config:clear` on the host, then restart the manager.
- The metrics package is returning an empty queue list. Run `queue:autoscale:debug` for a queue you know has jobs. If totals are zero, the metrics package isn't wired to your queue driver — its config is the problem.
- All queues are excluded. Check the `excluded` array and any globs.
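To rule out the stale-config cause quickly, check whether a cached config file exists at all; Laravel writes it to `bootstrap/cache/config.php`:

```shell
# If a config cache is present, edits to config/queue-autoscale.php are
# invisible until you run `php artisan config:clear` and restart the manager.
if [ -f bootstrap/cache/config.php ]; then
  state="cached"
else
  state="fresh"
fi
echo "config state: $state"
```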
## Manager crashes on startup
Check the first lines of stderr/the log. Common failures:
- `Queue autoscale is disabled in config` — set `'enabled' => true` or `QUEUE_AUTOSCALE_ENABLED=true`.
- `queue-autoscale.pickup_time.store must be a class that implements PickupTimeStoreContract` — your `pickup_time.store` config value is not a valid class. Run `vendor:publish` again and merge manually.
- `Group configuration is invalid — groups disabled until manager restart` — a queue appears in both `queues` and a group, or in multiple groups. Fix the config; the manager will still run, but with groups disabled until you restart.
## An exclusive queue keeps respawning its worker
Symptoms: the log shows `Supervisor respawned pinned workers` frequently for the same queue.
Causes:
- The worker job is crashing. Run one of the queue's jobs synchronously to see the error: `php artisan queue:work redis --queue=legacy-sync --once`.
- A memory leak in the job code is hitting PHP's `memory_limit`. Long-running workers accumulate memory. For jobs with known leaks, set `workers.timeout_seconds` to a low value (e.g. 300) on that queue so the workers recycle regularly — the supervisor respawns them automatically.
- Something external is killing PHP processes: the OS-level OOM-killer, a misconfigured systemd service, or a deploy script that kills `queue:work` but leaves the autoscaler running. Check `dmesg` and the systemd journals.
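To check the OOM-killer theory specifically, count kernel kill events. A sketch for Linux hosts; `dmesg` may require elevated privileges, hence the fallback to zero:

```shell
# Count kernel OOM-killer events; anything above zero means the host, not
# the autoscaler, has been terminating processes.
oom_hits=$(dmesg 2>/dev/null | grep -ci 'killed process' || true)
echo "oom-killer events seen: ${oom_hits:-0}"
```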
## A group never scales up even though its members have jobs
Two scenarios:
- The members are newly-active queues the metrics package hasn't discovered yet. This was fixed in the v2 topology release — the manager now force-fetches metrics for every group member on each cycle. Verify you're on the latest v2.
- Group validation failed at startup and groups were disabled. Check the log for `Group configuration is invalid — groups disabled until manager restart`. Fix the config (remove duplicate queue names across groups/queues) and restart the manager.
If neither applies, run `queue:autoscale:debug` for each member queue. If the metrics are all zero but you know jobs are being processed, the metrics package is either not recording pickups or not persisting them.
## Deploy finishes but new config is not applied
The manager is a long-running process — it holds the config in memory. It does not re-read the file.
Fix: restart the manager. Every platform deployment guide covers the correct restart command, including `php artisan queue:autoscale:restart` for deploy hooks that cannot call `systemctl` / `supervisorctl` directly.
If you're on Forge/Ploi, the zero-downtime deploy flow restarts the daemon automatically. If you skip that step in a custom deploy script, add `sudo supervisorctl restart daemon-<id>` (or the systemd equivalent).
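A minimal deploy-hook sketch for custom scripts, built from the two commands this page already recommends (`config:clear` and `queue:autoscale:restart`) and guarded so it is copy-safe outside an application root:

```shell
# Clear any cached config, then ask the running manager to restart so it
# re-reads the new config.
if [ -f artisan ]; then
  php artisan config:clear
  php artisan queue:autoscale:restart
  status="restart requested"
else
  status="skipped: artisan not found (run from the application root)"
fi
echo "$status"
```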
## Two autoscalers are fighting each other
Symptoms: worker counts bouncing wildly, `queue:autoscale:debug` showing more workers than you configured, and logs on two different servers each claiming to have spawned/killed the same PID.
Cause: one of the following:

- Two managers are running on the same host/app; this now fails fast unless you explicitly pass `queue:autoscale --replace`.
- Multiple hosts are running autoscale without cluster mode enabled.
- Cluster mode is enabled, but two nodes collided on the same configured `manager_id`.
- `queue:autoscale` is running on multiple web nodes instead of cluster mode being enabled.
- A stale manager from a previous deploy wasn't terminated.
- Docker Swarm/Kubernetes is running `replicas: 2` while cluster mode is still disabled.
Fix: in single-host mode, run exactly one manager per app. In cluster mode, run exactly one manager per host and let Redis-backed coordination handle the rest. Only set `QUEUE_AUTOSCALE_MANAGER_ID` manually if you need to override the auto-generated node identity.
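To confirm a duplicate or stale manager on a given host, list every matching process with its elapsed age. A sketch assuming a Linux host with `ps` and `pgrep`:

```shell
# In single-host mode, more than one manager process on a host means a
# duplicate, or a stale manager left over from a previous deploy.
ps -eo pid,etime,args 2>/dev/null | grep 'queue:autoscale' | grep -v grep || true
managers=$(pgrep -fc 'queue:autoscale' || true)
echo "managers on this host: ${managers:-0}"
```

The `etime` column makes a stale pre-deploy manager obvious: its age predates the last deploy.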
## Still stuck?
- Capture `php artisan queue:autoscale -vvv` output for one minute and attach it to your issue.
- Also include `php artisan queue:autoscale:debug --queue=<affected-queue>` output.
- Open a GitHub issue: cboxdk/laravel-queue-autoscale/issues.