Traces
Traces
Spans
$result = Telemetry::span('billing.recalculate', function ($span) use ($tenant) {
$span->setAttribute('tenant.id', $tenant->id);
return $service->recalculate($tenant);
});
The closure form ends the span for you, records exceptions
(exception span event + error status) and rethrows. The manual form:
$span = Telemetry::span('phase.one', attributes: ['shard' => 3]);
$span->addEvent('checkpoint', ['rows' => 5000]);
$span->setStatus(SpanStatus::Ok);
$span->end();
Spans are objects, never looked up by name — two concurrent spans with the same name are simply two spans. Nesting follows the call structure: a span started while another is active becomes its child.
Automatic instrumentation
| Source | Span | Config key |
|---|---|---|
| HTTP requests | GET /users/{id} (server) |
instrument.requests |
| Queue jobs | App\Jobs\Import process (consumer) |
instrument.jobs |
| DB queries | db.query (client, backdated) |
instrument.queries |
| Artisan commands | artisan app:sync |
instrument.commands (off by default) |
| Scheduled tasks | schedule artisan inspire |
instrument.scheduled_tasks |
mail.send (client) |
instrument.mail |
|
| Notifications | notification.send (client) |
instrument.notifications |
| Blade/PHP views | view components.button — nested, real durations, detail-marked |
instrument.views |
| DB transactions | db.transaction (nested via savepoints, outcome attribute) |
instrument.transactions |
| Redis commands | redis GET (client, backdated, key only) |
instrument.redis (off by default) |
| Cache counters | cache.operations{operation,store} |
instrument.cache (off by default) |
| Cache timeline spans | cache.hit/miss/write/forget with key + duration |
instrument.cache_spans (off by default) |
| Outgoing HTTP | GET api.stripe.com (client) + duration histogram by host |
instrument.http_client |
| Reported exceptions | exceptions.reported{exception} counter + span event — includes HANDLED report()s |
instrument.exceptions |
Request root spans are named METHOD /route/{pattern} by default.
Behind catch-all routes, name them yourself with
Telemetry::nameRequestsUsing() — and add attributes at terminate with
enrichRequestsUsing(); see Runtime hooks.
An explicit updateName() during the request always survives terminate.
Query spans are only recorded inside an active trace — no orphan roots
from tinker sessions. The ROOT span additionally carries per-request
tallies — db.query.count and db.query.time_ms ("12 queries / 48 ms"
at a glance, even when individual query spans are filtered by the noise
floor).
Consumer (job) spans carry messaging.wait_time_ms — how long the job
sat in the queue between dispatch and the attempt starting — backed by
the queue.job.wait_time histogram and a queue.jobs.dispatched
counter on the producer side.
Request spans carry session.driver and session.hash — a truncated
SHA-256 of the session id (never the id itself; it is an authentication
credential). The hash is stable across a visit, so one TraceQL query
follows a whole visitor journey: { span.session.hash = "3f2a…" }.
Disable with instrument.session.
Request spans carry enduser.id, enduser.type (the model:
user/admin/reseller) and enduser.guard (the guard that
authenticated) — never name or email. Multi-guard apps stay
disambiguated: admin #7 and user #7 are different identities.
Filter in TraceQL: { span.enduser.id = "42" && span.enduser.type = "admin" }.
The login POST itself and logout requests are attributed too (the
Login/Logout events are remembered within the request). Disable with
instrument.user; enrich (explicit PII opt-in) with
Telemetry::resolveUserUsing(fn ($user, ?string $guard) => [...]).
Resource attribution
Request, worker-job and scheduled-task spans carry
php.memory.peak_bytes and php.cpu.time_ms — the peak memory and CPU
time of THAT unit of work (the process-global peak counter is reset per
request/job/task, so long-lived workers report honestly). Matching
histograms (http.server.memory.peak, http.server.cpu.time,
queue.job.*, schedule.task.duration) give p95 memory/CPU per route,
per job — and per custom label dimension. Disable with
instrument.resources.
With cboxdk/system-metrics installed, spans additionally carry the
process' real OS footprint via a ProcessMetrics tracker around each
unit of work: process.memory.rss_peak_bytes (sees non-PHP allocations
the PHP allocator misses) and process.cpu.utilization — the same
mechanism cboxdk/laravel-queue-metrics uses for per-job metrics.
Every sub-span also carries its own php.cpu.time_ms and
php.memory.delta_bytes (allocation delta — may be negative), so the
trace waterfall shows WHERE the CPU and memory went, not just the
totals. Backdated query spans are excluded (their work already happened
when they're recorded).
{ name = "order.payment" } | select(span.php.cpu.time_ms, span.php.memory.delta_bytes)
{ kind = server && span.php.memory.peak_bytes > 134217728 } # requests over 128 MB
Custom dimensions (context)
Nightwatch-style facets — set once, applied everywhere:
// e.g. in middleware, after tenant/team resolution:
Telemetry::context([
'team.id' => $team->id,
'team.name' => $team->slug,
'plan' => $team->plan,
]);
From that point every span, event and telemetry-channel log record in the
request carries the dimensions (span-specific attributes win on
conflict) — and dispatched jobs inherit them, together with
messaging.origin.name (the dispatching request/command name), so a job
is queryable by team AND traceable back to the exact request that queued
it:
{ span.team.name = "checkout" && kind = consumer }
{ span.messaging.origin.name = "POST /demo/orders" }
Context clears automatically between requests and jobs.
Metric dimensions (bounded!)
Context is traces/events/logs only — metric labels multiply cardinality. For bounded dimensions (plan, tier, team — never raw ids) opt in to extra request-duration labels:
Telemetry::labelRequestsUsing(fn ($request) => [
'plan' => $request->user()?->plan ?? 'guest',
]);
That enables per-plan latency in PromQL:
histogram_quantile(0.95, sum by (le, plan)
(rate(http_server_request_duration_milliseconds_bucket[5m])))
Core labels (http.route, method, status) always win over resolver
labels; a throwing resolver is reported and ignored.
Context propagation
Outbound propagation uses the full W3C traceparent — trace id and
span id — so downstream spans are children, not detached roots:
- Queued jobs: payloads automatically carry the dispatcher's traceparent; workers continue it. (Sync jobs run inline in the dispatcher's context.)
- Incoming HTTP: the middleware continues
traceparentheaders whentraces.continue_incomingis on. - Outbound HTTP: opt in per request with the client macro (deliberate, so trace headers never leak to third parties by accident):
Http::withTraceparent()->post($url, $payload);
The macro is a no-op when no trace is active.
The trace id as a support reference
The trace id doubles as the reference that ties error trackers, support cases and logs back to the trace:
X-Trace-Idresponse header on every traced request (traces.response_header, set null to disable).- Laravel
Context:trace_idis added at trace start — Sentry (≥ 4.x), Flare and every log channel pick it up automatically. An explicit Sentry scope tag is set too (traces.share_context). - Error pages:
Telemetry::traceId()is available while the error view renders — show it as “quote this reference id to support”.
The full flows (Sentry → trace, support case → trace, error page recipe) live in Error tracking & support flow.
Sampling
traces.sample_rate (0–1) decides once per trace, at the root. Children
inherit the decision; remote callers' decisions are respected via the
sampled flag. Unsampled spans still exist as context — ids propagate — but
are never buffered or exported.
Error spans escape sampling (traces.always_sample_errors, default
on): a 10%-sampled app still exports every failing span. The escaped
span's trace may be partial — healthy siblings were dropped under the
head decision.
Per-route overrides via the Sample middleware — the re-decision covers the whole active trace, including the still-open request span:
use Cbox\Telemetry\Http\Middleware\Sample;
Route::get('/health', HealthController::class)->middleware(Sample::never());
Route::post('/checkout', ...)->middleware(Sample::always());
Route::get('/feed', ...)->middleware(Sample::rate(0.01));
Tail detail retention
MANY details when it hurts, a lean skeleton when all is well:
TELEMETRY_TRACE_DETAILS=tail
TELEMETRY_SLOW_REQUEST_MS=1000
TELEMETRY_SLOW_SPAN_MS=100
In tail mode, detail spans (cache operations, queries) are kept only
for traces that turned out interesting: an error span anywhere, a
request over slow_request_ms, or a single detail span over
slow_span_ms (one slow query keeps the WHOLE trace's details). Healthy
fast traces ship the skeleton — root span with all its tallies
(db.query.count, cache.event.count, resources) — while counters and
histograms flow unconditionally.
The decision happens at flush, when the entire trace is in memory — tail-based detail retention without a collector. Buffer-cap force flushes always keep details: a 5000-span request IS interesting.
Bootstrap visibility
When LARAVEL_START is defined (it is, in every standard public/ index.php), the request trace includes a backdated laravel.bootstrap
span covering framework boot up to the middleware stack, and the request
span carries laravel.bootstrap_ms.
Buffering
Finished spans buffer in memory and flush at terminate — export latency
happens after the response is sent. The buffer is capped
(traces.max_buffer, default 5000) and force-flushes when full, so
long-running workers and Octane can't grow unbounded.