Dynamic per-user Rails debug logs, scoped to the request

A user has been failing the same flow three times in a row. Their account looks fine. The inputs are well-formed. The trace says we got to the service that should have completed the transaction, then we did, then somehow the user is back at the failure screen. It's the same playbook as always: you want debug logs from their next attempt.

You can crank Rails.logger.level = :debug for the whole fleet. You will not, because you saw the logs bill last quarter. (If the volume math behind that is unfamiliar, the previous post has a primer on why production defaults sit at info or warn.) You can dial PaymentsService to debug with the per-component pattern from that post. That works, but now you're logging every user's payments at debug, and most of them aren't the user you care about. The signal is still buried, just under a smaller pile.

You're after a different shape entirely. Debug for THIS user's next handful of requests. Everyone else stays at whatever the app's default is. Filterable in the aggregator by user. Expires on its own. Cooperates with the per-component dial without stomping on it.

This post is how I got there.

What I wanted

Concrete requirements:

Per-user, not per-class. Last post's dial handled per-class. The new one works on the orthogonal axis.

Request-scoped. When user 12345's next request comes in, that request's lines emit at debug. When their request completes, the dial unwinds. The next request on the same worker thread starts clean.

A wall-clock TTL on top of that. If I turn on debug for a user at midnight chasing a bug, I shouldn't wake up to nine hours of debug spew across whatever traffic they generated overnight. A finite TTL on the dial entry handles cleanup so I don't have to remember.

Filterable in the aggregator. I want a query like user_debug=12345 (or whatever syntax your stack uses) to surface exactly the lines from that user's debug session, none of the other users, none of the other request types.

Cooperative with last post's per-component dial. If MailerService is already dialed up via the class-level pattern, my per-user override should still be able to push it further for the one user. The two layers should compose, not collide.

Approaches I considered

Rails.logger.level = :debug for the fleet. It's the same sledgehammer I covered last post.

The per-component dial from last post. Closer. Dial PaymentsService to debug, get debug for that one class. But you also get debug for every user's calls into PaymentsService, which is the wrong shape when the bug is one user.

ActiveSupport::TaggedLogging. The default Rails answer for "scope a logger to a request." But it adds tags only, it doesn't change which lines emit¹. Wrapping a request in Rails.logger.tagged("user-12345") do ... end gives me a tag on every line that already would have emitted at the current global level. The debug lines I actually wanted still don't fire, because the level is still wherever the app's default sits.

Prefab's filter approach. Same as the previous post: solves it, SaaS dependency, plus the appender-filter shape composes oddly with the class-level dial. Both layers see every line independently and have to agree.

A custom logger subclass with a thread-variable level. Wrap Rails.logger in a subclass that consults a thread variable for the current minimum level. It's doable in maybe twenty lines of code. But it duplicates work that semantic_logger already does and adds a custom logger that future readers have to mentally model. I had a working semantic_logger pipeline. I didn't want to fork it for one feature.

SemanticLogger.silence in an around_action. Once the per-component dial from last post was clean, this turned out to be the right answer. silence(level) is a block-scoped function on a thread-local minimum level². It lowers the threshold for the duration of the block. The block is ensure-scoped, so nothing leaks if it raises. There's no per-line overhead either. And it cooperates cleanly with the per-component dial as long as you built the dial right. (See post 1's trap section.)

What ruled all of these in or out: the dial had to be per-user and request-scoped, had to filter cleanly in the aggregator, and had to coexist with the existing per-component pattern. silence plus tagged met all four.

What I built

A small module:

module RequestScope
  def self.apply(user_id, &)
    return yield if user_id.blank?

    override = Cache.level_for(user_id)
    return yield if override.nil?

    SemanticLogger.silence(override) do
      SemanticLogger.tagged(user_debug: user_id, &)
    end
  end
end

Wired at the application controller as an around_action:

class ApplicationController < ActionController::Base
  prepend ComponentLogger          # from the previous post
  around_action :apply_user_debug_logging

  def apply_user_debug_logging(&)
    RequestScope.apply(current_user&.id, &)
  end
end

Three lines of glue plus the module. All of the surface area lives here.

Inside apply:

Cache.level_for(user_id) checks an in-process map for an override. If nothing's set, the block runs unwrapped and the cost is one map lookup. No-op for ~99% of requests.

If an override exists, we wrap the block in two ensure-scoped layers.

SemanticLogger.silence(override) sets a thread-local minimum level for the duration of the block. While the block is running, any logger whose own @level is unset falls back to a check against this thread-local, and lines at the silenced level (or higher) emit. Lines below the silenced level still get short-circuited.

SemanticLogger.tagged(user_debug: user_id, &) attaches a named tag to every log entry inside. Both silence and tagged are ensure-scoped inside semantic_logger, so the thread-local state restores even if the block raises. There's no risk of a per-user dial leaking onto the next request that lands on the same worker thread.

In the aggregator, this shows up as a structured tag on every line emitted during that user's request. Whatever query language your aggregator speaks, you can pin a query to just that user's session and exclude everything else: other users' requests, health checks, unrelated background traffic.

The whole shape, drawn end-to-end:

flowchart TD
    A[request from user 12345] --> B[around_action: apply_user_debug_logging]
    B --> C[Cache.level_for user_id]
    C --> D{override<br/>in cache?}
    D -->|no| E[yield without wrapping]
    D -->|yes| F["SemanticLogger.silence(level)"]
    F --> G["SemanticLogger.tagged(user_debug: 12345)"]
    G --> H[block runs through<br/>controller, services, jobs]
    H --> I[each log line emits at silenced level<br/>and carries the user_debug tag]
    I --> J[unwind: silence and tag cleared]
    E --> K[response]
    J --> K

The store and the cache

Cache.level_for(user_id) reads from a process-local concurrent map. The map gets refreshed when the per-user store changes, which keeps the per-request lookup at memory speed.

The store under the cache is a single Redis hash, keyed user_debug_logging, with each field keyed by user_id and the value encoding the override. I went with a hash rather than separate keys per user (user_debug:12345, user_debug:67890, and so on) for two reasons. The first is enumeration: pulling every active override is one HGETALL instead of a SCAN across the keyspace, which matters when an operator wants to list everyone currently flagged or clear them in a hurry. The second is locality: every override sits under the same key, so dumping the full state, counting active overrides, or wiping the whole thing is a one-line operation. The separate-key shape scatters the overrides across the keyspace and forces a SCAN every time.

The catch with hashes is that Redis doesn't expose per-field TTL. With separate keys I could call SETEX user_debug:12345 1800 :debug and Redis would evict the entry after thirty minutes on its own. Hashes only have TTL on the whole hash, not on individual fields. So expiry gets enforced in code: each value encodes level:expires_at, and on read the code compares expires_at to the wall clock. If the timestamp is in the past, the entry is treated as missing and lazily deleted from the hash. On the broadcast-driven refresh, the cache loader also sweeps any expired fields it finds while it's already iterating, which keeps the hash bounded long-term. The pattern is lazy + sweep rather than set TTL and forget, but the operational result is the same: dead entries don't pile up.

The cross-pod invalidation question, how does pod B find out that pod A wrote a new override, gets handled by a broadcast layer on top of the same store. That's a separate post about the broadcast pattern itself. For this post, "the cache is fresh within a second of any write" is enough.

The cooperation with the per-component dial

SemanticLogger.silence only takes effect when the per-class logger's @level is unset. The per-component dial from the previous post has a deliberate @level = nil branch: when the config store says "default level for this class," it clears @level rather than leaving it pinned at whatever the last write was. That cleared state is what lets the thread-local silence kick in for that class.

Without that branch in post 1, the per-user override would silently fail for any class that had ever called logger after boot. The class would have @level pinned to whatever the app's default is (info, warn, depending on your config), the silence would be ignored, and the debug lines wouldn't emit for that class even with the user-level dial set. The investigation would seem fine. The override would be in the store. The around_action would be firing. The tag would be on the lines that did emit. But the lines you cared about wouldn't be there.

The two features cooperate because the per-component dial deliberately leaves room for the per-user one to take over. Building post 1 without thinking about post 2 would have made post 2 silently broken. The fix in post 1 takes one line and one comment, but it carries the weight of this whole pattern.

What this gets you

A working dial for "log everything for this one user, for the next 30 minutes." Write the override into the config store from a Rails console, an internal admin endpoint, or whatever surface you want to expose: user_id: 12345, level: :debug, expires_at: 30.minutes.from_now. The next request from that user hits the around_action, picks up the override, and emits debug logs for the whole request lifecycle. The aggregator receives every line tagged with a user_debug field set to that user's id. Filter on the tag, see only that user's session.

After 30 minutes the entry expires. The next read returns nil. The around_action becomes a no-op for that user. There's no cleanup to do, no follow-up to remember.

The cooperation with the per-component dial means I can also turn on debug for a class and a user at the same time, and they layer cleanly. Class is dialed up one notch via the per-component pattern, user gets silence(:debug) for their request, and that user's lines come out at debug while everyone else's lines come out at the class's dialed level. The two-axis grid works.

What it doesn't

Async jobs need a small wiring step. Thread-locals don't auto-propagate across the ActiveJob boundary, so a job enqueued during user 12345's request runs on a different thread (often on a different process) where silence is unset. Good news: this comes up often enough that the patterns are well-worn. Three standard fixes:

Pass the user id explicitly as a job arg. The most common approach. The job has user_id in its arguments and calls RequestScope.apply(user_id, &) itself, mirroring the controller's around_action shape. Bake the call into ApplicationJob#perform with a prepend'd concern and every subclass inherits.
Use ActiveSupport::CurrentAttributes. Rails ships this for ambient request-scoped state, and the built-in ActiveJob serializer auto-propagates registered attributes across perform_later³. Stash the override id on a Current.user_id attribute and the job sees it for free, no per-job wiring.
Custom job-backend middleware. Sidekiq, GoodJob, and SolidQueue all have client/server middleware hooks for capturing context on enqueue and restoring it on perform. Sidekiq's own tagged-logging context is solid prior art for this shape.

I went with (1) because the jobs in our app already pass user_id for unrelated reasons. (2) is probably cleaner if you're starting fresh, since the auto-propagation removes a class of "I forgot to thread the override into this job" bugs.

Non-controller paths need their own wiring. The hook lives in a controller around_action, so anywhere the request boundary isn't a controller (cron triggers, message-broker callbacks, internal pollers, signal handlers), the around_action doesn't run. The fix has the same shape: call RequestScope.apply at the entry point of whatever the path is. Each entry point has to be wired explicitly. The pattern doesn't propagate by itself.

Class-level dials still win when both are set. If a class has been pinned to a non-default level via the per-component dial, and I want this user at a finer level than that pin, the per-user override loses for that class. The class's @level is pinned, so silence doesn't apply. In practice this comes up rarely because class-level dials are usually transient and short-lived, but it is the shape worth knowing about. The two layers compose for "user-level debug while class is at the app default" but not for "user-level debug while class has its own non-default pin."

The aggregator still has to support tag filtering for this to be useful at all. If you're on a logging stack that doesn't index custom tags, the user_debug filter does nothing for you. Most modern aggregators index custom tags by default. Some hosted budget-tier loggers don't. It's worth checking before you wire this up.

Closer

If you've built per-user logging in Rails before, especially in a multi-tenant or per-customer setup, I'd love to compare notes. The async-job propagation problem in particular has a few ways to solve it (concern on ApplicationJob, explicit pass-through in perform args, ActiveSupport::CurrentAttributes) and I'm curious which one held up for you. Email's on the contact page. The closer to "I tried this and here's the constraint that bit me," the better.

The pattern from this post sits on top of the per-component dial from the previous one. The two together cover most of the "turn on debug logging for X" cases that come up in production: X is a class, X is a user, and (with both at once) X is a class for one user.

Rails API, ActiveSupport::TaggedLogging. Wraps a block to attach a tag to every log entry inside. The level itself is not changed. ↩
Reid Morrison, Semantic Logger docs. SemanticLogger.silence reduces the minimum level for the duration of a block on the calling thread. Ensure-scoped, so the thread-local state restores even if the block raises. Per the docs, the suppression only applies to loggers that are using the global default level. If a logger has an explicit @level pinned, silence is ignored for that logger. This is the property the per-component pattern from the previous post leaves intact on purpose. ↩
Rails API, ActiveSupport::CurrentAttributes. Thread-isolated, request-scoped attributes that auto-reset between requests. ActiveJob has built-in support for serializing registered Current attributes across perform_later, so ambient context follows the job to the worker without per-job wiring. ↩