[Proposal] SLG-0004: Metadata values privacy attribute

Hi all,

The proposal SLG-0004: Metadata values privacy attribute for swift-log is now up and In Review.

The review period will run until March 17th — please feel free to post your feedback as a reply to this thread.

Thanks!

-1 on the idea that privacy can be decided where the log message is emitted. In most cases, a library cannot know.

Only the application understands what is private and what is not.

A URL, for example, can be private or public depending on the use case. The same goes for pretty much anything else that's a string.

Therefore, a pretty strong -1 on this proposal. I think we need to think about this more. The proposed solution is IMHO worse than what can already be achieved with SwiftLog today, namely:

  • treat all metadata values as private by default
  • take configuration at the application level (configuring the backend) that allow-lists the (log message, source, metadata key) tuples that can be assumed public.
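For concreteness, here is a minimal, self-contained sketch of that status quo. It assumes plain `[String: String]` metadata instead of `Logger.Metadata`, and the `AllowedEntry`/`AllowListRedactor` types are hypothetical (a real version would live inside a custom `LogHandler`): every value is redacted unless the application has allow-listed its exact (message, source, key) tuple.

```swift
/// One application-level rule: a metadata key that may be logged in clear
/// for a given log message and source (module). Hypothetical type.
struct AllowedEntry: Hashable {
    let message: String
    let source: String
    let key: String
}

/// Treats every metadata value as private unless explicitly allow-listed.
struct AllowListRedactor {
    var allowed: Set<AllowedEntry>
    var placeholder = "<redacted>"

    func redact(_ metadata: [String: String], message: String, source: String) -> [String: String] {
        var result: [String: String] = [:]
        for (key, value) in metadata {
            let entry = AllowedEntry(message: message, source: source, key: key)
            // Private by default: only exact allow-list matches pass through.
            result[key] = allowed.contains(entry) ? value : placeholder
        }
        return result
    }
}

let redactor = AllowListRedactor(allowed: [
    AllowedEntry(message: "request finished", source: "HTTPServer", key: "status"),
])
let output = redactor.redact(
    ["status": "200", "path": "/users/42"],
    message: "request finished",
    source: "HTTPServer"
)
print(output["status"]!, output["path"]!) // prints "200 <redacted>"
```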

There might be edge cases where a library can decide the privacy, but in my experience that's not very common.

To me, the key question we have to answer is: Can an application configure the privacy level of a particular value in any library WITHOUT code changes in the library. If the answer is no, then I don't think we have an acceptable solution.

/// - **No nested privacy**: When marking a dictionary or array as private, all contained
///   values are treated with the same privacy level. Fine-grained privacy within nested
///   structures is not currently supported.

I'm curious to hear the motivation behind only allowing attributes on the root level. I get that it's much simpler this way, especially with converting to and from "old" Metadata, and nested privacy might not be really intuitive (should only the leaf privacy matter, or should a .private attribute at any level mask anything below it?). But it's also a bit limiting for code that actively uses nested metadata.

// Imagine a user's ID is not sensitive, but their name is
// Below not allowed
let nestedMetadata: AttributedMetadata = ["user": ["id": "\(id)", "name": "\(name, privacy: .private)"]]
// Forced to use flat structure
let flatMetadata: AttributedMetadata = ["user.id": "\(id)", "user.name": "\(name, privacy: .private)"]

The code that extracts the metadata might not know whether it will in turn be nested under some other key. For example, metadata for a TLS certificate could be nested under a client or server key.

Or a slightly contrived example using .array metadata.

// Below not allowed
let metadata: AttributedMetadata = ["related.hosts": ["\(nonSensitiveHost)", "\(sensitiveHost, privacy: .private)"]]
// Must either mark all as sensitive, or split into two keys

Especially since the proposal mentions possible extensions with new attributes, it would be nice if the proposal talks a bit about why this choice was made, and if and how the implementation could be changed to support nested attributes in the future.
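To make the nested question concrete, here is a sketch of one possible semantics, using hypothetical stand-in types (`Privacy`, `Value`, `render`) rather than the proposal's `AttributedMetadata`: a `.private` marker at any level masks everything beneath it, and otherwise each leaf's own level applies. The other option raised above (only the leaf's level matters) would simply drop the `inheritedPrivate` propagation.

```swift
enum Privacy { case `public`, `private` }

/// A recursive metadata value where every node carries its own privacy level.
indirect enum Value {
    case string(String, Privacy)
    case dictionary([String: Value], Privacy)
    case array([Value], Privacy)
}

/// Renders a value, masking any node that is private or sits below a private node.
func render(_ value: Value, inheritedPrivate: Bool = false) -> String {
    switch value {
    case let .string(s, privacy):
        return (inheritedPrivate || privacy == .private) ? "<private>" : s
    case let .dictionary(dict, privacy):
        let masked = inheritedPrivate || privacy == .private
        // Sort keys so the output is deterministic.
        let parts = dict.keys.sorted().map { "\($0): \(render(dict[$0]!, inheritedPrivate: masked))" }
        return "[" + parts.joined(separator: ", ") + "]"
    case let .array(items, privacy):
        let masked = inheritedPrivate || privacy == .private
        return "[" + items.map { render($0, inheritedPrivate: masked) }.joined(separator: ", ") + "]"
    }
}

// The nested `user` example: the ID is public, the name is private.
let user: Value = .dictionary([
    "id": .string("42", .public),
    "name": .string("Jane", .private),
], .public)
print(render(user)) // prints "[id: 42, name: <private>]"
```

With this rule the nested `user` shape can stay nested, at the cost of a small recursive walk in the handler.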


I have mixed opinions on this, but the big point that came to mind is that this system does not seem general enough. I almost think that metadata attributes should be like Tags in SwiftTesting.

Define the protocol, and let LogHandler libraries define the attributes they recognize and handle, telling their users to annotate their logs with those attributes for them to respond to.

Perhaps SwiftLog could define a global “public” and “private” definition of that attribute…

// SwiftLog
extension AttributedMetadata {
  public struct Attribute: Hashable, Sendable {
    public let id: String
  }
}

// Some Library
extension AttributedMetadata.Attribute {
  // `@Attribute` is a hypothetical macro, analogous to Swift Testing's `@Tag`
  @Attribute static var someLibraryAttribute: Self
}
logger.error("", attributedMetadata: [
  "some_key": "\(myValue, attributes: .someLibraryAttribute)"
])

// user app
import SomeLogHandlerLibrary
import SomeLibrary

SomeLogHandler.setAttributePolicy([
  .someLibraryAttribute: .redact
])

How would this work for transitive libraries you use in an app, which you don't directly interact with? How would their policies behave?

I agree with your general feedback that this is mostly for applications to decide; however, there are clear use cases where a certain metadata value is sensitive, e.g. a password. In such cases, even libraries can confidently mark such values as private. As for your question, this proposal does not change the fact that applications can decide whether something is private or public. I would argue it even makes that easier. Similar to your approach for redacting values in a log handler, an application can install a log handler that changes the privacy level of a value. This does not require any changes in the library. It is easier because this log handler now only needs to change a value from private to public (or the other way around) and can then forward the log event to the "real" log handler, which should respect the privacy level.
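A rough sketch of that wrapping handler, written against a deliberately reduced stand-in protocol (`EventHandler`) rather than swift-log's real `LogHandler`, which also requires `logLevel`, `metadata`, and a metadata subscript. `PrivacyLevel`, `AttributedValue`, `RecordingHandler`, and `PrivacyOverridingHandler` are all hypothetical names. The application decides that `user.id` is public; the library that emitted the event is untouched.

```swift
enum PrivacyLevel { case `public`, `private` }

/// A metadata value paired with its privacy level (stand-in for the proposal's attributed value).
struct AttributedValue {
    var value: String
    var privacy: PrivacyLevel
}

/// Stand-in for swift-log's LogHandler, reduced to the one method this sketch needs.
protocol EventHandler {
    func log(message: String, metadata: [String: AttributedValue])
}

/// The "real" handler: respects privacy levels by masking private values.
final class RecordingHandler: EventHandler {
    private(set) var lines: [String] = []

    func log(message: String, metadata: [String: AttributedValue]) {
        let rendered = metadata.keys.sorted().map { key -> String in
            let entry = metadata[key]!
            let shown = entry.privacy == .private ? "<private>" : entry.value
            return "\(key)=\(shown)"
        }
        lines.append(([message] + rendered).joined(separator: " "))
    }
}

/// Application-installed wrapper: overrides the privacy of selected keys and
/// forwards the event; no library changes are required.
struct PrivacyOverridingHandler: EventHandler {
    let overrides: [String: PrivacyLevel]
    let upstream: any EventHandler

    func log(message: String, metadata: [String: AttributedValue]) {
        var adjusted = metadata
        for (key, level) in overrides where adjusted[key] != nil {
            adjusted[key]!.privacy = level
        }
        upstream.log(message: message, metadata: adjusted)
    }
}

let sink = RecordingHandler()
let handler = PrivacyOverridingHandler(overrides: ["user.id": .public], upstream: sink)
handler.log(message: "login", metadata: [
    "user.id": AttributedValue(value: "42", privacy: .private),
    "password": AttributedValue(value: "hunter2", privacy: .private),
])
print(sink.lines[0]) // prints "login password=<private> user.id=42"
```

The wrapper only changes levels; actually masking values stays the job of the downstream handler.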

Taking a step back, I see that this is just one piece of the puzzle for redacting sensitive metadata. This allows libraries to annotate values that they are certain about, and it gives us a shared "currency" type for privacy levels. However, I agree that this often becomes an application-level configuration. The current approach of applications installing log handlers that filter metadata keys is incredibly brittle. Any library is free to change its keys since they are not part of the public API. Worse, even finding the string values of those keys requires searching through the library's source. I would personally be interested in finding a way for libraries to define their metadata keys as part of the public API so that we can use them to implement exhaustive and compile-time safe redaction. Having said that, I think that can be done in a future proposal.
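As an illustration of what "metadata keys as public API" could look like, here is a sketch in which a hypothetical HTTP client library publishes its keys as an enum and the application writes an exhaustive policy over it. `HTTPClientMetadataKey`, its raw values, `isPublic`, and `redact` are all made up for this example.

```swift
/// Hypothetical: a library publishes its metadata keys as a public enum
/// instead of ad-hoc string literals.
public enum HTTPClientMetadataKey: String, CaseIterable {
    case requestURL = "http.request.url"
    case statusCode = "http.response.status_code"
    case authorizationHeader = "http.request.header.authorization"
}

/// Application-side policy. Because the switch is exhaustive, a new case in
/// the library's enum breaks this code at compile time until the application
/// decides how the new key should be treated.
func isPublic(_ key: HTTPClientMetadataKey) -> Bool {
    switch key {
    case .statusCode: return true
    case .requestURL: return false          // may contain user-specific paths
    case .authorizationHeader: return false
    }
}

/// Redacts metadata emitted by the library; keys it doesn't declare stay redacted too.
func redact(_ metadata: [String: String]) -> [String: String] {
    var result: [String: String] = [:]
    for (rawKey, value) in metadata {
        if let key = HTTPClientMetadataKey(rawValue: rawKey), isPublic(key) {
            result[rawKey] = value
        } else {
            result[rawKey] = "<redacted>"
        }
    }
    return result
}
```

The compile-time guarantee only holds when the library is built without library evolution; for a resilient library, clients must add `@unknown default`, which turns a newly added key into a warning rather than an error.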


@johannesweiss

The proposed solution is IMHO worse than what can already be achieved with SwiftLog today

Hear me out. These two are different features. The application (or even out-of-application!) key-based policy is a security feature saying "I don't trust any logs emitted in this app; here is the allow-list of things to be logged". It is a straightforward functionality many users use (it is even added to the list of alternatives in the proposal). The proposed privacy levels attached to values solve a different problem: "Sometimes I want to look at some extra values in logs that are gated not by log levels but by privacy concerns, and I trust my LogHandler to handle them correctly". One is a security feature; the other is a security-aware maintenance feature.

That's a really weird one... Because it defaults to public, I can't really think of this as a security feature, and I don't think it is really intended to be one. But all the wording it uses, including "privacy", really invokes thoughts of actually sensitive data, and if we're at risk of logging those, the default has to be private. That's not what we're proposing (and it would be hard to adopt easily).

I'm trying to understand who this proposal is really for, because it's a bit too weak to really help with auditing a codebase for accidental PII leaks. For that, you'd want a protocol that ensures only types trusted as "ok to log" even compile when one tries to log them, including redacting their description etc.
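A minimal sketch of the kind of compile-time gate described here, assuming a hypothetical `SafeToLog` protocol and `logMetadata` helper: only types that explicitly opt in can be passed to the logging helper, and they provide a dedicated `logDescription` so that their regular `description` never reaches the logs.

```swift
/// Hypothetical marker protocol: only types that explicitly opt in can be
/// logged, and they choose their own log-safe rendering.
protocol SafeToLog {
    var logDescription: String { get }
}

struct UserID: SafeToLog {
    let raw: Int
    var logDescription: String { "user#\(raw)" }
}

struct EmailAddress {   // deliberately *not* SafeToLog
    let raw: String
}

/// Accepts only SafeToLog values, so passing an EmailAddress fails to compile.
func logMetadata<Value: SafeToLog>(_ key: String, _ value: Value) -> String {
    "\(key)=\(value.logDescription)"
}

let line = logMetadata("user", UserID(raw: 42))
// logMetadata("email", EmailAddress(raw: "a@example.com"))  // compile error: no SafeToLog conformance
print(line) // prints "user=user#42"
```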

It kind of feels like this would be entirely ignored by the vast majority of swift-log users, and I'm not entirely clear how it aligns the API with OSLog either, because of the flipped default 🤔 So, what are we really solving here? From there, let's work back and maybe name this appropriately, because AFAICS this isn't a privacy/security feature, but it kind of looks like one, which is problematic IMHO.


Was this in response to my comment?

I’m not sure I follow your question.

If the log handlers don’t allow a way to configure on a per-attribute value basis, or they don’t expose their attributes, then it would work as today.

In my proposal, attributes would tell log handlers how to handle the message. If they don’t recognize or respond to an attribute, it continues to work as it does today.

TL;DR

+1 on the overall concept from me.

Why +1

My team is actively looking into replacing our native apps’ usage of OSLog due to difficulties with collecting logs from user devices in the wild. I’ve been thinking about just forking swift-log to get OSLog-like string interpolations, but it would be great to have this added to the library itself.

Naming

Coming from the Apple OS environment, the privacy terminology sounds fine to me but I can understand the concern around misbranding. Maybe using a term like “mask” would be better? I found this example from a community member quite nice.

LogHandler vs Callsite redaction of metadata

As to whether redaction should be a LogHandler concern or a callsite concern:

I like swift-log’s metadata system since it lets you add to a Logger’s metadata as your code gets access to more context-specific information. However, I also find the all-or-nothing redaction of metadata values to be quite restrictive. There is plenty of metadata in my team’s apps that we want to be public because it makes it easier to identify issues from log files. Not all metadata needs to be redacted, and being able to specify privacy at the callsite allows for the granularity I feel I’m lacking.
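For readers less familiar with OSLog, here is a self-contained sketch of how callsite privacy can be expressed with Swift's `ExpressibleByStringInterpolation`: each interpolated value records a privacy level, defaulting to public as the proposal does. `PrivacyMessage` and `Privacy` are hypothetical names; the proposal's actual interpolation types may differ.

```swift
enum Privacy { case `public`, `private` }

/// A log message whose interpolated segments remember their privacy level.
struct PrivacyMessage: ExpressibleByStringInterpolation {
    enum Segment {
        case literal(String)
        case value(String, Privacy)
    }

    var segments: [Segment]

    init(stringLiteral value: String) {
        segments = [.literal(value)]
    }

    init(stringInterpolation: StringInterpolation) {
        segments = stringInterpolation.segments
    }

    struct StringInterpolation: StringInterpolationProtocol {
        var segments: [Segment] = []

        init(literalCapacity: Int, interpolationCount: Int) {}

        mutating func appendLiteral(_ literal: String) {
            segments.append(.literal(literal))
        }

        // Interpolated values are public unless the call site says otherwise.
        mutating func appendInterpolation<T>(_ value: T, privacy: Privacy = .public) {
            segments.append(.value(String(describing: value), privacy))
        }
    }

    /// Renders the message, masking private values.
    func redacted() -> String {
        segments.map { segment -> String in
            switch segment {
            case .literal(let text):
                return text
            case .value(let text, let privacy):
                return privacy == .private ? "<private>" : text
            }
        }.joined()
    }
}

let id = 42, name = "Jane"
let message: PrivacyMessage = "user \(id) is \(name, privacy: .private)"
print(message.redacted()) // prints "user 42 is <private>"
```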

In my reading of this and previous discussions on this topic, I’ve seen concerns about libraries not reasonably being able to know whether something should be redacted. I mostly agree with @johannesweiss’s sentiment here:

To me, the key question we have to answer is: Can an application configure the privacy level of a particular value in any library WITHOUT code changes in the library. If the answer is no, then I don't think we have an acceptable solution.

I say mostly because I’m not sure how important it is for library consumers to be able to unredact specific metadata values. What’s proposed in the pitch introduction feels sufficient for my use cases but I’m also not writing server side code or maintaining lots of libraries that use swift-log.


I chatted to @VladimirMinenko offline a little bit more and I think there is something more general that's quite interesting: arbitrary attributes for log metadata. @Mordil also seems to have similar ideas and mentions SwiftTesting's tags.

If we can find a solution where SwiftLog itself only learns the abstract concept of attributes but not any concrete attributes, then I think that could be quite compelling.

The concrete attributes (see below for more context) that one might use, be it type information, privacy info, error field designation, system info, ... should however come from other packages. For example there could be a swift-log-privacy-levels package which ships some private/public/restricted attributes, there could be a swift-log-types package which ships string/number/bool attributes etc.

So a collaborating library would additionally depend on e.g. the swift-log-types & swift-log-privacy-levels packages to say (made up syntax)

logger.info(
    "loaded user information",
    metadata: [
        "attempt": "\(attempt, annotations: [.number /* from swift-log-types */, .public /* from swift-log-privacy-levels */])",
        "user-id": "\(userID, annotations: [.restricted /* from swift-log-privacy-levels */])",
        "user-full-name": "\(userFullName, annotations: [.private /* from swift-log-privacy-levels */])",
        "container.cpu.usage": "\(..., annotations: [.systemMetric /* from swift-log-system-metrics */])",
        "foo": "bar"
    ]
)

A participating log handler would also depend on the same attribute-providing packages to learn about the attributes and treat them in special ways, ignoring attributes it doesn't understand.

And to be clear, I do think it would be useful to have this concept. Some examples that come to mind:

  • type information (Our deployments for example need to distinguish between strings and numbers because we encrypt all strings but leave the numbers alone)
  • privacy information
  • error/error type/other "special" things
  • system information if you use "Wide Events" (e.g. we add memory usage/CPU usage/other system metrics to every log event which allows you to see them over time)

In my own production use cases, I actually have a real interest in type information in particular (because we encrypt strings but leave integers), designated error fields (because we log them first and never truncate them away for the console output), and filtering out system information (because we use wide events and I don't always want to see the CPU/memory/... information with every log event).
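A sketch of a participating handler along these lines, assuming attributes are identified by hypothetical string IDs (`Attribute`, `AttributedValue`, and `ConsoleFormatter` are illustrative names, not proposed API). It drops system metrics from console output, passes string-typed values through a placeholder closure standing in for real encryption, leaves numbers alone, and ignores attributes it does not recognize.

```swift
/// Hypothetical attribute identifiers that separate packages might ship.
struct Attribute: Hashable {
    let id: String
    static let typeString = Attribute(id: "swift-log-types.string")
    static let typeNumber = Attribute(id: "swift-log-types.number")
    static let systemMetric = Attribute(id: "swift-log-system-metrics.metric")
}

struct AttributedValue {
    let value: String
    let attributes: Set<Attribute>
}

/// A participating handler: acts on the attributes it knows, ignores the rest.
struct ConsoleFormatter {
    var showSystemMetrics = false
    /// Stand-in for real encryption; here it only masks the value.
    var encrypt: (String) -> String = { _ in "<encrypted>" }

    func format(_ metadata: [String: AttributedValue]) -> String {
        metadata.keys.sorted().compactMap { key -> String? in
            let entry = metadata[key]!
            if entry.attributes.contains(.systemMetric) && !showSystemMetrics {
                return nil  // drop wide-event system metrics from console output
            }
            if entry.attributes.contains(.typeString) {
                return "\(key)=\(encrypt(entry.value))"
            }
            // Numbers, unattributed values, and unknown attributes pass through unchanged.
            return "\(key)=\(entry.value)"
        }.joined(separator: " ")
    }
}

let formatter = ConsoleFormatter()
let line = formatter.format([
    "attempt": AttributedValue(value: "3", attributes: [.typeNumber]),
    "user-full-name": AttributedValue(value: "Jane Doe", attributes: [.typeString]),
    "container.cpu.usage": AttributedValue(value: "0.73", attributes: [.systemMetric]),
    "foo": AttributedValue(value: "bar", attributes: [Attribute(id: "unknown.vendor")]),
])
print(line) // prints "attempt=3 foo=bar user-full-name=<encrypted>"
```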


To be clear, I have not thought about the API evolution needed to support this, what the best API is, or how we can make this cheap enough that it doesn't slow down folks who aren't using attributes.


I thought more about this approach, and while I think it is interesting, I am not convinced that it is going to work out for two reasons.

First, I don't see how we can keep the currently clear split between libraries, applications, and log handler implementations. Currently, we expect none of the three to know about each other. Libraries emit log events, applications configure the log handler in use, and log handlers process events however they like. Generalized attributes start to create a connection between log events and log handlers, where I fear that it will push libraries to depend on concrete log handlers so that the right attributes are added. I know that you proposed creating general packages such as swift-log-types and swift-log-privacy-levels, but if these attributes are so general that we expect almost all code to add them to their log events, then I think these separate packages just raise the bar and will make it harder for us to deliver a consistent logging experience across the ecosystem. If there were a hypothetical swift-some-specific-log-handler-attributes package, we wouldn't expect libraries to start adding those. My point is that if we want to keep the split between libraries, applications, and log handlers, then we need a fixed set of attributes that we expect libraries and applications to add to metadata values. If we have such a fixed set, then swift-log should declare those directly instead of going through an indirection. I don't think generalizing this idea will retain our current separation of concerns.

My second reason why I think generalized metadata attributes are problematic is performance. If we have a potentially unbounded list of attributes for a metadata value, then we have to allocate to store those attributes. We could create a specialized implementation that stores a few attributes inline and only allocates for N+ attributes. With a fixed set of attributes as proposed here, we can have a single struct that doesn't require any allocations.

To summarize my point, if we believe that metadata attributes are generally useful, then I think swift-log should provide an opinionated set of attributes that can be applied to metadata values. Furthermore, I believe that privacy levels here are one of the useful metadata attributes that we should add. There is both usage inside libraries and applications for them.


First, I don't see how we can keep the currently clear split between libraries, applications, and log handler implementations. Currently, we expect none of the three to know about each other. Libraries emit log events, applications configure the log handler in use, and log handlers process events however they like. Generalized attributes start to create a connection between log events and log handlers, where I fear that it will push libraries to depend on concrete log handlers so that the right attributes are added.

They should definitely not depend on the specific log handler. But yes, certain libraries would add attributes that only some log handlers make use of.

My second reason why I think generalized metadata attributes are problematic is performance. If we have a potentially unbounded list of attributes for a metadata value, then we have to allocate to store those attributes. We could create a specialized implementation that stores a few attributes inline and only allocates for N+ attributes. With a fixed set of attributes as proposed here, we can have a single struct that doesn't require any allocations.

I would recommend deciding on what exactly the feature is and if it's really worthwhile. If we determine that we really need attributes, we can discuss fast implementations for it.

One idea would be to store the attributes as bits. We would need to devise some registration mechanism (ideally but not necessarily at compile time) where swift-log would assign each attribute a fixed bit. This then allows us to store it cheaply alongside the metadata key/value. We could make up to 64 registered attributes cheap by storing a single Int64 inline (or of course use any other fixed quantity we choose). That of course leaves the question of what to do if the package graph needs more than 64 (or whatever we choose) registered attributes. We could reject attribute registration or start allocating. But let's not get too much into the details here.
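A rough sketch of the bit-registration idea, assuming a hypothetical runtime `AttributeRegistry` (a compile-time variant is not shown) that hands out one bit per attribute name and rejects registrations beyond 64. The attributes of one metadata value then fit in a single inline `UInt64` (`AttributeBits`, also hypothetical).

```swift
/// Hypothetical registry that hands out one bit per attribute name.
final class AttributeRegistry {
    static let capacity = 64
    enum RegistrationError: Error { case full }

    private var bits: [String: UInt64] = [:]

    /// Returns the existing bit for `name`, or assigns the next free one.
    /// Once all 64 bits are taken, further names are rejected rather than
    /// falling back to an allocating representation.
    func register(_ name: String) throws -> UInt64 {
        if let bit = bits[name] { return bit }
        guard bits.count < AttributeRegistry.capacity else { throw RegistrationError.full }
        let bit: UInt64 = 1 << UInt64(bits.count)
        bits[name] = bit
        return bit
    }
}

/// Inline storage for one metadata value's attributes: no heap allocation.
struct AttributeBits {
    var raw: UInt64 = 0
    mutating func insert(_ bit: UInt64) { raw |= bit }
    func contains(_ bit: UInt64) -> Bool { (raw & bit) != 0 }
}

let registry = AttributeRegistry()
let privateBit = try! registry.register("privacy.private")
let metricBit = try! registry.register("system.metric")
var attrs = AttributeBits()
attrs.insert(privateBit)
print(attrs.contains(privateBit), attrs.contains(metricBit)) // prints "true false"
print(MemoryLayout<AttributeBits>.size) // prints "8"
```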

if we believe that metadata attributes are generally useful, then I think swift-log should provide an opinionated set of attributes that can be applied to metadata values.

You mean essentially we'd specify a fixed OptionSet of privacyPrivate, privacyPublic, privacyRestricted, systemMetric, typeInt, typeFloat, typeString, typeBool, typeURL, ... in swift-log itself instead of making this extensible by others? It would of course make the "registration aspect" easy because there's a fixed list. I don't love it, I don't hate it.
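What such a fixed set might look like as an `OptionSet`, using the names listed here; the type name `MetadataAttributes` and the bit assignments are illustrative only. The whole set is one `UInt64`, so attaching it to a metadata value needs no allocation.

```swift
/// A fixed, swift-log-owned attribute set (illustrative names and bits).
struct MetadataAttributes: OptionSet, Hashable, Sendable {
    let rawValue: UInt64

    static let privacyPrivate    = MetadataAttributes(rawValue: 1 << 0)
    static let privacyPublic     = MetadataAttributes(rawValue: 1 << 1)
    static let privacyRestricted = MetadataAttributes(rawValue: 1 << 2)
    static let systemMetric      = MetadataAttributes(rawValue: 1 << 3)
    static let typeInt           = MetadataAttributes(rawValue: 1 << 4)
    static let typeFloat         = MetadataAttributes(rawValue: 1 << 5)
    static let typeString        = MetadataAttributes(rawValue: 1 << 6)
    static let typeBool          = MetadataAttributes(rawValue: 1 << 7)
    static let typeURL           = MetadataAttributes(rawValue: 1 << 8)
}

let attributes: MetadataAttributes = [.privacyPrivate, .typeString]
print(attributes.contains(.typeString), attributes.contains(.systemMetric)) // prints "true false"
print(MemoryLayout<MetadataAttributes>.size) // prints "8"
```

One design wrinkle: an `OptionSet` cannot stop someone from setting `privacyPrivate` and `privacyPublic` together, which is one argument for keeping privacy as its own enum field next to any other flags.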

I believe there are two features. The first one is attributes important/general enough to be part of the swift-log interface. It makes swift-log API universal for all LogHandler and Logger users. Only these attributes can be used in libraries.

A different feature is LogHandler+Application pair-specific attributes. One way of representing those could be as custom attributes:

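public struct MetadataValueAttributes: Sendable, Hashable, CustomStringConvertible {
    /// The privacy level of this metadata value.
    public var privacy: PrivacyLevel
    
    /// Application-specific attributes
    public var customAttributes: [String: String] = [:]

    /// Create metadata value attributes with the specified privacy level.
    ///
    /// - Parameters:
    ///   - privacy: The privacy level for this metadata. Defaults to `.public`.
    ///   - customAttributes: Application-specific attributes, understood only by a
    ///     LogHandler configured by the same application. Defaults to empty.
    public init(privacy: PrivacyLevel = .public, customAttributes: [String: String] = [:]) {
        self.privacy = privacy
        self.customAttributes = customAttributes
    }

    /// A human-readable summary, as required by `CustomStringConvertible`.
    public var description: String {
        customAttributes.isEmpty ? "\(privacy)" : "\(privacy) \(customAttributes)"
    }
}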

Correct. I think we have somewhat reached consensus here that there is a value in additional attributes for metadata values with privacy levels being one of them. I think fully generalized metadata attributes are interesting but bring a lot of complexity. I think we should start with a small and strongly opinionated set from swift-log. In the future, we can explore extending this to customizable attributes, but I remain skeptical that this won't lead to a tighter coupling of libraries and log handlers.
