Custom String Interpolation and Compile-time Interpretation applied to Logging

Hi all,

I, along with @Ben_Cohen, @moiseev and @Devin_Coughlin, have been working on improving Swift APIs for Apple's unified logging system, based on some recent exciting developments to the Swift language and compiler. This post presents an overview of our plan for creating a customizable interface for Apple's logging system. Our proposal uses custom string interpolation, which is a recently-improved language feature, and compile-time interpretation, which is a compiler technology that is in the process of being upstreamed as an experimental feature.

We are very excited to discuss our proposal and welcome your feedback on this. However, the main aim of this post is not to focus too much on Apple specific logging APIs, but to talk about the language features and underlying ideas that enable these improvements to this logging API. Specifically, we think the following aspects of the proposal are applicable more generally:

  • We show how the new string interpolation features can be used to surface all functionalities provided by printf-style format string and varargs.
  • We present an extensible Swift implementation that deconstructs a string interpolation into a format string and an argument sequence.
  • We present a compiler optimization based on compile-time interpretation that can infer the format string and other compile-time constants from the implementation that deconstructs the string interpolation.

We therefore see this proposal as being relevant to the following potential future directions:

  • Development of a general-purpose, platform-agnostic logging API for Swift.
  • Development of custom logging libraries for interfacing with other logging systems.
  • Migration of other C-style Swift APIs such as localization APIs to string interpolation.
  • Development of the compile-time interpretation technology and making it available to Swift users.

(A more detailed version of this proposal is available here).

Highlights of this Proposal

  • Changes os_log API to accept string interpolations instead of the printf-style format string and varargs combination that it currently accepts.
    The new API (written in camelCase) will expose all the formatting options that are supported by the logging system such as tagging data as private, specifying the precision of floating-point numbers etc.

  • Presents a Swift implementation of the osLog API based on custom string interpolation.
    The implementation will interface with the native C ABIs provided by the logging system, which expect a printf-style format string and a contiguous sequence of bytes containing the varargs. The implementation constructs the required inputs to the C ABIs from the string interpolation that is passed in, thereby relieving the users of osLog from these low-level considerations. All of this construction logic will be implemented in Swift. Some general, reusable components of the implementation could be made available in the standard library for use by other similar use case.

  • Enables Swift users to extend the osLog API to log their custom types.
    Swift users can define extensions to the new logging API. The extensions may accept values of user-defined types and may define how to render the type and which fields are private etc. Instances of user-defined types can be interpolated in the log calls as if they were natively-supported values: osLog("User-defined type: \(instance)").

  • Makes osLog invocations fast and aims to match the performance of os_log calls in C programs compiled with Clang.
    This is achieved by an optimization pass that uses compile-time interpretation. Using such a general-purpose framework reduces the need for hardcoded knowledge within the optimization passes. This makes it easier for the implementation to evolve, and allows the extensions to the logging API to benefit from the optimizations. However, the use of the interpreter means that implementing the logging API requires more sophistication. It requires separating out the implementation into a compile-time interpretable part and a runtime part. Nonetheless, this cognitive overhead is hidden from the users and is a concern only for the implementers/extenders of the logging API.

Proposal

The New User Interface for Logging

The following table shows some examples of osLog calls with formatting options in the proposed syntax.
An equivalent call in the existing os_log syntax is also shown for comparison. The new custom string interpolation feature allows interpolated terms to be sequences of labeled expressions (similar to parameters of functions). We propose to use these features to specify formatting options with the interpolated expressions.

Proposed Syntax Existing syntax
osLog("Network delay: \(delay) ms") os_log("Network delay: %lf ms", delay)
osLog("Network delay: \(delay, .decimal(2)) ms") os_log("Network delay: %.2lf ms", delay)
osLog("Network delay: \(private: delay, .decimal(2)) ms") os_log("Network delay: %{private}.2lf ms", delay)
osLog("Login time: \(private: currentTime, .time) ms") os_log("Login time: %{private, time_t}d ms", currentTime)
osLog("My user-defined type: \(myinstance, .myoption(myparam))") -

The formatting options are cases of enums. For standard types the enums could be defined along with the definition of the osLog API. (For user-defined types, it can be defined by the user creating the logging extension for the type.) E.g., the enum sketched below lists (some of) the formatting options for an Int type supported by osLog.

public enum IntLogFormat {
  case hex
  case octal
  case time
  case ipAddress
  ...
}

The formatting options that can be paired with a type is fixed by the definition of the interpolation functions for the type. This provides more type safety than the existing APIs, where formatting options such as time_t are specified as strings. The interpolated values can also be user-defined types and can also accept user-defined options much like the natively defined types, provided the logging API has been appropriately extended to handle the user-defined type.

Formatting options are required to be static i.e., compile-time known values

An important requirement of the API is that the formatting options must be compile-time constants. If the enum cases accept parameters (e.g. like .decimal(2)), those parameters should also be compile-time constants. This is necessary as the logging system requires the format string to be a static string in the compiled binary. (This is essential for performance as well as for enforcing a privacy policy.) We propose to generate a compiler error if this requirement is violated.

Logging API Implementation Model

We propose to implement osLog as a Swift program and make it extensible by Swift users. The details described here would be relevant for users interested in creating such extensions.

As a running example, consider a call: osLog("Login time: \(private: currentTime, .time)"). The implementation described below will construct a format string of the form: "Login time: %{private,time_t}ld" and a buffer consisting of the bytes of the interpolated expression: currentTime, and will invoke the C logging ABIs with these values.

We define osLog as a function that accepts a struct: PackedLogMsg, which conforms to the string interpolation protocol: ExpressibleByStringInterpolation protocol. When a string interpolation is passed to the osLog function, the compiler would generate code to create an instance of PackedLogMsg
and invoke a sequence of methods on it passing the literal and interpolated parts of the string interpolation.
For example, the sequence of calls generated by the compiler for the call osLog("Login time: \(private: currentTime, .time)") is similar to the code shown below:

var stringInterpol = PackedLogMsg.StringInterpolation(literalCapacity: 12, interpolationCount: 1)
stringInterpol.appendLiteral("Login time: ")
stringInterpol.appendInterpolation(private: currentTime, .time)
var packedMsg = PackedLogMsg(stringInterpolation: stringInterpol)
osLog(packedMsg)

The construction of the format string and byte buffer happens in the string interpolation protocol methods of PackedLogMsg. The following code snippet sketches the implementation of these functions. The code snippet has some missing pieces which are elided for brevity. The complete implementation of PackedLogMsg and osLog can be found here: PackedLogMsg prototype.

A sketch of the implementation of osLog and PackedLogMsg
public func osLog(_ packedMsg: PackedLogMsg) {
  // Check if logging is enabled.
  guard logger.isEnabled(...) else { return }

  // Allocate a raw byte buffer and a byte encoder.
  let byteCount = packedMsg.totalArgByteCount
  var bufferPtr = UnsafeMutablePointer<UInt8>.allocate(capacity: byteCount)
  var byteEncoder = ByteEncoder(bufferPtr)

  // Move the data tracked by 'packedMsg' into the byte buffer.
  packedMsg.encode(&byteEncoder)

  // Invoke the C logging ABI.
  _os_log_impl_wrapper(packedMsg.formatString, bufferPtr, byteCount, ...)

  // Clean up all state.
  bufferPtr.deallocate()
}

public struct PackedLogMsg : ExpressibleByStringInterpolation {

  let stringInterpol: StringInterpolation

  public struct StringInterpolation : StringInterpolationProtocol {
    var preamble: UInt8 // The first byte of the header.
    var argCount: UInt8 // Number of arguments in the buffer.
    var formatString: String
    var encodeOps: EncodeOperations

    public mutating func appendLiteral(_ literal: String) {
      formatString += literal  // Escaping of percent characters is elided for brevity.
    }

    public mutating func appendInterpolation(private number: @autoclosure () -> Int64, _ format: IntLogFormat) {
      preamble |= 1 // Set the private argument bit in the preamble.
      argCount += 1
      formatString += "%{private"
      switch format {
      case .time:
        formatString += ",time_t"
      case .ipAddress:
        formatString += ",network:in_addr"
      ...
      default:
        break
      }
      formatString += "}ld"
      encodeOps.append(UInt8(1)) // Append a preamble for the argument.
      encodeOps.append(UInt8(8)) // Append the byte size of the argument.
      encodeOps.append(number)   // Append the argument itself.
    }
  }

  public init(stringInterpolation: StringInterpolation) {
    stringInterpol = stringInterpolation
  }
}

The implementation is also compile-time interpretable, i.e., it can be interpreted (by a powerful enough interpreter) to infer the the value of the format string at compile time. The use of EncodeOperations to abstract buffer construction and "autoclosuring" of parameters in the above code snippet are essential design choices that make the implementation easier to interpret. More details are available here.

Enabling User-defined Types to be Logged

The PackedLogMsg.StringInterpolation struct can be extended by Swift users in order to log user-defined types. To log a user-defined type it suffices to define an overload of appendInterpolation on PackedLogMsg.StringInterpolation that accepts the user-defined type and interpolates the properties of the type as desired. The following code snippet illustrates this on a struct Developer that uses an enum Language. With this implementation, one can directly log instances of Developer e.g. osLog("Developer \(dev) made a commit").

Note that all that the extension has to do is to invoke appendInterpolation methods on types already
supported by PackedLogMsg. In fact, as illustrated by appendInterpolation(_ dev: Developer), this can be accomplished by recursively using the string interpolation syntax itself, once we have defined a appendInterpolation overload on PackedLogMsg that can accept instances of itself. (See here for the full implementation.)

Extending PackedLogMsg to log a user-defined struct Developer and an enum Language
struct Developer {
  let id: Int
  var name: String
  var preferredLanguage: Language
}

enum Language {
  case swift
  case objectiveC
  case cpp
}

extension PackedLogMsg.StringInterpolation {
  // Ideally, the parameters should be made autoclosures to enable compile-time interpretation.
  mutating func appendInterpolation(_ dev: Developer) {
    let msg: PackLogMsg = "Id: \(private: dev.id) Name: \(private: dev.name) Preferred Language: \(dev.preferredLanguage)"
    appendInterpolation(msg) // Combines self with msg.stringInterpol.
  }

  mutating func appendInterpolation(_ lang: Language) {
    switch lang {
    case .swift:
      appendInterpolation("swift")
    case .objectiveC:
      appendInterpolation("objectiveC")
    case .cpp:
      appendInterpolation("C++")
    }
  }
}

Optimization

The overall optimization workflow is summarized below. We propose to create a dedicated optimization pass that drives the following steps. The goal of this optimization is to match the performance of the code generated by Clang for equivalent os_log calls in C.

  1. The compile-time interpreter is run on the string interpolation calls to infer the compile-time value of the struct packedMsg and its properties.

  2. The implementation of osLog is optimized by folding in the uses of the properties of packedMsg by their constant values. This involves replacing accesses to formatString property with the inferred string literal, replacing totalArgsByteCount with the inferred constant, and replacing all indirect invocation of the closures tracked by packedMsg.encodeOps with their inferred targets. (The function packedMsg.encode(&byteEncoder) invokes the closures tracked by packedMsg.encodeOps on byteEncoder.)

  3. All invocations of autoclosures that were created from the arguments to the string interpolation methods are converted to direct calls based on the targets inferred by the compile-time interpreter.

  4. Existing function inlining pass is used to inline all direct function calls to ByteEncoder methods and closure targets.

  5. Existing dead-code and dead-object elimination passes are used to eliminate the packedMsg instance, which is now unused as its uses have been folded, and also to eliminate the string interpolation calls.

  6. The heap-allocation of the byte buffer in the osLog function is promoted to stack, based on the fact that it is deallocated at the end of the function.

If any of the above steps is not possible, we propose to present an error or warning to the user (or implementer of the logging API) depending on whether it results in a violation of the contract of the logging system or a performance degradation. E.g., if the format string cannot be inferred as a compile-time constant, it's a contract violation and is therefore an error. However, if step 6 is not possible, it is a loss of performance but is not an error.

Compile-Time Interpreter - Upstreaming Plans and Extensions

The compile-time interpreter is available in the Swift tensorflow branch. The interpreter is being upstreamed as an experimental compiler feature as per the plan laid out here: Compile-time interpretation by marcrasi · Pull Request #19579 · apple/swift · GitHub.
We will use our real-world experience of it with osLog to help inform a future language evolution proposal for the compile-time interpreter and for any user-visible feature it enables.

The following is a list of language features that the interpreter would have to support in order for it to be applicable to the implementation sketched above.

  • Basic language constructs such as: integers, floats, structures, function calls and generics.
  • Array literal construction and array operations such as array indexing, array append.
  • String literal construction and string operations such as +, += etc.
  • Closure creation, closure assignment and invocation.

The interpreter already supports most of the features listed above, except for closures. We propose to create a new representation within the interpreter for closures, which consists of the target of the closure (which is a SIL function) and variables that are captured. The closures would be treated as another kind of symbolic constant within the interpreter. The closures that are not invoked during interpretation are allowed to use arbitrary Swift code. The targets of the closures that are invoked during interpretation must be interpretable and all the captures of the closure must be constants tracked by the interpreter.

ABI Stability and Backward Deployment

The existing os_log API will be supported to ensure ABI stability. More interestingly, the optimizations proposed here enable backward deployment of code compiled with the new osLog API to older, ABI-stable Swift by eliminating abstractions such as PackedLogMsg from the compiled code.

37 Likes

This looks really great! It’s nice to see such orthogonal features come together so powerfully to improve an API. As a user, I do have a few concerns and questions.

  1. The naming doesn’t seem quite Swifty. It seems like osLog should be OSLog, and some of the other API be unabbreviated. e.g. msg = message.
  2. Will the os_log API remain visible at the same time as the new API? It would be great if we could separate the two to minimize confusion.
  3. Will these improvements be applied to the other os_ APIs? It would be awkward to try to mix this nice new API with os_signpost, for example.

Otherwise I really look forward to this.

9 Likes

I second the first point, PackedLogMsg feels foreign. Though, osLog does make sense given the spelling of what it’s intended to replace.

In the Developer example, should preferredLanguage: String be preferredLanguage: Language?

Great work, all.

2 Likes

This is super cool. I'm glad the constexpr stuff is useful to you, it would be great for it to support closures!

-Chris

14 Likes

Aside from the proposal, would it also be possible to expose the function/line/file parameters, because that data is often lost since using os_log directly everywhere without any wrapper may be impossible or simply adds a lot of boilerplate code.

Another improvement for the logging system would be to allow to mutate static strings with other static strings. The data is obtained statically so it should be safe to mutate with other static string literals.

5 Likes

Is there any consideration about such static formatting for building normal string?
I think this proposal is good.
But personally I want a new static alternative for Foundation.String(format:) rather than for some logging.

4 Likes

This is fantastic. I agree with @Jon_Shier that Msg feels foreign in Swift. I also wonder why the main function is named osLog() instead of simply log().

4 Likes

Thanks a lot for addressing this.

Not sure if my suggestion is related, but would it be possible to write code below if this proposal is implemented:

extension OSLog {
  static let sidebarActions: OSLog =
    .init(subsystem: "com.acme.app.Actions", category: "Sidebar")
  func info(...) {
    ...        // Custom logger e.g. Analytics
    osLog(...) // Unified logger
  }
}
OSLog.sidebarActions.info("Select \(selectedItem)")

Right now we can’t wrap os_log without losing native #function, #file and #line in the unified logging system.

That is, wrapping is a very important but now missed functionality. Is it real to address it here as well? Thanks in advance!

1 Like

The same general technique should be applicable to other format string based APIs. You could take something like format-string-interpolation.swift · GitHub and develop it a bit further to take advantage of the compile time evaluation features Ravi laid out. It’d be interesting to see a Swift-specific format string implementation happen; it could be a more efficient and flexible alternative to the existing print function, which always prerenders the string.

8 Likes

I'm confused as to why this proposal is coming up.

The proposal itself is fine, but we're completely ignoring the elephant in the room: developers have zero incentive to use os_log.

As an app developer, why would I want to use a logging system where I can't retrieve the logs? The whole point of logging is so that I can programmatically gather information when something goes wrong, but os_log doesn't have a way to gather any information! (and before you chime in saying "just ask your users for a sysdiagnose", think about the steps involved in getting someone's aging parent to perform a sysdiagnose on their phone. That suggestion is so absurdly out-of-touch I have a hard time believing it's ever seriously suggested)

The point of logging is to get debuggable, actionable information in a scenario where a debugger is not available.

So if I can't get that log information from os_log, why are we talking about this at all?


(edited to add): I'm talking about the context of "I've shipped an app to customers and they're trying to request help via a support feature in the app" ← I can't programmatically ask for logs to attach to that support request. os_log stuff only ever goes to Apple (functionally) and is therefore worthless to me as a person who ships apps to customers.

24 Likes

I'm hugely excited about this! Thank you to everyone involved!


Though it seems more we're in the "what-kind-of-wood-goes-into-the-bikeshed" stage than a "what-color-should-the-bikeshed-be" stage, I would be remiss if I didn't say this would be a prime opportunity to make use of the OSLog namespace that's exported by OS_OBJECT_CLASS. OSLog.Message and making instance and static methods instead of the os_log free function would be more fluent and Swifty. Since the original printf-style method will still exist, this would make it a more clear migration point for older code. As others have mentioned, semantic methods like info(…), error(…) would be ideal and match perfectly with logging systems from other platforms.


I'm excited about customizing the log representation for a type. I will paraphrase something I said in the SE-0228 thread, though, and say that it worries me for discoverability that the official way to make this happen is to extend some arbitrary type, where elsewhere in the language there's syntax for it (conforming to a protocol).

6 Likes

Hi everyone – it might help guide the discussion to be explicit that the goal of this thread isn't to discuss os_log or how it might evolve or change.

This is a post to the compiler development forum, and the idea is to open up discussion of the design and implementation of new language features rather than a specific logging SDK. The logging use case is motivating some of us working on the language features, but there are also other use cases. We also would want different logging APIs to be able to evolve in similar ways, as well as use cases other than logging (for example, like compile-time literal validation).

The idea is to focus on whether the language feature allows for that flexibility, rather than to discuss if os_log itself should work one way or another.

9 Likes

Would this optimization also happen if using a defer statement (immediately after the allocation)?

var bufferPtr = UnsafeMutablePointer<UInt8>.allocate(capacity: byteCount)
defer { bufferPtr.deallocate() }

In principle, yes, the compiler ought to be able promote any statically balanced (and reasonably-sized) allocate/deallocate pair to a stack allocation.

3 Likes

Sorry, but the “Improving OS Log” part in the topic inspires us to think about a ‘fixed’ os_log. As this function is a part of the Standard Library, it seems like you will modify it anyway :innocent: Thanks.

1 Like

Yeah, sorry, that is misleading and that's why I posted to try and redirect. Maybe we can retitle the thread. The goal here is to talk about the design of the language feature.

os_log is not part of the standard library.

edit: to flesh this out a bit – some component parts of what is needed for a logging implementation should probably make it into the standard library, but how Apple's specific technology for logging on devices etc would be at a level above this. For example, there are also discussions going on in the server work group about logging and we would ideally share some stuff there. But the goal of this thread is to talk about what language features we need in order to start building up that stack.

4 Likes
  1. The naming doesn’t seem quite Swifty. It seems like osLog should be OSLog, and some of the other API be unabbreviated. e.g. msg = message.

Thanks for pointing that out and we will evolve the naming conventions and syntax to be more Swifty. Just to clarify: osLog is a free function as described here. It is not a type. However, it could be made a method or, in the future, could be abstracted by a more general logging API.

  1. Will the os_log API remain visible at the same time as the new API? It would be great if we could separate the two to minimize confusion.

That is a good suggestion. While os_log should be a part of the ABI, it certainly could be made unavailable or, at least, discouraged when the new APIs become available.

  1. Will these improvements be applied to the other os_ APIs?

Yes.

1 Like

I didn't realize that the "constexpr" support would be so powerful so soon. This makes it seem like it'd be possible to implement a fmt style printing library in Swift 5 (or whichever version is getting compile time evaluation), which is exciting. String interpolation gets a little awkward when you want to print formatted numbers. Right now the nicest way is to use String(format:) or vprintf().

Even without the constant evaluation part, it is possible now to extend String.Interpolation with APIs for different kinds of formatting in interpolation segments. @Erica_Sadun had some good examples in a recent blog post: The Beauty of Swift 5 String Interpolation — Erica Sadun Along the same lines of what she shows for Optional, you could add appendInterpolation(radix:precision:width:) for printf-like control of numeric formatting too.

2 Likes

I'm confused as to why this proposal is coming up.

This proposal is posted here for the following reasons:

(a) We think the ideas and the implementation presented here could be useful in many other contexts. E.g. one of the limitations that prevented some libraries from adopting string interpolation was that they wanted the ability to split the interpolation into static and dynamic components, and compile/optimize them differently. In fact, the design of the new custom string interpolation protocol was partly motivated by such use cases. What this proposal shows is that custom string interpolation can be combined with compile-time interpretation feature to obtain all benefits of a static format string and a varargs pair.

(b) Though os_log is not a general-purpose logging system, it is a highly sophisticated logging system, with asynchronous message construction, privacy qualifiers and custom formatting. Any logging system that shares some of these features would likely face similar challenges. Ideally, we would like to enable developers implementing other logging system also benefit from some of the abstractions and optimizations we are designing for this work. This is one of the motivations for using a general-purpose compile-time interpreter, and also for aspiring to make the implementation of types like PackedLogMsg visible and available to everyone.
(See also @Ben_Cohen's comments in this thread.)

(c) Finally, we really want to share with the community the new powerful, compiler features that we plan to bring in and provide some context and motivation for that.

I'm talking about the context of "I've shipped an app to customers and they're trying to request help via a support feature in the app" ← I can't programmatically ask for logs to attach to that support request.

I will try redirect this concern to the right team.

8 Likes