Pitch: Debug Description macro

Dave_Lee · October 6, 2023, 9:46pm

Hello all,

We would like to add a macro to the Swift standard library that improves debugging in the following ways:

Debugger users will be able to see debug descriptions in circumstances they previously could not, such as Xcode’s variable view and crash logs
Code authors can implement, in Swift, how the debugger summarizes their data type – via compatible debugDescription or description implementations

To demonstrate, consider an example:

struct Student: CustomDebugStringConvertible {
  var name: String
  var id: Int
  // more properties
  
  var debugDescription: String {
    "\(id): \(name)"
  }
}

To display this debug description, LLDB needs to perform expression evaluation. LLDB does expression evaluation on demand, commonly using the po command. Outside of explicit commands, expression evaluation is generally not performed, and in some cases not even possible.

Expression evaluation can be avoided in this case, by defining an LLDB Type Summary. The following manually constructed command creates a type summary using LLDB’s summary string format:

(lldb) type summary add -s '${var.id}: ${var.name}' PupilKit.Student

In this case, and many others, the Swift source of debugDescription can be converted to an LLDB type summary. Any debugDescription implementation that references only stored properties should, in theory, be convertible.
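As a rough illustration of that conversion, the sketch below rewrites the text of a simple interpolated string into an LLDB summary string. It is a toy under stated assumptions: the function name `summaryString(fromInterpolation:)` is hypothetical, it scans raw source text rather than a syntax tree, and it accepts only interpolations that are a bare property name.

```swift
/// Hypothetical helper, for illustration only: turns the body of a Swift
/// string literal such as `\(id): \(name)` into `${var.id}: ${var.name}`.
/// Returns nil when an interpolation is anything other than a bare identifier.
func summaryString(fromInterpolation source: String) -> String? {
    var result = ""
    var index = source.startIndex
    while index < source.endIndex {
        let character = source[index]
        let next = source.index(after: index)
        if character == "\\", next < source.endIndex, source[next] == "(" {
            // Found `\(`; take everything up to the first `)` as the expression.
            let exprStart = source.index(after: next)
            guard let close = source[exprStart...].firstIndex(of: ")") else { return nil }
            let expression = source[exprStart..<close]
            // Only a bare property name can be read without evaluating code.
            guard !expression.isEmpty,
                  expression.allSatisfy({ $0.isLetter || $0.isNumber || $0 == "_" }) else {
                return nil
            }
            result += "${var.\(expression)}"
            index = source.index(after: close)
        } else {
            // Literal text is copied through unchanged.
            result.append(character)
            index = next
        }
    }
    return result
}

print(summaryString(fromInterpolation: #"\(id): \(name)"#) ?? "unsupported")
// prints "${var.id}: ${var.name}"
print(summaryString(fromInterpolation: #"\(id + 1)"#) ?? "unsupported")
// prints "unsupported"
```

Anything beyond a plain property read is rejected rather than guessed at, which mirrors the "compatible implementation" restriction in the pitch.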

We propose a macro that converts compatible implementations of debugDescription or description into LLDB summary strings and embeds those strings in the binary. LLDB will load these records automatically. An additional benefit of this reuse is that authors can write unit tests for their debug descriptions, to catch regressions.

In the above example, the change is the minor addition of the macro:

@DebugDescription
struct Student: CustomDebugStringConvertible {
  // same as before
}
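Because the summary would be derived from debugDescription itself, an ordinary assertion on that property also pins down the text the debugger is expected to show. A minimal sketch follows; the `@DebugDescription` attribute is left off so the snippet builds without the proposed macro, and the sample values are made up.

```swift
struct Student: CustomDebugStringConvertible {
    var name: String
    var id: Int

    var debugDescription: String {
        "\(id): \(name)"
    }
}

let student = Student(name: "Ada", id: 42)

// A regression in the format fails here, long before anyone opens the debugger.
assert(student.debugDescription == "42: Ada")
// String(reflecting:) goes through debugDescription as well.
assert(String(reflecting: student) == "42: Ada")
```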

The @DebugDescription macro generates global constants (via the peer role) that contain the type name and the converted summary string. For demonstration only, the expanded macro might look like:

@_section("__DATA_CONST,__lldbsummaries")
let Student_lldb_summary = ("PupilKit.Student", "${var.id}: ${var.name}")

The reason this is demonstration only is that @_section globals do not support String values. Instead, the values emitted will be tuples of UInt8 (in UTF-8 encoding). Additionally, the implementation would support other platform-specific section names (the above is Darwin-specific).
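To make the byte encoding concrete, here is a small sketch of what "UTF-8 bytes of the two strings" means. The variable names are illustrative, and the actual record layout (how the bytes are delimited and packed into a tuple) is not specified in this post, so none of that is modeled here.

```swift
let typeName = "PupilKit.Student"
let summary = "${var.id}: ${var.name}"

// UTF-8 bytes of each string. A real record would hold these as a fixed-size
// tuple of UInt8 literals; an Array is used here only to inspect the values.
let typeNameBytes = Array(typeName.utf8)
let summaryBytes = Array(summary.utf8)

print(typeNameBytes.count)                 // 16
print(Array(summaryBytes.prefix(6)))       // [36, 123, 118, 97, 114, 46], the bytes of "${var."

// Decoding the bytes gives back the original summary string.
print(String(decoding: summaryBytes, as: UTF8.self) == summary)  // true
```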

When attached to an incompatible implementation of debugDescription or description, the macro will emit a warning or error. An incompatible implementation is one that requires expression evaluation; this includes function calls, initializers, arithmetic and other operators, casting, etc. Anything outside of property reads can be assumed to require expression evaluation. The initial version will support string literals, as they map directly to summary strings. To support a wider range of implementations, such as those that construct strings using conditionals or loops (but still not function calls), the macro would instead generate an LLDB script. This functionality would be a future improvement.
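The distinction can be sketched with two properties on the same type. The property names `compatible` and `incompatible` and the `grades` field are made up for this illustration; both are valid Swift, but only the first reads nothing except stored properties.

```swift
struct Student {
    var name: String
    var id: Int
    var grades: [Int]

    // Compatible: a string literal whose interpolations only read stored
    // properties, so it maps directly to "${var.id}: ${var.name}".
    var compatible: String {
        "\(id): \(name)"
    }

    // Incompatible: arithmetic (`id + 1`), a method call (`uppercased()`),
    // and function calls (`reduce`, `max`) all require expression evaluation.
    var incompatible: String {
        "\(id + 1): \(name.uppercased()) avg \(grades.reduce(0, +) / max(grades.count, 1))"
    }
}

let student = Student(name: "Ada", id: 1, grades: [90, 100])
print(student.compatible)    // 1: Ada
print(student.incompatible)  // 2: ADA avg 95
```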

Unfortunately, there are cases where the macro would not be able to identify an incompatible implementation. A computed property, unlike a stored property, requires expression evaluation. At the AST level, some references to computed properties can appear indistinguishable from a reference to a stored property. This is an inherent constraint of macros and the scope of the AST that’s made available to them. When the macro is unable to identify use of a computed property at compile time, LLDB will emit a warning at debug time.
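A short sketch of the problem case: inside debugDescription, `displayName` is written exactly like a stored property, but it is computed in an extension that a macro attached to the struct declaration would not see. The `displayName` property is invented for this example.

```swift
struct Student: CustomDebugStringConvertible {
    var name: String
    var id: Int

    // Syntactically, `id` and `displayName` are both plain property reads.
    var debugDescription: String {
        "\(id): \(displayName)"
    }
}

// Possibly in another file. Only here does it become clear that
// `displayName` runs code and therefore needs expression evaluation.
extension Student {
    var displayName: String {
        name.uppercased()
    }
}

let student = Student(name: "ada", id: 7)
print(student.debugDescription)  // 7: ADA
```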

Thank you for your feedback and ideas!

davidbalbert · December 1, 2023, 3:54pm

Having just spent a couple of hours fighting with LLDB's Python interface to generate a type summary with some conditional logic, I think this would be fantastic.

The two most vexing issues I've run into with LLDB type summaries and Swift are:

Getting the type names correct so that the summaries actually get used, especially with generics, which IIRC can require regular expressions in type summary add.
Any sort of conditional logic, which by definition requires using LLDB's Python API.

For the former, it would be a big win to just not have to figure out what the type names are. Even better, it would be great to be able to generate debug descriptions for a generic type with the ability to provide an override for a specific specialization, e.g.

@DebugDescription
struct MyContainer<T>: CustomStringConvertible {
    var description: String {
        "generic"
    }
}

extension MyContainer where T == Int {
    var description: String {
        "specialized with int"
    }
}

I realize you're explicitly flagging conditionals and loops as future work and not part of this proposal, and I totally understand why. That said, I think @DebugDescription will get way more useful once they're included.

As an aside, for anyone who is trying to integrate an LLDB Python script with Xcode's project specific LLDBInitFile using relative paths (so that anyone who clones your repo will automatically get the Python script loaded), the flag you're looking for is "-c". Specifically command script import -c myscript.py. The "-c" flag will make LLDB use paths relative to the current lldbinit file being sourced. The only place I found this documented was in this commit.

Karl · December 2, 2023, 5:33pm

I wonder if this makes sense for the standard library, or whether it should perhaps go in an LLDB-specific support library.

Would other debuggers support these descriptions?

By explicitly scoping this feature to be LLDB-specific, could we offer better integration between LLDB and Swift code, without having to worry about other debuggers?

davidbalbert · December 2, 2023, 9:53pm

I'm not qualified to speak on Karl's question about whether this belongs in the standard library, and don't have strong opinions either way. I'd be excited to see this shipped in whatever form works best for Swift.

Leaving that to the side, here are two other real-world issues I’ve run into with type summaries:

Custom type summaries with Ranges

Consider a situation where you have a custom collection with an associated index:

struct MyString {
    struct Index {}
}

You make a type summary for MyString.Index (let's say "i0", "i1", "i2"), and would like Ranges of indices to print using your type summary (e.g. "i3..<i11").

The regex for the Range type summary is ^Swift.Range<.+>$, which does match Range<MyModule.MyString.Index>, but the C++ implementation that generates the summary has logic for deciding whether or not to actually generate one. Without knowing the internals of LLDB, it’s not clear what you have to do to get a Range<MyModule.MyString.Index> to print using MyString.Index's type summary.

This even shows up in the standard library: Range<Int> has a summary but Range<String.Index> doesn't:

let s = "hello"
let a = 5..<10
let b = s.startIndex..<s.endIndex

(lldb) p a
(Range<Int>) 5..<10
(lldb) p b
(Range<String.Index>) {
  lowerBound = 0[any]
  upperBound = 5[utf8]
}

A workaround is to define your own type summary that matches instead, but it’s somewhat of a pain to get right, and IIRC putting the summary in a category other than "default" might affect whether yours gets picked.

type summary add --summary-string "${var.lowerBound}..<${var.upperBound}" "Range<MyModule.MyString.Index>"

Type aliases

The Range issue gets more complicated when you have type aliases. Consider this expanded example:

struct MyString {
    struct Index {}
    struct UTF8View {}
}

extension MyString.UTF8View {
    typealias Index = MyString.Index
}

struct MySubstring {
    typealias Index = MyString.Index
}

The good news is that a type summary for MyModule.MyString.Index matches even if you’re looking at a value with a static type of MyModule.MySubstring.Index or MyModule.MyString.UTF8View.Index, but the Range issue compounds. You need to have custom summaries for Range<MyModule.MyString.Index>, Range<MyModule.MyString.UTF8View.Index> and Range<MyModule.MySubstring.Index>.

You can use one summary for the first two by switching the type name to the regex ^Range<MyModule.MyString.+Index>$ (though given how dots are used in these examples, you might have to stare at it for a second to see why it matches both), but the third has to be matched separately – a simple non-regex matcher will do: "Range<MyModule.MySubstring.Index>"
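For anyone who wants to check which names a pattern covers before trying it in LLDB, the matching can be reproduced in a few lines of Swift. This sketch uses Foundation's NSRegularExpression as a stand-in; LLDB has its own regex engine, so treat the results as indicative only for simple patterns like this one. The `matches` helper is a hypothetical name.

```swift
import Foundation

/// Returns true when `pattern` matches somewhere in `typeName`.
/// NSRegularExpression stands in for LLDB's regex engine here.
func matches(_ pattern: String, _ typeName: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return false
    }
    let whole = NSRange(typeName.startIndex..<typeName.endIndex, in: typeName)
    return regex.firstMatch(in: typeName, options: [], range: whole) != nil
}

let pattern = "^Range<MyModule.MyString.+Index>$"

// The unescaped `.+` has to consume at least the "." before "Index",
// and can also swallow ".UTF8View.".
print(matches(pattern, "Range<MyModule.MyString.Index>"))           // true
print(matches(pattern, "Range<MyModule.MyString.UTF8View.Index>"))  // true
// "MySubstring" does not contain the literal "MyString", so this one fails.
print(matches(pattern, "Range<MyModule.MySubstring.Index>"))        // false
```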

The good news is that all of these are solvable problems, but they’re annoying, especially if you’re trying for the first time – even more pedestrian things like realizing that the Swift module is special and you don't need to qualify types with "Swift.", but you do need to qualify your own types with "MyModule." can slow things down, and the feedback loop isn't particularly fast. It would be great if everything just worked.

Adrian_Prantl · December 4, 2023, 5:36pm

> I wonder if this makes sense for the standard library, or whether it should perhaps go in an LLDB-specific support library.
>
> Would other debuggers support these descriptions?
>
> By explicitly scoping this feature to be LLDB-specific, could we offer better integration between LLDB and Swift code, without having to worry about other debuggers?

LLDB is the only debugger that has a Swift plugin at this point and it's part of the swift.org tool chain. Since it's the official Swift debugger I think it's appropriate to support it as a first-class citizen in the standard library.
That said, in a future where other debuggers exist, we can easily swap out the implementation of the macro with a debugger-specific plugin by adjusting plugin search paths in the compiler, while still having all plugins implement the same interface declared in the standard library.

Karl · December 7, 2023, 4:01pm

That leads me to a broader question: why should this be a macro at all?

Why should I have to run a plugin to generate metadata for the debugger? Why should I have to write in my code "oh, this particular type should have good summary strings in the debugger"?

The process of generating debug metadata is already handled by the compiler without requiring source annotations; so why can't it automatically perform the same processing this macro does to produce better debug metadata?

AFAICT, the only reason for annotating this in source code would be:

If the compiler already performs this processing where it can (but doesn't error where it can't), then I can use the macro to ensure my code only builds if debugDescription can be converted into something supported by a particular debugger without expression evaluation. That's a significantly less useful feature, but I can imagine it still having value for certain use-cases.

However, it would be a high portability risk: the developer is explicitly saying "this summary MUST be available without expression evaluation", which means it is inherently locked to the capabilities of a particular debugger and plugin implementation.

I'm not convinced that is desirable. LLDB is a tool used to debug Swift programs; and Swift code should not fail to build because somebody wishes to use a different tool.

But I have nothing against the tool offering libraries which allow developers to integrate better with it.

Jon_Shier · December 7, 2023, 4:20pm

I agree, and it seems inappropriate (and rather suboptimal) to require a macro just to get the debugger to work well. All of this information is already available, as can be seen in the swift-custom-dump framework, which defines automatic dump descriptions that integrate into the rest of the tools, including automatic diffing. This rather proves that the debugger could do a much better job of providing usable descriptions automatically.

Adrian_Prantl · December 7, 2023, 4:34pm

The swift-custom-dump tool linked above uses Swift reflection metadata. If you are happy with the level of output from that you don't need this macro at all. LLDB also knows how to read reflection metadata and if you use p (frame variable) on any Swift object it will format it to the same level of fidelity that swift-custom-dump has access to. This macro is for use-cases where you want to go beyond that and customize a formatter, either by leaving out or summarizing implementation details, i.e., anything that goes beyond just recursively formatting all fields.
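For a feel of what "recursively formatting all fields" from reflection looks like, the standard library's Mirror exposes the same kind of field information inside the process. This is only an analogy sketched under that assumption: LLDB reads reflection metadata from outside the process, and the `Student` type is borrowed from the pitch.

```swift
struct Student {
    var name: String
    var id: Int
}

let student = Student(name: "Ada", id: 42)

// Reflection sees every stored property, in declaration order.
let mirror = Mirror(reflecting: student)
let labels = mirror.children.compactMap { $0.label }
print(labels)  // ["name", "id"]

for child in mirror.children {
    print(child.label ?? "?", "=", child.value)
}
// name = Ada
// id = 42

// A customized summary such as "42: Ada" is a choice about what to show and
// in what order, which reflection alone cannot know.
```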

Jon_Shier · December 7, 2023, 5:02pm

I think I understand now: the main point is to avoid expression evaluation altogether. Any customization (which is already possible) is simply a byproduct. I'm still concerned that while, as a standard library macro, it doesn't have to pay the swift-syntax cost, it still suffers from the macro scalability issues at build time, as well as macro composability limitations. It really seems like this should be a general solution for the language so all debuggers can benefit from static data.

wes1 · December 7, 2023, 7:57pm

As I understand it, you're enabling developers to write debug descriptions using LLDB summary string format, so results in LLDB will be as if the type summary was defined. This is great!

(I assume someone can write a builder that developers can use to emit well-formed type summaries, and integrate it with Xcode.)

What about the result when called from source code? Like property wrappers, this seems to have a wrapped raw value and an evaluated shadow wrapper. The source user mostly wants the evaluated result, but other clients might want the type summary itself.

I can also see debug users wanting to switch styles, for interactive po vs stack traces; would this be compatible with a later evolution where there were n>1 type summaries?

I understand the question now is just this limited feature, but thinking about other uses might sway the current implementation away from just overloading debugDescription. If you used another var, you could require it be StaticString or some type that only accepted valid LLDB type summaries.

As another client/use, type summaries could really help for the critical feature of deferring rendering in the performance-sensitive context of tracing. The message would contain the object and the type summary specification, and would render it only on demand -- via LLDB-compatible runtime interpretation.
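A very rough sketch of that deferred-rendering idea, assuming a hypothetical `render(summary:for:)` function that substitutes `${var.<name>}` placeholders using Mirror. It handles only top-level stored properties and none of the other summary string features (nested paths, format specifiers), so it is not LLDB-compatible in any complete sense.

```swift
/// Hypothetical renderer: replaces each `${var.<name>}` in `summary` with the
/// value of the stored property `<name>`, looked up through Mirror.
func render(summary: String, for value: Any) -> String {
    var fields: [String: Any] = [:]
    for child in Mirror(reflecting: value).children {
        if let label = child.label { fields[label] = child.value }
    }

    let opener = "${var."
    var output = ""
    var rest = Substring(summary)
    while !rest.isEmpty {
        if rest.hasPrefix(opener), let close = rest.firstIndex(of: "}") {
            // Extract the property name between "${var." and "}".
            let nameStart = rest.index(rest.startIndex, offsetBy: opener.count)
            let name = String(rest[nameStart..<close])
            output += fields[name].map { "\($0)" } ?? "<unavailable>"
            rest = rest[rest.index(after: close)...]
        } else {
            // Literal text is copied through one character at a time.
            output.append(rest.removeFirst())
        }
    }
    return output
}

struct Student {
    var name: String
    var id: Int
}

// The trace message stores the value and the specification, not a String.
let pending = (value: Student(name: "Ada", id: 42), summary: "${var.id}: ${var.name}")

// Rendering happens later, and only if the message is actually displayed.
print(render(summary: pending.summary, for: pending.value))  // 42: Ada
```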

Also there are other meta-programming users now stuck with reflection or code generation as implementation alternatives, and this could be a helpful middle ground. They would write different specifications than expected for debugging.

If this were modeled as protocols, then it would be much easier to build integrations, apply conformances, etc.

That direction also has implications for the stdlib. More clients might suggest putting it in the stdlib, but a separate implementation package would better support consumption, evolution and backporting. (It can never come out once it's in the stdlib, which also ties LLDB.)

So it seems like for integrations, a separate LLDBConfig protocol is warranted, which could offer selective conformance to CustomDebugStringConvertible but also be used otherwise (and users could have both).

It's possible that this implementation is fixed and well-defined enough to be hardwired into the stdlib, and some external package could build on it, or just use the spec and some infra separately.


struct LLDBTypeSpec {
  let lldbTypeSpec: ...
  // some validation via init params, failable
}

protocol LLDBConfig {
  // protocols cannot declare `static let`; a get-only requirement is the equivalent
  static var lldbTypeSpec: LLDBTypeSpec { get }
}

// option 1: user-written
extension PupilKit.Parent: LLDBConfig {
  static let lldbTypeSpec: LLDBTypeSpec = "..."
}

// option 2 (alternative to option 1): macro builder?
extension PupilKit.Parent: LLDBConfig {
  static let lldbTypeSpec: LLDBTypeSpec = #LTS(.public, .colon, .inline)
}

// option 3: compiler-synthesized?
extension PupilKit.Student: LLDBConfig {
}

// declare once per type?
@LLDBConfigured
struct Student { ... }

lorentey · December 8, 2023, 1:53am

In addition to the macro, can we please also expose a straightforward way to simply manually specify the summary string that we want lldb to use, alongside each type?

@_lldbFormatter("${var.id}: ${var.name}")
struct Student: CustomStringConvertible {
  var name: String
  var id: Int

  var description: String {
    // potentially more complicated code than what `@DebugDescription`
    // would be able to handle
  }
}

I understand macros are very awesome, but it isn't always appropriate to rely on heuristic behavior. For instance, I think it would not always be possible to simplify description implementations only to appease a macro.

Dave_Lee · December 8, 2023, 2:41am

For this reason, we are planning to support an additional property, maybe named _debugDescription. This property will be preferred over debugDescription when generating an LLDB type summary. This way, if there are reasons not to change debugDescription or description, your type can still provide a type summary. Do you think this will satisfy your needs?
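As a sketch of how the two properties might sit side by side, assuming the tentative name `_debugDescription` from this reply (the `grades` field and sample values are invented, and the macro attribute is omitted so the snippet builds on its own):

```swift
struct Student: CustomDebugStringConvertible {
    var name: String
    var id: Int
    var grades: [Int]

    // Rich description for `po` and logging. The closure and method calls
    // mean it cannot be converted to a summary string.
    var debugDescription: String {
        let joined = grades.map { String($0) }.joined(separator: "/")
        return "\(id): \(name), grades: \(joined)"
    }

    // Simpler form that reads only stored properties. Under the plan described
    // above, the macro would prefer this when generating the type summary.
    var _debugDescription: String {
        "\(id): \(name)"
    }
}

let student = Student(name: "Ada", id: 42, grades: [90, 85])
print(student.debugDescription)   // 42: Ada, grades: 90/85
print(student._debugDescription)  // 42: Ada
```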

lorentey · December 8, 2023, 8:29pm

That would allow us not to water down description/debugDescription; however, I wonder if the limitations of what the macro can understand will end up frustrating work.

For instance, Swift isn't great at handling bit fields, and types that need to manually implement them would be obvious candidates for custom-made lldb summaries -- trying to debug code that deals with such types is always painful.

For example, would it be at all possible to implement the summaries we made for String.Index in this macro setup? They heavily rely on computed properties and integer expressions.

I expect having to manually construct a Python expression to emulate these would be far less frustrating than having to try to fit things into whatever ad-hoc, ill-designed, and undocumented subset of Swift the macro will end up supporting.

extension String {
  /// A position of a character or code unit in a string.
  @frozen
  public struct Index: Sendable {
    @usableFromInline
    internal var _rawBits: UInt64
    ...
  }
}

extension String.Index {
  @_alwaysEmitIntoClient
  internal var _encodingDescription: String {
    switch (_rawBits & Self.__utf8Bit != 0, _rawBits & Self.__utf16Bit != 0) {
    case (false, false): return "unknown"
    case (true, false): return "utf8"
    case (false, true): return "utf16"
    case (true, true): return "any"
    }
  }

  /// A textual representation of this instance.
  @_alwaysEmitIntoClient
  @inline(never)
  public var _description: String {
    // 23[utf8]+1
    var d = "\(_encodedOffset)[\(_encodingDescription)]"
    if transcodedOffset != 0 {
      d += "+\(transcodedOffset)"
    }
    return d
  }
}