[Pitch] Expose demangle function in Runtime module

Hello everyone,

After a very successful serverside conference last week, and a bunch of discussions with various members of the community, I came up with a few small improvements that I’d like to pitch.

The first of these is long overdue: exposing a simple demangle function. This was first discussed in 2019 (!) and followed up on in 2024; however, neither attempt went all the way through evolution, so here’s another attempt at it.


You can view the complete pitch here: SE-NNNN: Expose demangle function in Runtime module. (Pull request)


Expose demangle function in Runtime module

Introduction

Swift symbols are subject to name mangling. These mangled names then show up in backtraces and other profiling tools. A mangled name may look something like $sSS7cStringSSSPys4Int8VG_tcfC, and such names often end up visible to developers unless they are demangled before being displayed.

In many situations, it is much preferable to demangle the identifiers before displaying them. For example, the previously shown identifier can be demangled as Swift.String.init(cString: Swift.UnsafePointer<Swift.Int8>) -> Swift.String, which is a human-readable format that a Swift developer can easily understand.

This proposal introduces a new API that allows calling out to the Swift runtime's demangler, without leaving the process.

Motivation

Currently, many tools that need to display symbol names to developers are forced to create a process and execute the swift-demangle tool, or use unofficial runtime APIs to invoke the runtime's demangler.

Neither of these approaches is satisfactory: we are either paying a high cost for creating processes, or relying on unofficial APIs.

This proposal introduces an official demangle(_: String) -> String? function that offers a maintained and safe way to call the Swift demangler from a running Swift application.
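For context, the unofficial route mentioned above commonly looks something like the sketch below. It assumes the runtime exports a `swift_demangle` symbol with this C-style signature, which is an implementation detail rather than a supported API; the `unofficialDemangle` wrapper name is made up for illustration.

```swift
import Foundation  // for free()

// Assumption: binds to the runtime's unofficial `swift_demangle` entry point.
// Not a supported API; shown only to illustrate the status quo.
@_silgen_name("swift_demangle")
private func _swiftDemangle(
  _ mangledName: UnsafePointer<CChar>?,
  _ mangledNameLength: UInt,
  _ outputBuffer: UnsafeMutablePointer<CChar>?,
  _ outputBufferSize: UnsafeMutablePointer<UInt>?,
  _ flags: UInt32
) -> UnsafeMutablePointer<CChar>?

// Hypothetical wrapper: returns nil when the input is not a mangled Swift name.
func unofficialDemangle(_ mangledName: String) -> String? {
  return mangledName.utf8CString.withUnsafeBufferPointer { name -> String? in
    // Passing nil for the output buffer asks the runtime to malloc one.
    // `name.count - 1` excludes the trailing NUL of the C string.
    guard let cString = _swiftDemangle(
      name.baseAddress, UInt(name.count - 1), nil, nil, 0
    ) else { return nil }
    defer { free(cString) }  // the caller owns the malloc'ed result
    return String(cString: cString)
  }
}

if let text = unofficialDemangle("$sSS7cStringSSSPys4Int8VG_tcfC") {
  print(text)
}
```

The fragility is visible in the sketch itself: the symbol name, the parameter layout, and the ownership rule for the returned pointer are all things the caller has to know out of band.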

Proposed solution

We propose to introduce two demangle functions in the Runtime module:

A simple demangle method, returning an optional String:

public func demangle(_ mangledName: String) -> String?

And an overload which accepts a pre-allocated buffer into which the demangled string can be written:

@discardableResult
public func demangle(
  _ mangledName: String,
  into buffer: UnsafeMutableBufferPointer<Int8>
) -> DemanglingResult

public enum DemanglingResult: Equatable {
  case success
  case failed
  case truncated(Int)
}

The buffer-accepting API is necessary for performance-sensitive use cases that demangle symbols in process before displaying them or sending them on for further processing. In those use cases it is common to have a known maximum buffer size into which we are willing to write the demangled representation.

If the demangled representation does not fit the preallocated buffer, the demangle method will return truncated(actualSize) such that developers can determine by how much the buffer might need to be increased to handle the complete demangling.
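A minimal sketch of how a caller might react to the truncated case. Since the pitched function does not exist yet, `demangleStandIn` is a hypothetical stand-in that always writes the demangling of the example symbol from the introduction, and it assumes that the size reported by `.truncated` includes the trailing NUL; `demangleGrowing` is likewise a made-up helper name.

```swift
// Mirrors the enum from the pitch.
enum DemanglingResult: Equatable {
  case success
  case failed
  case truncated(Int)
}

// Hypothetical stand-in for the pitched API so that the sketch runs today.
// Assumption: `.truncated` reports the full size needed, including the NUL.
func demangleStandIn(
  _ mangledName: String,
  into buffer: UnsafeMutableBufferPointer<Int8>
) -> DemanglingResult {
  guard mangledName.hasPrefix("$s") else { return .failed }
  let text = "Swift.String.init(cString: Swift.UnsafePointer<Swift.Int8>) -> Swift.String"
  let bytes = text.utf8.map { Int8(bitPattern: $0) } + [0]
  for i in 0..<min(bytes.count, buffer.count) { buffer[i] = bytes[i] }
  return bytes.count > buffer.count ? .truncated(bytes.count) : .success
}

// Tries the given capacity first and grows to the reported size if needed.
func demangleGrowing(_ mangledName: String, initialCapacity: Int) -> String? {
  var capacity = initialCapacity
  while true {
    let buffer = UnsafeMutableBufferPointer<Int8>.allocate(capacity: capacity)
    defer { buffer.deallocate() }  // runs at the end of every iteration
    switch demangleStandIn(mangledName, into: buffer) {
    case .success:
      // Read up to (not including) the NUL terminator.
      let bytes = buffer.prefix(while: { $0 != 0 }).map { UInt8(bitPattern: $0) }
      return String(decoding: bytes, as: UTF8.self)
    case .failed:
      return nil
    case .truncated(let needed):
      guard needed > capacity else { return nil }  // avoid looping forever
      capacity = needed
    }
  }
}
```

In a real performance-sensitive caller the buffer would more likely be reused across calls rather than reallocated, and a caller with a hard size limit would simply display the truncated text.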

Demangling format

While the mangled strings are part of Swift ABI and can therefore not really change on platforms with stable ABI, the demangled representation returned by the demangle functions is not guaranteed to be stable in any way.

The demangled representation may change without warning, even in patch releases of Swift. The returned strings should be treated as human-readable representations that are nicer to present to developers; it is not a goal to provide any guarantee about their exact shape.


any reason this isn’t a [Raw|UTF8]Span?


Now that we have a rich set of Span APIs, could this function take a MutableSpan or OutputSpan instead of an UnsafeMutableBufferPointer? That would allow for the preallocation but support more general use cases as well.

Similarly, is Int8 the right element type here? Would UTF8.CodeUnit (i.e., UInt8) be a better fit?


This is important functionality so I support having it exposed in the runtime for important use cases (like in-process backtraces), but I hope that this isn't where we stop. I see about three kinds of systems that do demangling:

  1. In-process demangling
  2. Out-of-process demangling (taking a crash log with symbols from an external source and rendering it)
  3. Semantic analysis tooling that wants to understand the full expanded tree structure of symbols

This pitch satisfies #1, but #2 and #3 would be better served by a demangler as a separate package. For out-of-process demangling or more in-depth demangling like semantic analysis, you really want the flexibility of a separate package instead of being limited to whatever the standard libraries can evolve.

(I've said it before on these forums, but as more of the compiler is migrated to Swift, the demangler would be an excellent leaf target that could be migrated to something that third-parties could easily use, similar to swift-syntax.)

But again, this is still very useful in its own right!


Good reminder about Span, thanks folks – I’ll look into that soon.

Separate package would be nice as well, but probably something to tackle separately :thinking:

AFAICS, the following

public func demangle(
  _ mangledNameSpan: Span<UInt8>,
  into buffer: inout MutableSpan<UInt8>
) -> DemanglingResult

is simple to pull off, and I’m happy to include that.

If someone more familiar with spans would like to confirm if that’s the kind of spans we’d like to have in public API that’d be useful. We also have UTF8Span but it feels like we can be more accepting on the input here, so folks don’t have to validate the input beforehand.

We would keep the String → String one though I think, it’s a convenient one for quick checks.

Do we need any other Span accepting/returning versions though?


Should the success case include a length? If not, how are you supposed to determine where the demangled string ends?

Yeah, it’s a good question: a successful demangling will be a well-formed, null-terminated C string, so technically speaking you’d be able to trust that the output string was correctly written and terminates at the \0. When converting the span into a string you’d then respect that.

Though from docs I’m not sure if the public init(validating codeUnits: Span<UInt8>) throws(UTF8.EncodingError) actually handles null-terminated spans properly – I’ll need to check that.


Update: Turning the written-to mutable span into a string, like String(copying: try! UTF8Span(validating: outputSpan.span)), sadly does not respect the \0 termination, so we may need to return the count in the success case after all. That would make the result simpler to consume, without also having to scan for the \0 before creating our UTF8Span.

It might be nice to have a C-string-aware initializer on UTF8Span, but we don’t have that today. :thinking:
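To make the \0 issue concrete, here is a small sketch that uses a plain [UInt8] array instead of the Span types, so that it also runs on older toolchains. The byte values are made up for illustration; the point is only that decoding the whole buffer keeps the NUL bytes, so the caller needs either a scan or a returned count.

```swift
// A buffer as it might look after a successful write: text, then NUL padding.
let written: [UInt8] = Array("Swift.Int".utf8) + [0, 0, 0, 0]

// Decoding the whole buffer keeps the NUL bytes as part of the string.
let whole = String(decoding: written, as: UTF8.self)
print(whole.utf8.count)  // 13: 9 bytes of text plus 4 NUL bytes

// Without a returned count, the caller has to scan for the terminator first.
let end = written.firstIndex(of: 0) ?? written.endIndex
let trimmed = String(decoding: written[..<end], as: UTF8.self)
print(trimmed)  // Swift.Int
```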

Personally, I prefer the Windows naming for these functions. They are undecoration routines, undoing the name decoration that is language specific, and so undecorate seems appropriate. I feel like this is similar to the "sanity check" vs "soundness" type of approach to naming.

Is the goal here to simply provide a wrapper over libswiftDemangle? Why expose that through the Runtime module rather than add an overlay for libswiftDemangle?

The Runtime module currently does not support Windows (though I believe that @al45tair is working on that). If this were exposed through the Runtime module, would that be split up such that we could use this before the rest of the module?

This would be very useful to us and happy to see this pitched!

With regard to the more complex use cases outlined by @allevato I think serving #2/#3 could be done as future directions - solving for #1 seems like a great first win with little downside. As previous attempts got a bit derailed, it would be super if this could be landed and at least address this use case.

With regard to the naming, as pointed out by @compnerd, I would just point out the existing prior art of naming for swift demangle; it seems reasonable to keep the nomenclature identical since, unlike “sanity check” vs “soundness”, demangle does not use non-inclusive language as far as I can tell.

Using spans for the output seems like the right thing, but I have too little hands on experience with it yet to provide any useful feedback.

One question I have is whether there would be a desire for an additional option on this API to choose between long and short output forms. The details of what constitutes the long or short form would still be completely up to the implementation and could of course change, but depending on where one displays such human-readable information it could be quite useful.


FWIW, #2 is something that's on my TODO list, and using the demangler from the Runtime module would be just fine IMO.

I agree with the other comments about how using Span would be better for the lower-level API.

I think almost everyone who isn't a Windows programmer will search for a demangle function, rather than an undecorate function, so personally, for discoverability reasons alone, I think demangle is a better choice, even if it is slightly inaccurate.

It's worth pondering whether this function should also support demangling C++ names? That way most users would be able to use the nice Swift API from the Runtime module and not have to worry about using either __cxa_demangle or __unDNameEx.

I also wonder whether demangle(_ mangledName: String) -> String? is the right spelling for the high level API. Might String.init?(demangling: String) be a better choice?

Additionally I wonder whether we should have an OptionSet to specify whether we are just interested in the user-visible symbol name, or also in the type information? It's not uncommon to want shorter and longer demanglings, depending on the context. Maybe something like

struct DemanglingOptions: OptionSet {
  let rawValue: Int

  static let nameOnly = DemanglingOptions(rawValue: 1 << 0)
}

as a starting point. Then we'd have

extension String {
  public init?(demangling: String, options: DemanglingOptions = [])
}

@discardableResult
public func demangle(
  _ mangledName: String,
  into buffer: inout MutableSpan<UInt8>,
  options: DemanglingOptions = []
) -> DemanglingResult

Microsoft has quite a list of options for its API, FWIW.


That’s great to hear, and glad that using the Runtime module would be ok for that :+1:

I personally don’t like making everything an init like that, discoverability of those is pretty horrendous, so I’d prefer keeping it a method.

It also more easily shows all the alternative versions: you start with the String one, then move on to the Span one, and so on.

Great minds think alike :smiley: In the implementation I actually ended up exposing an endpoint with options, so we can add them later without any new ABI entry points. The current swift_backtrace_demangle didn’t have options, so I ended up adding a new entry point so we can pass flags.

I didn’t define any flags option set yet though; perhaps we could leave it to future evolution? Overall agreed though. We do have tons of options we could surface but I don’t know how many of them we are willing to “support” :thinking: The runtime currently has:

struct DemangleOptions {
  bool SynthesizeSugarOnTypes = false;
  bool QualifyEntities = true;
  bool DisplayExtensionContexts = true;
  bool DisplayUnmangledSuffix = true;
  bool DisplayModuleNames = true;
  bool DisplayGenericSpecializations = true;
  bool DisplayProtocolConformances = true;
  bool DisplayWhereClauses = true;
  bool DisplayEntityTypes = true;
  bool DisplayLocalNameContexts = true;
  bool ShortenPartialApply = false;
  bool ShortenThunk = false;
  bool ShortenValueWitness = false;
  bool ShortenArchetype = false;
  bool ShowPrivateDiscriminators = true;
  bool ShowFunctionArgumentTypes = true;
  bool DisplayDebuggerGeneratedModule = true;
  bool DisplayStdlibModule = true;
  bool DisplayObjCModule = true;
  bool PrintForTypeName = false;
  bool ShowAsyncResumePartial = true;
  bool ShowClosureSignature = true;
  // ... (rest of the struct omitted)
};

So we “could” expose them, but again, I don’t know how much we’re willing to keep them working in the future :thinking:


C++ demangling, yeah I don’t see why not.


:-) That's because it was intended specifically for the backtracer, so I thought I knew what options I wanted.

I think we should at least think about options; maybe this could be something we do in the future — after all, we could just add an overload that takes options, rather than having a default option set value.


Maybe just an option set with a single option for now, which is just “qualify symbols”? That one I think is rather common and I don’t think we’d drop it.

EDIT: Ignore please, this has been discussed above already, missed on first pass.

What about the various demangle modes?

$ swift demangle --help
USAGE: swift-demangle [options] [mangled name...]

OPTIONS:

Color Options:

  --color                - Use colors in output (default=autodetect)

General options:

  --classify             - Display symbol classification characters
  --compact              - Compact mode (only emit the demangled names)
  --expand               - Expand mode (show node structure of the demangling)
  --no-sugar             - No sugar mode (disable common language idioms such as ? and [] from the output)
  --remangle-new         - Remangle the symbol with new mangling scheme
  --remangle-objc-rt     - Remangle to the ObjC runtime name mangling scheme
  --simplified           - Don't display module names or implicit self types
  --strip-specialization - Remangle the origin of a specialized function
  --test-remangle        - Remangle test mode (show the remangled string)
  --tree-only            - Tree-only mode (do not show the demangled string)
  --type                 - Demangle a runtime type string

I tend to use --compact or --simplified.


Thank you for the pitch.

I just tried the unofficial / private "demangle" – it supports both Swift and C++, so yeah would be great if this new one also supports both.

Seems the return type could more ergonomically be an Int for the number of bytes written, which could trivially be compared to the buffer capacity and/or zero as needed, or, if being more explicit about failure is strongly desired, an Int? (nil on failure to demangle). This would avoid the need to introduce a bespoke Result-like type for a single use case. It would also address the point about having to know the number of bytes written in the success case.
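One way to read this suggestion, sketched with a hypothetical stand-in rather than the pitched API: the function returns nil on failure and otherwise the number of bytes the full demangling needs, snprintf-style, so that a value larger than the capacity signals truncation. That interpretation, the placeholder output, and the names `demangleStandIn` and `describe` are all assumptions made for illustration.

```swift
// Hypothetical stand-in, not the pitched API. Returns nil on failure,
// otherwise the byte count of the full demangling (assumed snprintf-style).
func demangleStandIn(
  _ mangledName: String,
  into buffer: UnsafeMutableBufferPointer<UInt8>
) -> Int? {
  guard mangledName.hasPrefix("$s") else { return nil }
  let bytes = Array("Swift.Int".utf8)  // placeholder output
  for i in 0..<min(bytes.count, buffer.count) { buffer[i] = bytes[i] }
  return bytes.count
}

// Shows the three outcomes a caller can distinguish from a plain Int?.
func describe(_ mangledName: String, capacity: Int) -> String {
  let buffer = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: capacity)
  defer { buffer.deallocate() }
  guard let needed = demangleStandIn(mangledName, into: buffer) else {
    return "failed"
  }
  if needed > buffer.count { return "truncated, need \(needed) bytes" }
  // The count tells us where the text ends; no scan for a terminator needed.
  return String(decoding: buffer[..<needed], as: UTF8.self)
}

print(describe("$sSi", capacity: 16))
print(describe("$sSi", capacity: 4))
print(describe("oops", capacity: 4))
```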


I think I would expect this to write the output into an OutputSpan<UInt8> rather than a MutableSpan<UInt8>. MutableSpan would require the provided memory to be pre-initialized whereas OutputSpan allows demangle to receive uninitialized memory from a buffer pointer and initialize it with the specified values up to the provided capacity. I think this matches the semantics of this function more than a MutableSpan would. @glessard what do you think?

The OutputSpan at the end of the function would contain the count of initialized elements (separate from the capacity of the buffer) which would be an easy way to determine how long the filled in buffer is and provide that information back to the caller. SE-0485 also proposes a String initializer similar to the unsafe buffer variant that initializes a string via an OutputSpan so that might square well with this use case. What do you think about using an OutputSpan here?
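The "uninitialized memory in, initialized count out" shape described here can already be seen in the existing unsafe buffer variant, `String(unsafeUninitializedCapacity:initializingUTF8With:)`. The sketch below uses that initializer rather than OutputSpan so that it runs on toolchains without the Span types; `writeDemangledStandIn` is a hypothetical stand-in for a demangler that writes UTF-8 and reports how many bytes it initialized.

```swift
// Hypothetical stand-in for a demangler that fills uninitialized memory
// and returns the number of bytes it initialized.
func writeDemangledStandIn(
  into buffer: UnsafeMutableBufferPointer<UInt8>
) -> Int {
  let bytes = Array("Swift.Int".utf8)  // placeholder output
  let count = min(bytes.count, buffer.count)
  for i in 0..<count { buffer[i] = bytes[i] }
  return count
}

// The initializer hands over uninitialized storage and trusts the returned
// count, so no terminator scan and no pre-initialization are needed.
let demangled = String(unsafeUninitializedCapacity: 64) { buffer in
  writeDemangledStandIn(into: buffer)
}
print(demangled)  // Swift.Int
```

An OutputSpan-based signature would carry the same two pieces of information (capacity and initialized count) in one safe value instead of a raw buffer plus a returned integer.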


+1


Would it make sense to provide versions that work in terms of CChar so that it works regardless of the signedness of the native char type?

Since Swift symbols can contain Unicode symbols outside the usual 7-bit ASCII range, CChar isn't the right type to represent them (nor would a signed 8-bit integer be). If the API operates on the underlying code units (as the one pitched here does), then UInt8 is the correct type (or even better, its alias UTF8.CodeUnit) and the API should unambiguously document that the result will be valid UTF-8 (unless truncated, I suppose).
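A small illustration of the element-type point, using an arbitrary non-ASCII identifier chosen for this example: UTF-8 code units above 0x7F come out negative when forced into a signed 8-bit type, which is why UInt8 (UTF8.CodeUnit) is the natural fit.

```swift
// "café" written with an escape so the source stays ASCII.
let name = "caf\u{E9}"

// As UTF-8 code units, the last character takes two bytes, both above 0x7F.
let codeUnits: [UTF8.CodeUnit] = Array(name.utf8)
print(codeUnits)  // [99, 97, 102, 195, 169]

// Reinterpreted as Int8, those two bytes become negative values.
let signed = codeUnits.map { Int8(bitPattern: $0) }
print(signed)  // [99, 97, 102, -61, -87]

// Going back requires an explicit bit-pattern conversion on every element.
let roundTripped = String(
  decoding: signed.map { UInt8(bitPattern: $0) }, as: UTF8.self)
print(roundTripped == name)  // true
```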
