SE-0262: Demangle Function

Joe_Groff · July 22, 2019, 4:21pm

The review of SE-0262: Demangle Function begins now and runs through July 29, 2019.

The proposal is written by @Alejandro.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to me as the review manager via email or direct message on the forums. If you send me email, please put "SE-0262" somewhere in the subject line.

What goes into a review of a proposal?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift.

When reviewing a proposal, here are some questions to consider:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Thank you for contributing to Swift!

Joe Groff
Review Manager

jrose · July 22, 2019, 4:31pm

I would have appreciated some use cases included in the proposal. When does one need to demangle a mangled name? What are the benefits of doing so in-process? (I believe the answer is "backtrace reporting, right before you're about to abort", but I'm not sure I find that compelling. Demangling can be lossy and I'd rather have the real symbols.)

On that note, which demangling options are used here? The runtime entry point currently doesn't support any flags, but there are a ton of options defined in include/swift/Demangling/Demangle.h, and it's not obvious to me that the set used for…uh, whatever we currently use the runtime entry point for…is the same set that makes sense to expose to users by default.

API nitpicks: those buffers should either be CChar or UInt8 (for UTF-8, not that mangled names are UTF-8), not Int8.

What was the motivation behind adding the third overload? String.withUTF8 doesn't seem so complicated for someone already in UnsafeBufferPointer land. (Sorry, I didn't follow the pitch thread too closely.)

beccadax · July 22, 2019, 6:01pm

I think we should provide this functionality somehow, but I'm uncomfortable with a few aspects of the design.

"Demangle" is a weird word, but it is a single-word name being given to a very esoteric Swift-internal function in the standard library. Although it's definitely not as bad as Perl 5 giving dump() to a function that causes the process to dump core, I'm uncomfortable with the similarities they do have. I think I would be happier with this if it had a more complicated name that made its uses more clear, like demangle(swiftSymbolName:).

For inputs, we've talked in the past about having an UnsafeString type. Once we have that, would we wish we had designed this with only UnsafeString as an input, instead of having String and UnsafeBufferPointer<Int8> variants?

For outputs, I wonder if a TextOutputStreamable variant might be helpful; after all, we imagine logging these things to be a major use case.

Given the matrix of inputs and outputs, it might make sense to actually think of this as a type instead of a function:

public struct SwiftSymbol {
    // These initializers try to demangle with an empty buffer so they can check validity
    // and fill in `requiredCapacity`.
    public init?(mangledName: String)
    public init?(mangledName: UnsafeBufferPointer<Int8>)   // or UnsafeString?

    // This is set by the initializers when they do their validity
    // checks.
    public let requiredCapacity: Int
}

extension SwiftSymbol {
    // This no longer needs an error return--it just fits what it
    // can into the buffer; you can presize the buffer with
    // `requiredCapacity` if you want to be sure the whole thing
    // will fit.
    public func demangle(into buffer: UnsafeMutableBufferPointer<Int8>)
}

extension SwiftSymbol: TextOutputStreamable {
    public func write<Stream: TextOutputStream>(to stream: inout Stream)
}

extension SwiftSymbol: CustomStringConvertible {
    // This is how you'd get a plain string out of the demangler.
    public var description: String { get }
}

This design should be easier to use and is namespaced better, but it does force two runtime calls, which isn't the best.

In short…I dunno, this doesn't feel right yet. I'm tempted to say we should add the runtime entry point but not expose it in the language until we have more comprehensive reflection facilities, but that's really just punting.

One non-wishy-washy bit of feedback: The overloads that use UnsafeBufferPointer as either inputs or outputs need to document their behavior very specifically. For example, is the output buffer null-terminated? Is it null-terminated even when truncated? This needs to be laid out in the documentation comments, not just the proposal text.

Alejandro · July 22, 2019, 7:42pm

For logging purposes mostly, like Brent says. Backtraces are the main driving use case for this proposal right now. A potential future use case is Swift gaining something similar to C++'s std::type_info, where retrieving the name of a type returns a mangled name in both clang and gcc (msvc, I believe, returns a demangled name).

The options used are the default options with DisplayDebuggerGeneratedModule = false;. I'm not sure how swift_demangle is used internally, but it could make sense to expose some of these options as flags.

This matches the element type from String.utf8CString, so CChar makes sense to me. String.withUTF8's element type is UInt8 so that could also make sense, but the current implementation requires a null-terminated input, which withUTF8 does not provide. I'll speak more about input and output null termination in my reply to Brent.

Initially, it was just the string parameter version, but Joe mentioned that taking a buffer input version could make sense so that you don't have to go through allocating a string.

AlexanderM · July 22, 2019, 7:59pm

If this were implemented to return a String, I imagine that people will start writing parsers to fish out the important bits that they want. I think this should definitely return a struct instead.

Ideally, this could set the ground-work for reflection in the future, by implementing the types necessary to describe reflected Swift entities. Here's a rough start:

protocol SwiftMember {
    // Protocol requirements must be declared as `var` with explicit accessors.
    var accessLevel: SwiftMemberAccessLevel { get }
    var name: String { get }
}

struct SwiftProperty: SwiftMember {
    let accessLevel: SwiftMemberAccessLevel
    let name: String
    let type: AnyClass

    let isSettable: Bool
    // add info on didSet/willSet, etc.
}

struct SwiftFunctionParameter {
    let label: String
    let name: String
    let type: AnyClass
}

// An enum rather than a struct, since it only has cases; keyword names are escaped with backticks.
enum SwiftMemberAccessLevel {
    case `private`, filePrivate, `internal`, `public`, `open`
}

struct SwiftFunction: SwiftMember {
    let name: String
    let returnType: AnyClass
    let arguments: [SwiftFunctionParameter]

    let isFinal: Bool
    let isMutating: Bool
    let `throws`: Bool
    let accessLevel: SwiftMemberAccessLevel
}

public struct SwiftSymbol {
    let moduleName: String
    let typeName: String? // Could be non-optional, if we reserve something like "Global" for globals
    let member: SwiftMember
}

extension SwiftSymbol {
    // These initializers try to demangle with an empty buffer so they can check validity
    // and fill in `requiredCapacity`.
    public init?(demangleFrom mangledName: String)
    public init?(demangleFrom mangledName: UnsafeBufferPointer<Int8>)
}

extension SwiftSymbol {
    // This no longer needs an error return--it just fits what it
    // can into the buffer; you can presize the buffer with
    // `requiredCapacity` if you want to be sure the whole thing
    // will fit.
    public func demangle(into buffer: UnsafeMutableBufferPointer<Int8>)
}

extension SwiftSymbol: TextOutputStreamable {
    public func write<Stream: TextOutputStream>(to stream: inout Stream)
}

extension SwiftSymbol: CustomStringConvertible {
    // This is how you'd get a plain string out of the demangler.
    public var description: String { get }
}

Alejandro · July 22, 2019, 8:07pm

This is a really interesting angle. I agree this could be even more useful once we have better reflection facilities. I wonder if maybe this could tie into my comment earlier about a potential use case with C++'s std::type_info incorporated in Swift somehow and some name property returns a SwiftSymbol that can be demangled.

The output is always null-terminated, even when truncated.

We had some discussion on the implementation about whether the buffer input version should be null-terminated; it is currently implemented to require null termination. This matters because it determines whether you use withUTF8 or utf8CString.withUnsafeBufferPointer for a string if we decide to remove the string overload (although arguably you could use the latter in both cases).

let buffer: UnsafeMutableBufferPointer<CChar> = ...

// If input buffer is not null terminated
// Element type here is UInt8
"$sSi".withUTF8 {
  // If demangle takes a CChar buffer
  // This isn't strictly required if the platform's CChar is UInt8
  $0.withMemoryRebound(to: CChar.self) {
    demangle($0, into: buffer)
  }
}

// If input buffer is null terminated
// Element type here is CChar
"$sSi".utf8CString.withUnsafeBufferPointer {
  // If demangle takes a CChar buffer
  demangle($0, into: buffer)
}

Jean-Daniel · July 22, 2019, 9:00pm

Parsing the demangled name is not easier than parsing the mangled name in the first place. Why would one start to demangle and then parse the result?

There is probably a need for a demangler that produces an AST, but I think it does not remove the need for a simpler version.

sharplet · July 22, 2019, 9:20pm

How is one expected to use the required capacity variant of DemangleResult? AFAIK there isn’t any realloc functionality provided by the standard library. Is the intention that one would need to rely on some higher level collection’s reserveCapacity() method?

In any case, it would be helpful if the example in the proposal actually demonstrated what kind of code one would need to write in order to make use of that result, rather than simply printing a message to stdout.
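
One possible reading of the question above is a "retry with a bigger buffer" loop. The sketch below is only an illustration under stated assumptions: `DemangleOutcome` and `mockDemangle` are hypothetical stand-ins, not the proposal's API, and the mock copies from a one-entry lookup table instead of calling the runtime. It also assumes the output is always null-terminated, even when truncated.

```swift
// Hypothetical stand-ins for illustration only; not the proposed API.
enum DemangleOutcome: Equatable {
    case success
    case truncated(requiredCapacity: Int)
    case invalidSymbol
}

// Mock demangler: looks the name up in a tiny table and copies the result,
// always null-terminating whatever it writes (assumed behavior).
func mockDemangle(_ mangled: String,
                  into buffer: UnsafeMutableBufferPointer<CChar>) -> DemangleOutcome {
    let table = ["$sSi": "Swift.Int"]
    guard let demangled = table[mangled] else { return .invalidSymbol }
    let bytes = demangled.utf8CString            // includes the trailing NUL
    guard buffer.count > 0 else { return .truncated(requiredCapacity: bytes.count) }
    let n = min(bytes.count, buffer.count)
    for i in 0..<n { buffer[i] = bytes[i] }
    buffer[n - 1] = 0                            // terminate even when truncated
    return n == bytes.count ? .success : .truncated(requiredCapacity: bytes.count)
}

// Try a small buffer first; if truncated, allocate exactly what is required and retry.
func demangledString(_ mangled: String, initialCapacity: Int = 4) -> String? {
    var capacity = initialCapacity
    while true {
        let buffer = UnsafeMutableBufferPointer<CChar>.allocate(capacity: capacity)
        defer { buffer.deallocate() }
        switch mockDemangle(mangled, into: buffer) {
        case .success:
            return String(cString: buffer.baseAddress!)
        case .truncated(let required):
            capacity = required
        case .invalidSymbol:
            return nil
        }
    }
}
```

With this shape the caller never needs realloc: each attempt uses a fresh allocation, and the second attempt is guaranteed to fit because the first one reported the exact size.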

CTMacUser · July 23, 2019, 1:23am

For the versions that output to a buffer, the documentation should specify that the result will be NUL-terminated, a NUL byte won't be used as part of the returned name (besides the terminator), and space for the NUL value needs to be taken into account for the buffer's size. The version that takes a buffer input needs to specify the same restrictions.

Is an inline de-mangling call only for the current program's environment (precise compiler version and platform)? I'm guessing yes. Is the mangled format supposed to be locked for all time? I'm guessing no. When can it change? Can the format change between versions of the compiler? Only between major versions? Do competing compilers for the same platform need to use the same format? Can it differ between platforms?

Evaluation: +1, modulo my queries above
Warrants a change: +0.9
Does it fit: +1
Comparison: N/A, I've never used similar features.
Effort: quick reading

Alejandro · July 23, 2019, 5:40pm

I believe calls to these functions use the Swift runtime version currently being executed for the given process.

The mangling format is an evolving format, but because of ABI stability, mangling is stable. cc: @Joe_Groff

Competing compilers (although there are none right now to my knowledge) don't have to follow Swift's mangling scheme. The current compiler uses the same mangling scheme for all platforms.

Karl · July 23, 2019, 7:28pm

There is one. Can’t remember the name but I’ve seen them talk about it and ask questions on these forums.

Joe_Groff · July 23, 2019, 8:23pm

Yes, manglings for existing concepts can't change. Other Swift compilers that want to be link-compatible with our compiler would have to use the same mangling for things they want to link against from our binaries, or expose as API for our compiler to link against. The mangling can be extended with new productions for new concepts.

JetForMe · July 25, 2019, 9:24am

Shouldn't there be a corresponding mangle() function?

nuclearace · July 25, 2019, 10:02am

I would say possibly, but as another proposal. Creating an API for mangling might be a bit more tricky than just demangling. However that leads me to my feedback:

Would it be reasonable to nest this under a Mangling enum sort of like MemoryLayout? I think it would be important to not pollute the global namespace with free functions. Especially when we have precedent in nesting these kinds of functions under a type that has a name that makes sense for the operations it'll provide. This would also provide a home for possible future mangling functions to reside in.

I would also suggest moving DemangleResult to be nested under this new enum, but I'd vote against renaming it, since I imagine a possible future MangleResult.

Jean-Daniel · July 25, 2019, 12:02pm

Demangling to a string is not a bijective function. This is usually a lossy operation. You can't recreate the mangled string from a demangled symbol.

For instance, both of these mangled strings demangle to the same string:

_$s6mangle3fooyyAA4TypeCF
_$s6mangle3fooyyAA4TypeVF

mangle.foo(mangle.Type) -> ()

So you can't design a symmetric function for mangling.

If you want to be able to work both ways, you need a rich API that produces a structured representation of the symbol and not a simple string, but this is out of scope for a simple demangler as proposed here.
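
To make the asymmetry concrete, here is a minimal sketch. It uses a hand-written lookup table as a stand-in for a demangler (an assumption for illustration; no real demangler is called), filled with the two manglings from this post, and groups mangled names by their demangled form.

```swift
// Hand-written table standing in for a demangler; entries copied from the post above.
let demangledNames: [String: String] = [
    "_$s6mangle3fooyyAA4TypeCF": "mangle.foo(mangle.Type) -> ()",
    "_$s6mangle3fooyyAA4TypeVF": "mangle.foo(mangle.Type) -> ()",
]

// Group mangled names by their demangled form to look for collisions.
let candidates = Dictionary(grouping: demangledNames.keys, by: { demangledNames[$0]! })

for (demangled, mangled) in candidates {
    // More than one candidate means the demangled string cannot be mapped back uniquely.
    print("\(demangled) <- \(mangled.sorted())")
}
```

Both mangled names land in the same group, so a string-only inverse would have to guess between them.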

Alejandro · July 25, 2019, 10:16pm

This is really up to whoever is using this function. The stdlib might not provide a realloc function, but I know that Swift NIO's ByteBuffer uses malloc, free, and realloc from libc, so it's a possibility others could do the same and use realloc to fit the entire demangled string. I remember there was some discussion about adding a reallocation method to the stdlib, but that was only exposed on ManagedBuffer.

JetForMe · July 26, 2019, 6:19am

Well, this seems less than desirable, but obviously it's possible to mangle a function name, or we wouldn't be talking about this.

Since I don't read mangle, I don't know what those two manglings mean, or how they differ. So it's hard to understand why it's lossy (obviously it loses local parameter names).

jayton · July 26, 2019, 8:23am

The difference is that in the first mangling, mangle.Type is a class, and in the second, it’s a struct. Choosing the correct mangling requires type metadata that isn’t preserved in the demangled string, since it isn’t part of the declaration.

ktoso · July 29, 2019, 6:21am

I'm very happy to see review of this function and hope we can figure out a good way forward for it.

Agreed on adding more context; I hopefully can provide some more use-cases here and maybe if @Alejandro wanted to those could be included as rationale for this proposal. All of this is written from the perspective of Server Side Swift development where we and the SSWG are spending our efforts around improving the backtrace experience.

What follows is hopefully setting up more use-cases where programmatic access to demangling in Swift itself is useful (more about the specific proposal follows below the next "---" ):

Yes, the prime use case is logging human readable backtraces, although this does not necessarily mean it'd be done in-process (it could be, and we see use-cases where that would be beneficial, but it is not the only reason to expose this function). I do realize the demangled output is lossy, but at the same time, for normal day to day development, it seems like the "less scary" thing to work with, esp when trying to get people to write their next backend in Swift, rather than Java, Ruby, Python etc.

I think for the server story it is useful to share the 3 or so situations (personas?) in which this exposed demangle function in pure Swift would help build a better and more consistent user experience:

development time: "developing my app, debug mode", working on macOS
- Goal: while developing and hitting a fault (also on linux!), like maybe an accidental force unwrap, array index out of range or so, I'd want to spot this quickly;
  - note that we care also about developing on linux, or at least the app running on linux (maybe in a docker container), even if I'm developing "on" a mac;
  - it is nice to not have to grab crashes, copy paste them to a magic script (symbolicate-linux-fatal) but have the crash right there in my face when I caused it; Also: not all developers even know about symbolicate-linux-fatal, and it has to be run on linux, so I'd have to ssh into my docker and run it there etc... it becomes a hassle.
- How: symbolicate and demangle in-process, when logging the crash; so the Swift built-in demangle function is
  - also on linux: addresses -> mangled names -> demangled names
continuous integration: running a Linux-based CI, and wanting to make sure contributors (e.g. in an open source project, which many of the SSWG projects are) have a simple time spotting mistakes if CI failed
- Goal: Have contributors get quick feedback about what their pull request broke; They may not be developing on linux, but maybe they broke something on linux... making it simpler for them to realize what happened, rather than showing a backtrace that is only addresses and/or mangled names.
- How: We could build some libraries that help make such reporting easier; it would be nice if we wrote them in Swift, rather than hacked together bash and python scripts. We are aiming to get the same "backtrace experience" across platforms and use cases, so it makes sense to use Swift as the language in which we do the symbolicating / demangling.
production: "running in production (server app)" + "something crashed"
- Goals: may depend on what the application is, but a bit higher effort is acceptable here;
  - we would want to avoid: having to manually copy paste logs; running in production with hand rolled solutions to capturing faults and demangling them;
  - note: one usually does not have access to production machines, and can not "ssh into the box" and run some diagnostics there; it is not always possible to get python on there if one wanted to use symbolicate-linux-fatal
  - clear steps and/or tools/libraries that do the right thing for swift-on-server users, like in the ongoing "Crash backtraces" effort (see post #9 by Alejandro), which currently uses a nasty hack to call into the stdlib's demangleImpl.
    - We want to avoid users being confused and/or giving up when they start trying server-side Swift; in order to do that, the "when things go bad" experience must be a good one (e.g. dead thread: "Using symbolicate-linux-fatal").
- How:
  - maybe in-process: still could be an option for "soft faults" if we'd get them some day (separate discussion), i.e. faults which did not corrupt memory etc.
  - off-process: e.g. setting up a "watcher" process that catches and symbolicates names can be seen as viable; examples of such tools are Google's Breakpad (see "How To Add Breakpad To Your Linux Application") and the family of tools dealing with minidumps.
    - we could perhaps build tools which help do the right thing, and as part of building your server app you'd also build a "guardian process" which spawns your app, and if it dies, performs the symbolication and logs wherever you wanted it to log.
- How a Swift solution could improve status quo: we know that currently some people are forced to run apps by piping all output through grep for "Fatal error" and the python symbolication script. This complicates deployment (ensure python versions, make sure you downloaded the right magic symbolicate python script etc), and it is not something we anticipate the vast majority of people to do; rather, they'll run without anything, and if the system crashes they'd have no output at all. Perhaps we could pull off a simple Swift wrapper app that would monitor the actual app (child) process and take care of this, rather than the python and bash scripts.

For all those 3 scenarios, we think we can do much better for the server ecosystem, by providing tools / libraries or guidelines; though for the tools and libraries it'd be really good to have official APIs to call into.
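
As a rough illustration of the "guardian process" idea from the production scenario, here is a minimal sketch built on Foundation's Process and Pipe. The function name `runGuarded` and the "Fatal error" filter are assumptions made for this example; a real tool would symbolicate and demangle the captured backtrace at the marked point, which this sketch does not attempt.

```swift
import Foundation

// Hypothetical helper: runs a child process, captures its stderr, and returns
// the exit status together with the lines that look like Swift fatal errors.
func runGuarded(executable: String,
                arguments: [String]) throws -> (status: Int32, fatalLines: [String]) {
    let child = Process()
    child.executableURL = URL(fileURLWithPath: executable)
    child.arguments = arguments

    let stderrPipe = Pipe()
    child.standardError = stderrPipe

    try child.run()
    // Read until the child closes stderr, then wait for it to exit.
    let data = stderrPipe.fileHandleForReading.readDataToEndOfFile()
    child.waitUntilExit()

    let text = String(decoding: data, as: UTF8.self)
    let fatalLines = text.split(separator: "\n")
        .map(String.init)
        .filter { $0.contains("Fatal error") }
    // A real tool would symbolicate and demangle the captured backtrace here.
    return (child.terminationStatus, fatalLines)
}
```

The wrapper replaces the grep step with plain Swift, so the deployment carries one binary instead of a shell pipeline plus a script.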

Demangling specifically would help with the ongoing effort in swift-server/swift-backtrace (backtraces for Swift on Linux and Windows) to improve the backtrace situation on a library level for the time being, though it currently has to call the private demangle API: https://github.com/ianpartridge/swift-backtrace/blob/master/Sources/Backtrace/Demangle.swift#L21 (and I've seen at least one or two more impls which do this). Another alternative is reimplementing it, like mattgallagher/CwlDemangle (an implementation of Swift mangled symbol parsing and demangled printing in Swift) does, though this again is a pretty bad idea as it can/has/will deviate from the "real" impl over time. I tested it as a sanity check and was getting faults / fatal errors on some backtraces, so yeah, reimplementing is not a viable option.

On the specific proposal though:

I don't think a mangle function is needed as part of this proposal; we really are focused on improving the backtraces experience, and this would be already a step in the right direction.
I would definitely want to keep a version of the API that allows writing into an existing buffer (be it fixed sized, or realloced to make space for a bigger trace).
No strong opinion on null termination as long as it is clearly documented; null-terminating seems nice since the buffers can then be wrapped by a String directly, right?
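
On the null-termination point, a minimal sketch of the difference for callers. The buffers here are filled by hand as a stand-in for demangler output (an assumption for illustration); only standard library calls are used.

```swift
// Stand-in for demangler output: "Swift.Int" followed by a NUL terminator.
let terminated: [CChar] = Array("Swift.Int".utf8CString)

// Null-terminated: String(cString:) reads up to the first NUL, no length needed.
let fromCString = terminated.withUnsafeBufferPointer {
    String(cString: $0.baseAddress!)
}

// Not terminated: the caller has to carry the length and decode explicitly.
let unterminated: [UInt8] = Array("Swift.Int".utf8)
let fromBytes = String(decoding: unterminated, as: UTF8.self)

print(fromCString == fromBytes)
```

Either form can be turned into a String, so the choice mostly affects how much bookkeeping the caller does.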

What is your evaluation of the proposal?
- +1, this will help server-side swift a lot and make our lives when building tools much simpler.
Is the problem being addressed significant enough to warrant a change to Swift?
- +1; Yes; This helps in improving the overall backtrace situation for Swift and will make it easier to build symbolication+demangling in pure Swift, rather than relying on python scripts (making deployments have another moving piece to worry about) and/or manually copying and demangling every crash (which is slow and annoying);
Does this proposal fit well with the feel and direction of Swift?
- +1, I think so; Swift should provide the tools to build crash logging infrastructure as part of it, rather than having to rely on python scripts etc. This is an important bit of it, and it's good to have it as part of Swift's library.
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
- While not having used Rust "in anger", @drexin and I have spent considerable time looking at Swift / Go / Rust's backtrace status quo; from that, a few notes about the Rust situation, which feels close and like one we could strive for:
  - Rust by default suppresses backtraces on panics, but they are very easy to enable by setting RUST_BACKTRACE=1;
  - if debug info is available, backtraces are symbolicated by default. It works the same on every platform -- macOS as well as Linux, making for a good developer experience.
  - Rust itself does not provide a high level lib for backtraces/demangling however it includes minimal backtrace support in the runtime, and that is then complemented by libraries like backtrace: backtrace::SymbolName - Rust -- which does the "expose as nice API" part that was mentioned here, but this is not part of rust itself really. Only a library.
  - When a rust program panics with debug info available and the traces enabled, the names are nicely demangled in the output:

thread 'main' panicked at 'FOOOOO', src/main.rs:44:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:211
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:227
   4: <std::panicking::begin_panic::PanicPayload<A> as core::panic::BoxMeUp>::get
             at src/libstd/panicking.rs:491
   5: std::panicking::begin_panic
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b/src/libstd/panicking.rs:425
   6: crashes::test5
             at src/main.rs:44
   7: crashes::test4
             at src/main.rs:39
   8: crashes::test5
             at c_src/test.c:29  << even through C calls
...

- Scala: I have experience with dealing with Scala's name mangling, although it is very verbose and easy to read to be honest, so demangling was never a topic that was brought up. JVM stacktraces also always include line numbers, so even in the presence of a slightly weird name, they were perfectly understandable and readable by non-experts (mostly showing up as "$anon1" or similar things for closures).

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
- Thorough exploration of the topic as well as thinking about how it can be used to build better diagnostics (logging mostly) in the future. Learnt and tried out in practice what similar languages like Go and Rust do.

Alejandro · July 30, 2019, 9:36pm

Yes, this makes a lot of sense, but one can imagine that there is a lot of design space here too. We could use Brent's idea and reserve this for some SwiftSymbol, or maybe we just want to encapsulate this in something specific for any and all things mangling like you mention.