Demangle Function

Alejandro · June 7, 2019, 8:36pm

Over on the Crash backtraces thread, there's some discussion about using Swift's demangler to turn backtraces into a human-readable format. Right now the standard library doesn't include a demangle function, but the Swift runtime already includes one. We can make use of that and incorporate it into the standard library. A naive API for this could look like:

public func demangle(_ mangledName: String) -> String

However, the runtime's demangle function offers a bit more functionality that we can take advantage of. For example, instead of returning a new string, we could pass a pre-allocated buffer for it to write into:

public func demangle(
  _ mangledName: String,
  info buffer: UnsafeMutableBufferPointer<Int8>
)

The demangler also takes an extra flags parameter, but there currently aren't any flags defined. If the demangler ever supports optional flags in the future, we could introduce a new overload with a flags parameter. Other ideas for how to accommodate this case would be greatly appreciated.

Would love to hear thoughts and opinions!

nuclearace · June 7, 2019, 8:41pm

I'd be open to including some demangling function in the stdlib. I'm not really sure we'd want to make it any more complicated than a simple (String) -> String function, unless there are clear performance implications around the use of the function, which may be the case if it's to be heavily used in certain domains.

Alejandro · June 7, 2019, 8:56pm

It might make sense to offer both of these as overloads, so that most people can use the simple (String) -> String version, while those who need to write into buffers can do so.

Joe_Groff · June 7, 2019, 8:59pm

More so than performance, in a crash situation you may want to avoid memory allocation, as @IanPartridge notes.

So having a low-level form that reads and writes out of raw buffers would be useful, alongside the high-level String -> String interface.

Alejandro · June 7, 2019, 8:59pm

Something else worth discussing: should we prepend the Swift 5 mangling prefix in the case that someone doesn't provide it?

// Si = Swift.Int
demangle("Si") // returns Si
demangle("$sSi") // returns Swift.Int

We could implicitly prepend $s to the first input so that it demangles successfully. Does it also make sense to return the mangled name in the case of failure, or should this return String? instead?
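A minimal sketch of what implicit prefixing could look like, assuming the set of accepted prefixes is the one listed later in this thread (_T, _T0, $S, $s). The name addingSwift5Prefix is hypothetical and not part of any proposed API.

```swift
// Prefixes the runtime demangler is said to accept; "_T" also covers "_T0".
let knownManglingPrefixes = ["_T", "$S", "$s"]

/// Hypothetical helper: returns `name` unchanged if it already carries a
/// known mangling prefix, otherwise prepends the Swift 5 prefix "$s".
func addingSwift5Prefix(_ name: String) -> String {
  let hasKnownPrefix = knownManglingPrefixes.contains { name.hasPrefix($0) }
  return hasKnownPrefix ? name : "$s" + name
}
```

With this, addingSwift5Prefix("Si") yields "$sSi", while an input that already starts with $s is passed through untouched.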

Joe_Groff · June 7, 2019, 9:00pm

The demangler tries to be forgiving with input, so if you have a leading _, omit a leading $, use an obsolete _T* prefix, or anything like that, it still produces results. It seems reasonable to keep that behavior with this interface.

IanPartridge · June 7, 2019, 9:03pm

It looks like the current demangler allocates as it goes, then copies the result into the provided buffer. But that could be improved, I guess.

Joe_Groff · June 7, 2019, 9:07pm

The demangler can take a pre-allocated stack buffer, but yeah, it will attempt to reallocate if it runs out of space. We could maybe add a mode that gives up and gives a best-effort rendering of what it's done so far instead.

IanPartridge · June 7, 2019, 9:14pm

Doesn't this code always copy?

github.com/apple/swift/blob/1b198ea956a5605221d43f8d3a4fa49f82c86eb1/stdlib/public/runtime/Demangle.cpp#L676

    return strdup(result.c_str());
  }

  // Indicate a failure if the result does not fit and will be truncated
  // and set the required outputBufferSize.
  if (*outputBufferSize < result.length() + 1) {
    *outputBufferSize = result.length() + 1;
  }

  // Copy into the provided buffer.
  _swift_strlcpy(outputBuffer, result.c_str(), *outputBufferSize);
  return outputBuffer;
}

Joe_Groff · June 7, 2019, 9:16pm

Ah, it uses the preallocated buffer to build the AST nodes, but maybe not for the final rendering. Yeah, we should fix that.

Alejandro · June 7, 2019, 9:39pm

Maybe I'm reading this wrong, but shouldn't it be written something like:

// Copy into the provided buffer.
_swift_strlcpy(outputBuffer, result.c_str(), *outputBufferSize);

// Indicate a failure if the result did not fit and was truncated
// by setting the required outputBufferSize.
if (*outputBufferSize < result.length() + 1) {
  *outputBufferSize = result.length() + 1;
}

That way the copy is bounded by the caller's original buffer size and you don't write past the allocated buffer. This also raises the question of better error recovery for the into-buffer version, for callers who want to know how much to allocate for the full output.
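To make the ordering concrete, here is a minimal Swift sketch of "copy first, then report the required size". copyCString is a hypothetical helper written for illustration; it is not the runtime's code and it takes an already demangled string as input.

```swift
/// Hypothetical helper: copies `text` into `buffer` as a NUL-terminated
/// C string, truncating if needed. Returns nil when everything fit,
/// otherwise the number of bytes required for the full string.
func copyCString(
  _ text: String,
  into buffer: UnsafeMutableBufferPointer<Int8>
) -> Int? {
  let bytes = Array(text.utf8)
  let required = bytes.count + 1 // payload + NUL terminator
  guard buffer.count > 0 else { return required }

  // Bound the copy by the caller's capacity, never by `required`.
  let copied = min(bytes.count, buffer.count - 1)
  for i in 0..<copied {
    buffer[i] = Int8(bitPattern: bytes[i])
  }
  buffer[copied] = 0

  // Only after copying do we report how much space was really needed.
  return required > buffer.count ? required : nil
}
```

Because the copy length is computed from buffer.count before anything else is updated, a too-small buffer ends up with a truncated but terminated string, and the caller still learns the size to allocate for a retry.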

ktoso · June 7, 2019, 11:52pm

Yeah it would be nice if we could:

when we start a thread, allocate a buffer for use by our demangling
and when we crash, use that buffer space for the demangler.

This being in addition to a simple "String -> String" API is fine I think.

The same applies to the buffer space used to store the backtrace in the first place, though we can do all of that in the potential "better backtraces" library.

IanPartridge · July 1, 2019, 3:34pm

Hi @Alejandro, I saw you opened apple/swift pull request #25314, "[stdlib] Introduce demangle function". Awesome stuff.

What's next for this effort? Are you going to open a swift-evolution pull request too?

Alejandro · July 1, 2019, 5:59pm

Sorry, I've been on a little hiatus due to life, but I'll write up a proposal so we can get this going again.

IanPartridge · July 1, 2019, 6:05pm

Great, let me know if you want help.

Alejandro · July 1, 2019, 8:05pm

I wrote up a quick proposal that covers what is currently implemented, although design discussion, proposal fixes, etc. are always welcome!

Demangle Function

Proposal: SE-NNNN
Author: Alejandro Alonso
Review Manager: TBD
Status: Awaiting review
Implementation: apple/swift#25314

Introduction

Introduce a new standard library function, demangle, that takes a mangled Swift symbol, like $sSS7cStringSSSPys4Int8VG_tcfC, and outputs the human-readable Swift symbol, like Swift.String.init(cString: Swift.UnsafePointer<Swift.Int8>) -> Swift.String.

Swift-evolution thread: Demangle Function

Motivation

Currently in Swift, if a user is given an unreadable mangled symbol, they're most likely to use the swift-demangle tool to get the demangled version. However, this is a little awkward when you want to demangle a symbol in-process in Swift. One could use Foundation's Process to launch swift-demangle as a child process, but the standard library can offer something better and easier.

Proposed solution

The standard library will add the following 2 new functions.

// Given a mangled Swift symbol, return the demangled symbol.
public func demangle(_ input: String) -> String?

// Given a mangled Swift symbol and a preallocated buffer,
// write the demangled symbol into the buffer.
@discardableResult
public func demangle(
  _ input: String,
  into buffer: UnsafeMutableBufferPointer<Int8>
) -> Int?

Examples:

print(demangle("$s8Demangle3FooV")!) // Demangle.Foo

// Demangle.Bar is 12 characters + 1 null terminator
let buffer = UnsafeMutableBufferPointer<Int8>.allocate(
  capacity: 13
)
defer { buffer.deallocate() }

demangle("$s8Demangle3BarV", into: buffer)
print(String(cString: buffer.baseAddress!)) // Demangle.Bar

Detailed design

If one were to pass a string that wasn't a valid Swift mangled symbol, like abc123, then the (String) -> String? version would simply return nil. With the (String, into: UnsafeMutableBufferPointer<Int8>) -> Int? version, we would return nil, the same value that indicates success, but wouldn't write the passed string into the buffer.

This proposal includes a trivial (String) -> String? version of the function, as well as a version that takes a buffer. The buffer version is marked @discardableResult because it returns an optional integer indicating whether or not we were able to fully demangle the symbol given the buffer's size. In the case of a successful demangle, this function returns nil. If the result is not nil, the integer returned is the number of bytes required for the full demangle. If the buffer is smaller than needed, we still write a truncated version of the demangled symbol, just not the whole thing. (Because this byte sequence could be truncated at any point, there is a possibility of breaking a non-ASCII sequence, resulting in unknown text. You might be demangling a declaration with an emoji in the name, but truncation could break the emoji's byte sequence.) E.g.

// Swift.Int requires 10 bytes = 9 characters + 1 null terminator
// Give this 9 to exercise truncation
let buffer = UnsafeMutableBufferPointer<Int8>.allocate(
  capacity: 9
)
defer { buffer.deallocate() }

if let required = demangle("$sSi", into: buffer) {
  print(required) // 10 (this is the amount needed
                  //     for the full Swift.Int)
  let difference = required - buffer.count
  print(difference) // 1 (we only need 1 more byte
                    //    in addition to the 9 we already
                    //    allocated)
}
print(String(cString: buffer.baseAddress!)) // Swift.In
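As an illustration of the truncation caveat, here is a small sketch of how a caller could trim a truncated buffer back to the last complete UTF-8 scalar before decoding it. validUTF8PrefixLength is a hypothetical helper, not part of the proposal, and it assumes the bytes were cut from otherwise valid UTF-8.

```swift
/// Hypothetical helper: returns the length of the longest prefix of `bytes`
/// that does not end in the middle of a UTF-8 encoded scalar.
func validUTF8PrefixLength(_ bytes: [UInt8]) -> Int {
  var index = bytes.count
  var continuationCount = 0

  // Step back over up to three trailing continuation bytes (10xxxxxx).
  while index > 0, continuationCount < 3, bytes[index - 1] & 0xC0 == 0x80 {
    index -= 1
    continuationCount += 1
  }
  guard index > 0 else { return 0 }

  // The lead byte tells us how long the final scalar should be.
  let expectedLength: Int
  switch bytes[index - 1] {
  case 0x00...0x7F: expectedLength = 1
  case 0xC0...0xDF: expectedLength = 2
  case 0xE0...0xEF: expectedLength = 3
  case 0xF0...0xF7: expectedLength = 4
  default: expectedLength = 1 // stray continuation byte: treat as malformed
  }

  // Keep everything if the last scalar is complete; otherwise drop it.
  let actualLength = continuationCount + 1
  return actualLength == expectedLength ? bytes.count : index - 1
}
```

For pure ASCII output such as Swift.In the function returns the full length, so only names containing multi-byte characters lose their final, partially written scalar.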

This implementation relies on the Swift runtime function swift_demangle which accepts symbols that start with _T, _T0, $S, and $s.
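For readers who want to experiment before anything lands in the standard library, the runtime entry point can be reached from user code. The sketch below assumes the C signature of swift_demangle found in the runtime's Demangle.cpp and uses the underscored, unofficial @_silgen_name attribute, so treat it as an experiment rather than a supported API. demangleSketch and _swiftDemangle are hypothetical names.

```swift
import Foundation

// Assumed binding to the runtime's swift_demangle function.
// @_silgen_name is an unofficial attribute and may change.
@_silgen_name("swift_demangle")
private func _swiftDemangle(
  _ mangledName: UnsafePointer<CChar>?,
  _ mangledNameLength: UInt,
  _ outputBuffer: UnsafeMutablePointer<CChar>?,
  _ outputBufferSize: UnsafeMutablePointer<UInt>?,
  _ flags: UInt32
) -> UnsafeMutablePointer<CChar>?

/// Hypothetical wrapper: returns the demangled name, or nil if the runtime
/// does not recognize `input` as a Swift symbol.
func demangleSketch(_ input: String) -> String? {
  let demangled = input.utf8CString.withUnsafeBufferPointer { name in
    // Passing no output buffer asks the runtime to allocate the result.
    // The length excludes the trailing NUL; flags must currently be 0.
    _swiftDemangle(name.baseAddress, UInt(name.count - 1), nil, nil, 0)
  }
  guard let cString = demangled else { return nil }
  defer { free(cString) } // the runtime-allocated result is ours to free
  return String(cString: cString)
}
```

Usage mirrors the simple variant discussed in this thread: demangleSketch("$sSi") should produce Swift.Int, and an input such as abc123 should produce nil.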

Source compatibility

These are completely new standard library functions, thus source compatibility is unaffected.

Effect on ABI stability

These are completely new standard library functions, thus ABI compatibility is unaffected.

Effect on API resilience

These are completely new standard library functions, thus API resilience is unaffected.

Alternatives considered

We could choose to provide only one of the proposed functions, but each serves a distinct purpose. The version that takes a string and returns a string is the simple option for cases where you're not worried about allocating new memory. The buffer version is for cases where you want to avoid allocating and instead pass in memory you've already allocated.

Future Directions

The swift_demangle runtime function has an extra flags parameter, but currently it is not being used for anything. In the future if that function ever supports any flags, it would make sense to introduce new overloads or something similar to expose those flags to the standard library as well. E.g.

public func demangle(
  _ input: String,
  flags: DemangleFlags
) -> String?

public func demangle(
  _ input: String,
  into buffer: UnsafeMutableBufferPointer<Int8>,
  flags: DemangleFlags
) -> Int?

where DemangleFlags could be an enum, OptionSet, [DemangleFlag], etc.

Jean-Daniel · July 1, 2019, 9:12pm

I find it unfortunate that you have to choose between convenience (using the simple variant) and proper error handling.

I don't see why the simple variant can't return a String?, so the caller can choose the behaviour.

It's easy enough to write print(demangle(symbol) ?? symbol) if you want to print the input in case of error.

mattrips · July 2, 2019, 7:25am

+1 for (String) -> String?

Alejandro · July 2, 2019, 1:53pm

Thanks, I agree as well. My one concern is that nil means different things for the two variants. For (String) -> String?, nil means failure, whereas for (String, into: UnsafeMutableBufferPointer<Int8>) -> Int?, nil indicates success, which could be a little confusing. We could slightly modify the buffer variant to take an inout Int? that it sets instead, but I'm open to other suggestions.

Mordil · July 2, 2019, 5:08pm

Could a tuple work, or somehow a Result?

@discardableResult
public func demangle(
  _ input: String,
  into buffer: UnsafeMutableBufferPointer<Int8>
) -> (Bool, Int) // returns the success status, and the size needed which might always be 0
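One way to picture the non-tuple alternative is an enum that names each outcome, so that nil never has to stand for success. This is only a sketch of the idea: DemangleBufferOutcome and outcome(forDemangled:capacity:) are hypothetical, and the function classifies an already demangled string instead of calling the runtime.

```swift
/// Hypothetical result type for the buffer variant.
enum DemangleBufferOutcome: Equatable {
  case complete(written: Int)   // bytes written, including the NUL
  case truncated(required: Int) // bytes needed for the full result
  case invalidSymbol            // input was not a mangled Swift symbol
}

/// Hypothetical classifier: decides which outcome applies for a demangled
/// string (nil if demangling failed) and a buffer of `capacity` bytes.
func outcome(
  forDemangled demangled: String?,
  capacity: Int
) -> DemangleBufferOutcome {
  guard let demangled = demangled else { return .invalidSymbol }
  let required = demangled.utf8.count + 1 // payload + NUL terminator
  return required <= capacity
    ? .complete(written: required)
    : .truncated(required: required)
}
```

A caller can then switch over the three cases, which keeps failure, truncation, and success distinct without relying on a sentinel size of 0.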