XML parsing with SwiftWasm crashes execution environment

I'm encountering what appears to be memory corruption using XMLParser from FoundationXML to parse XML payloads when executing via Wasm.

The payloads are SVG. The parser works great when executing natively on macOS, Linux, Windows, iOS, and Android (thanks @Finagolfin!). However, when built for Wasm and executing the parser with a .wasm bundle using Electron, the iOS built-in Wasm execution engine in WKWebView, Wasmtime on Android, or even just vanilla Wasmtime on macOS, I get a crash during, at the end of, or shortly after, parsing an SVG payload. I can get it to happen with simple regular XML payloads as well.

I haven't been able to pin down the issue to a specific spot, but it appears to be in either _CFXMLInterface or libxml2 itself, likely related to attribute parsing or memory management related to attributes.

I've reduced the issue to a small test project that only requires FoundationXML. However, the small test project does not always reproduce the issue, as it is generally invalid memory accesses after parsing (or SVG processing during parsing) that actually cause the crash. To make this reproducible, we've forked wasmtime and gotten its wmemcheck feature to work with Swift's memory allocation scheme. It shows the first memory access violation shortly after the event, when trying to print an element's attributes in the parser delegate's parser(_:didStartElement:) method.

Reproduction on macOS is as follows:

Install SwiftWasm 5.10.0 Release, then create a small executable package:

% mkdir xml-test
% cd xml-test
% /Library/Developer/Toolchains/swift-wasm-5.10.0-RELEASE.xctoolchain/usr/bin/swift package init --type executable

Edit main.swift to be:

import Foundation
#if canImport(FoundationXML)
import FoundationXML
#endif

let xml = "<a b='0' />"

let parser = XMLParser(data: xml.data(using: .utf8)!)
class ParserDelegate: NSObject, XMLParserDelegate {
    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes: [String: String] = [:]) {
        print("parser(_:didStartElement:) elementName: \(elementName) attributes: \(attributes)")
    }
}
let parserDelegate = ParserDelegate()
parser.delegate = parserDelegate

let parseCompleted = parser.parse()
if !parseCompleted {
    print("Parsing failed")
}

Build the project with the SwiftWasm toolchain:

% /Library/Developer/Toolchains/swift-wasm-5.10.0-RELEASE.xctoolchain/usr/bin/swift build --triple wasm32-unknown-wasi

Run .build/debug/xml-test.wasm using e.g. wasmtime. It will likely succeed and print: parser(_:didStartElement:) elementName: a attributes: ["b": "0"]

Use this fork of wasmtime with its wmemcheck feature configured for use with Swift's memory allocation scheme:

Install Rust (% curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh)
% git clone https://github.com/cobbal/wasmtime
% cd wasmtime
% git checkout cobbal/swift-wasm-wmemcheck
% cargo build --features wmemcheck
% ./target/debug/wasmtime -W wmemcheck ../xml-test/.build/debug/xml-test.wasm 

Execution should fail with a stack trace and Invalid store at addr 0x3ab9b8 of size 4, though it appears to be stack corruption, so your result may vary.

This has been a tough nut to crack. Any help would be appreciated! @kateinoigakukun, from my reading of the various SwiftWasm repos, this might be adjacent to something you've been working on.

Do you know it to crash with just about any SVG (albeit randomly, I understand)? If not, could you provide a simple SVG that you have gotten to crash the project? I am curious to see if I can reproduce it.

Also, does it crash when you’re not running a debug build? I know that the stacks get too big with debug binaries. I ran into this problem in projects unrelated to XML. At least, if I recall correctly, it was the stack getting too large. I am not sure I remember correctly, but it was definitely a stack-related bug. I always debug in release mode to avoid it.

It's actually really easy to trigger, but only when compiling for Wasm. In the example above, the document is let xml = "<a b='0' />". That document demonstrably causes memory issues. One that actually crashes very consistently is:

<svg viewBox="0 0 100 100"><g><g><g><g><g><g><g><g><g><g></g></g></g></g></g></g></g></g></g></g></svg>

It will crash without the attribute, but having an attribute (or referencing the svg namespace with xmlns) seems to reproduce the crash more easily.

The test case we're using with our internal lib (name redacted out of an abundance of caution) looks like this:

import OurCompanySVGParserLib

let svg = """
<svg>
<g>
<g>
<g>
<g>
<g>
<g>
<g>
<g>
<g>
<g>
</g>
</g>
</g>
</g>
</g>
</g>
</g>
</g>
</g>
</g>
</svg>
"""

let rootSVG = try? OurCompanySVGParserLib.Parser.parse(svg).get()
print(rootSVG?.serialize() ?? "Could not parse")

For context, here is a sanitized stack trace from the wasmtime wmemcheck tool. We're reasonably sure the redacted parts of the trace do not contain the code causing the memory issue.

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0: 0x17648f6 - <unknown>!malloc
           1: 0xb4f308 - <unknown>!swift_slowAlloc
           2: 0xb4f786 - <unknown>!swift_allocObject
           3: 0x84128b - <unknown>!Swift._ContiguousArrayBuffer._consumeAndCreateNew(bufferIsUnique: Swift.Bool, minimumCapacity: Swift.Int, growForAppend: Swift.Bool) -> Swift._ContiguousArrayBuffer<A>
           4: 0x840be8 - <unknown>!Swift.ContiguousArray._createNewBuffer(bufferIsUnique: Swift.Bool, minimumCapacity: Swift.Int, growForAppend: Swift.Bool) -> ()
           5: 0x83ebcf - <unknown>!Swift.ContiguousArray.reserveCapacity(Swift.Int) -> ()
           6: 0x840744 - <unknown>!(extension in Swift):Swift.Collection.map<A>((A.Element) throws -> A1) throws -> [A1]
           7-26: <Company SVG Parser Calls, with about 10 being recursive, delivered to the parser delegate>
          27: 0x15865c0 - <unknown>!FoundationXML._NSXMLParserEndElementNs(_: Swift.OpaquePointer, localname: Swift.UnsafePointer<Swift.UInt8>, prefix: Swift.UnsafePointer<Swift.UInt8>?, URI: Swift.UnsafePointer<Swift.UInt8>?) -> ()
          28: 0x1589436 - <unknown>!@objc FoundationXML._NSXMLParserEndElementNs(_: Swift.OpaquePointer, localname: Swift.UnsafePointer<Swift.UInt8>, prefix: Swift.UnsafePointer<Swift.UInt8>?, URI: Swift.UnsafePointer<Swift.UInt8>?) -> ()
          29: 0x15fd722 - <unknown>!xmlParseEndTag2
          30: 0x160c4ed - <unknown>!xmlParseTryOrFinish
          31: 0x1608643 - <unknown>!xmlParseChunk
          32: 0x158a716 - <unknown>!_CFXMLInterfaceParseChunk
          33: 0x158a003 - <unknown>!function signature specialization <Arg[0] = Exploded, Arg[2] = Owned To Guaranteed, Arg[3] = Owned To Guaranteed> of function signature specialization <Arg[1] = [Closure Propagated : closure #3 (Swift.UnsafeRawBufferPointer) -> Swift.Int32 in FoundationXML.XMLParser.parseData(_: Foundation.Data, lastChunkOfData: Swift.Bool) -> Swift.Bool, Argument Types : [FoundationXML.XMLParserFoundation.DataSwift.Bool]> of generic specialization <Swift.Int32> of Foundation.__DataStorage.withUnsafeBytes<A>(in: Swift.Range<Swift.Int>, apply: (Swift.UnsafeRawBufferPointer) throws -> A) throws -> A
          34: 0x158828b - <unknown>!function signature specialization <Arg[2] = Owned To Guaranteed> of function signature specialization <Arg[0] = [Closure Propagated : closure #3 (Swift.UnsafeRawBufferPointer) -> Swift.Int32 in FoundationXML.XMLParser.parseData(_: Foundation.Data, lastChunkOfData: Swift.Bool) -> Swift.Bool, Argument Types : [FoundationXML.XMLParserFoundation.DataSwift.Bool]> of generic specialization <Swift.Int32> of Foundation.Data.withUnsafeBytes<A>((Swift.UnsafeRawBufferPointer) throws -> A) throws -> A
          35: 0x1588161 - <unknown>!FoundationXML.XMLParser.parseData(_: Foundation.Data, lastChunkOfData: Swift.Bool) -> Swift.Bool
          36: 0x15885c4 - <unknown>!FoundationXML.XMLParser.parse(from: Foundation.Data) -> Swift.Bool
          37: 0x158872b - <unknown>!FoundationXML.XMLParser.parse() -> Swift.Bool
          38: 0x3dc1fe - <unknown>!static OurCompanySVGParserLib.Parser.parse(Foundation.Data) -> Swift.Result<OurCompanySVGParserLib.Element.SVG, OurCompanySVGParserLib.Parser.Error>
          39: 0x3df0ef - <unknown>!static OurCompanySVGParserLib.Parser.parse(Swift.String) -> Swift.Result<OurCompanySVGParserLib.Element.SVG, OurCompanySVGParserLib.Parser.Error>
          40: 0x81416d - <unknown>!main
          41: 0x176986f - <unknown>!main
          42: 0x176880a - <unknown>!__main_void
          43: 0x1768750 - <unknown>!__original_main
          44: 0x2fdfe - <unknown>!_start
          45: 0x17876f7 - <unknown>!_start.command_export
       note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information
    2: Double malloc at addr 0x8c7fb8 of size 480

I'm not getting a crash myself, but I also realize that I am using the Ubuntu build for SwiftWASM and you're using a mac build, same version though.

I increased the size of the SVG and also doubled and tripled the recursion for embedded elements. I also tried calling it over 100 times in a loop. I did this using swift run carton dev which runs in a local server in (I believe) debug mode.

If you're able to make a small swift file that reproduces the bug, I would be more than happy to look at it.

I am curious because I was considering moving to FoundationXML instead of my own parser and I would like to make sure that this works.

You might also have a look at another XML library, which has been used successfully at a large organisation for some time now and is about to be published in a first final version (I am the main author of the library).

Huh. I'll try out the Ubuntu build and see if I get different results. I'd dearly hope that the OS the compiler runs on does not affect the result, but at this point I'm happy with any solution!

I've been using Wasmtime, WKWebView, and other non-interpreters for the performance they provide. It would be interesting if a Wasm interpreter did not exhibit crashes.

The <svg viewBox="0 0 100 100"><g><g><g><g><g><g><g><g><g><g></g></g></g></g></g></g></g></g></g></g></svg> has been crashing so consistently when used with Wasmtime I didn't dig deeper (even <svg><g><g></g></g></svg> had a high crash rate). I'll play with swift run carton dev and see if I can't find a payload that exhibits issues there.

FoundationXML's XMLParser has its issues, but being a thin wrapper around libxml2 does make it a compelling choice. If not for this Wasm issue, we'd have little reason to look elsewhere. It'll probably be a good move for you, if we can smooth out this wrinkle.

I'll respond with a payload that crashes when executing via swift run carton dev if I can find one. If you have a minute in the meantime, would you mind confirming the crash in a Wasmtime execution environment with your setup @austintatious? I have 4 devs that have been reproducing it consistently, but I'd love to know if we have some kind of platform or environment bias here!

We've looked at a few other XML solutions, but almost all use FoundationXML under the hood, which does not help with this issue. I'll definitely check this out, thanks!

No FoundationXML involved, pure Swift.

I've had a difficult time finding an input that crashes a simple XMLParser program that does not use our SVG parser, but I've found a configuration that does. By parsing an XML payload to an in-memory tree, then printing that tree using a recursive method, the crash can be reliably reproduced with both wasmtime and carton:

import Foundation
#if canImport(FoundationXML)
import FoundationXML
#endif

// Define an xml payload with nested elements
let nesting = 99
let xml = "<a>\(Array(repeating: "<b>", count: nesting).joined())<c d='0' />\(Array(repeating: "</b>", count: nesting).joined())</a>"

// Define a simple tree of in-memory XML elements and a recursive method to print them with indentation
struct Element {
    var name: String
    var attributes: [String: String]
    var content: String
    var children: [Element]
    
    func serialized() -> String {
        serializedLines(indent: 0).joined(separator: "\n")
    }
    
    func serializedLines(indent: Int) -> [String] {
        // Recursively obtain the serialized lines for child elements, then print the element and its children
        let i = String(repeating: "    ", count: indent)
        let childrenLines = children.flatMap { $0.serializedLines(indent: indent + 1) }
        if childrenLines.isEmpty {
            // Element has no children, so print the element and content on a single line
            return ["\(i)<\(([name] + attributes.map { "\($0.key)=\"\($0.value)\"" }).joined(separator: " "))>\(content)</\(name)>"]
        } else {
            // Element has children, so print start and end elements on separate lines, with content and children between
            // (reuse childrenLines instead of serializing every subtree a second time)
            return
                ["\(i)<\(([name] + attributes.map { "\($0.key)=\"\($0.value)\"" }).joined(separator: " "))>\(content)"] +
                childrenLines +
                ["\(i)</\(name)>"]
        }
    }
}

// Stack of elements to populate during parsing, with a single element expected at the end of parsing
var elements: [Element] = []

// Parse the input XML into an in-memory tree
let parser = XMLParser(data: xml.data(using: .utf8)!)
class ParserDelegate: NSObject, XMLParserDelegate {
    func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes: [String: String] = [:]) {
        // Push a new element onto the stack with attributes and empty collections for content and child nodes
        elements.append(Element(name: elementName, attributes: attributes, content: "", children: []))
    }
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Aggregate content associated with the active element
        elements[elements.indices.last!].content += string
            .replacingOccurrences(of: "\n", with: " ")
            .split(separator: " ")
            .joined(separator: " ")
    }
    func parser(_ parser: XMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        // Pop the element off the stack and either store it in its parent, or declare it the root
        let element = elements.popLast()!
        if !elements.isEmpty {
            // Store the element in its parent
            elements[elements.indices.last!].children.append(element)
        } else {
            // Declare the element the root element
            elements = [element]
        }
    }
}
let parserDelegate = ParserDelegate()
parser.delegate = parserDelegate
let parseCompleted = parser.parse()

// Print the serialized result (a properly formed document should result in an element stack with a single element containing the root)
guard parseCompleted, elements.count == 1, let serialized = elements.first?.serialized() else {
    fatalError("Parsing failed")
}
print(serialized)

This looks on the surface to be a stack overflow during printing, but further investigation proves that is not the case. It appears to be stack corruption that happens while parsing the XML payload, and that corruption only manifests as an issue when subsequent operations are highly recursive (like the sample above or the SVG parser we've been developing).

Again, the very simple XML payload and program in the initial post performs a store to unallocated memory, and can be detected every time if using a tool that checks memory accesses like wasmtime -W wmemcheck. I just have not been able to track down the code in libxml2, _CFXMLInterface, or XMLParser itself that actually corrupts the stack. I'd very much appreciate any help or insight anyone can give on this!

Based on the small amount of research I did on my phone, I believe Wasmtime has a 480 stack size limit. I think that is your culprit.

I compiled with different stack size settings and got similar results with and without recursion subsequent to parsing. It does seem to be memory corruption, not stack overflow, unfortunately. The recursion code just reliably reproduces a crash, revealing that memory corruption occurred previously.

Running with wasmtime -W wmemcheck indicates that an unallocated memory region is being written into even with the simple <a b='0' /> test case from the original post.

So you cannot use FoundationXML at all with WASM? Note that FoundationXML uses the libxml2 C library. To my understanding, its sources also have to be compiled to WASM. Maybe this is missing from the toolchain?

Correct, parsing of any non-trivial XML document in a Wasm context causes memory corruption that will eventually cause some kind of crash.

The libxml2 library is being compiled to Wasm and is being executed. You can see in the stack trace above that functions from both _CFXMLInterface and libxml2 are called.

I'm working to find a solution to the issue, likely by modifying either _CFXMLInterface or libxml2 itself. It's slow going, as my experience with Wasm debugging tools is limited, and memory corruption that appears on only one of many platforms is not a simple bug to squash. I've reached out to @kateinoigakukun for help and hope to rely on their evident experience with both Wasm and Wasm debugging. That said, many eyes help in finding bugs!

If you haven’t done so yet, file a bug report here: Issues · swiftwasm/swift · GitHub

Done! XML parsing with SwiftWasm crashes execution environment · Issue #5595 · swiftwasm/swift · GitHub

We may have found a solution. It appears that we had a single call in our parser that had a stack allocation over 70KB! It is a pretty large function, but far from our largest. It wasn't even being called recursively. If this proves to be the issue, then XMLParser and libxml2 are in the clear.

However, the binary produced by SwiftWasm did not offer any facility to know that a single function call allocated more of Wasm's machine stack than was available, and memory was silently corrupted. So there may still be work that needs to be done to at least warn of this situation, or tools to allow developers to see that this is happening.
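
For anyone hunting for a similarly oversized frame, one rough first check is to print the inline size of suspect types with MemoryLayout. This assumes large value types are what inflate the frame, which this thread does not establish for this particular function. The types below are hypothetical and only illustrate how nested value types add up:

```swift
// Hypothetical types for illustration only; they are not from the parser
// discussed in this thread.
struct Matrix4 {
    // 16 Doubles stored inline: 128 bytes per value
    var m: (Double, Double, Double, Double,
            Double, Double, Double, Double,
            Double, Double, Double, Double,
            Double, Double, Double, Double)
}

struct Style {
    // Value types nest inline, so the sizes of the members add up
    var transform: Matrix4
    var fill: Matrix4
    var stroke: Matrix4
}

// Print and return the inline size of a type. A local variable or temporary
// of that type may occupy this many bytes in the frame that holds it.
func report<T>(_ type: T.Type) -> Int {
    let size = MemoryLayout<T>.size
    print("\(type): \(size) bytes inline")
    return size
}

let matrixSize = report(Matrix4.self)
let styleSize = report(Style.self)
```

This only measures types, not frames, so it can point at candidates but cannot confirm how much stack a given function actually reserves.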

We will also be upstreaming the changes we made to wasmtime's wmemcheck tool so it can be used to help debug memory issues in the context of Swift's memory allocation scheme.

I wanted to bring this thread to a close for anyone who finds themselves in a similar situation. Thanks to @austintatious, @sspringer, and @kateinoigakukun for your help with this!

The original post provides a small sample program that uses XMLParser to parse a simple XML payload. It directed use of wasmtime and its wmemcheck feature to show that parsing the small payload was causing memory corruption.

This was incorrect. We had not sufficiently prepared the wmemcheck feature with the environment needed to deal with every avenue that Swift uses to allocate memory. We had the tool working for several simple string manipulation programs, which executed without issue, and assumed we had covered the differences between Swift and Rust sufficiently to detect invalid memory accesses. Further inspection revealed that, in addition to malloc, calloc, realloc and free, Swift and Foundation make use of posix_memalign, malloc_usable_size and aligned_alloc in other contexts. These were not being included in the memory checks, and so XMLParser, _CFXMLInterface, and/or libxml2 appeared to be at fault. We are in the process of re-contributing our changes to wasmtime so others can benefit.
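
To illustrate why missing hooks lead to false reports, here is a toy model of an allocation checker. It is not how wmemcheck is implemented; the AllocationTracker type and its methods are made up for this sketch. The checker only knows about regions whose allocation it observed, so a store into memory obtained through an unhooked entry point looks like a write to unallocated memory:

```swift
// Toy model only: this is not wasmtime's wmemcheck implementation.
struct AllocationTracker {
    enum Violation: Equatable {
        case doubleMalloc(addr: Int)
        case invalidFree(addr: Int)
        case invalidStore(addr: Int)
    }

    // Live regions keyed by start address; the value is the region size in bytes.
    private var live: [Int: Int] = [:]

    // Called for each allocation made through a hooked allocator entry point.
    @discardableResult
    mutating func didAllocate(addr: Int, size: Int) -> Violation? {
        if live[addr] != nil { return .doubleMalloc(addr: addr) }
        live[addr] = size
        return nil
    }

    // Called for each hooked free.
    @discardableResult
    mutating func didFree(addr: Int) -> Violation? {
        guard live.removeValue(forKey: addr) != nil else {
            return .invalidFree(addr: addr)
        }
        return nil
    }

    // A store is valid only if it lies entirely inside one live region.
    func checkStore(addr: Int, size: Int) -> Violation? {
        let inside = live.contains(where: { region in
            addr >= region.key && addr + size <= region.key + region.value
        })
        return inside ? nil : .invalidStore(addr: addr)
    }
}

var tracker = AllocationTracker()

// malloc is hooked, so a store into its region passes the check.
tracker.didAllocate(addr: 0x1000, size: 16)
let hookedStore = tracker.checkStore(addr: 0x1004, size: 4)

// Suppose the program obtained 0x2000 from aligned_alloc, but the checker has
// no hook for it. No didAllocate call happens, so a valid store is reported.
let unhookedStore = tracker.checkStore(addr: 0x2000, size: 4)

print(hookedStore == nil ? "hooked store: ok" : "hooked store: violation")
print(unhookedStore == nil ? "unhooked store: ok" : "unhooked store: violation")
```

The second store is legitimate from the program's point of view, yet the model flags it, which is the same shape as the false "Invalid store" reports described above.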

Ultimately, our own implementation of a type conforming to XMLParserDelegate was at fault. Refactoring the delegate methods, converting an instance of recursive parsing to manual stack parsing, and using an indirect enum for our output tree type, reduced the stack usage of the parser sufficiently for use with the default Wasm build settings. Though, anyone finding this later may want to experiment with varying stack sizes as well:

https://book.swiftwasm.org/getting-started/troubleshooting.html#3-stack-overflow-is-occurring
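
As a rough illustration of the refactoring described above, an indirect enum keeps each tree value small, and an explicit work list replaces recursion. The actual parser code is not shown in this thread, so the Node type and serialize function here are hypothetical:

```swift
// Hypothetical output tree. `indirect` stores case payloads in heap boxes,
// so a Node value itself stays small wherever it is held.
indirect enum Node {
    case text(String)
    case element(name: String, children: [Node])
}

// Serialize without recursion: a heap-allocated work list replaces the call
// stack, so nesting depth no longer consumes machine stack.
func serialize(_ root: Node) -> String {
    enum Work {
        case visit(Node, Int)    // node and its indentation depth
        case close(String, Int)  // element name and its indentation depth
    }
    var lines: [String] = []
    var work: [Work] = [.visit(root, 0)]
    while let item = work.popLast() {
        switch item {
        case let .visit(.text(string), depth):
            lines.append(String(repeating: "    ", count: depth) + string)
        case let .visit(.element(name, children), depth):
            lines.append(String(repeating: "    ", count: depth) + "<\(name)>")
            // Push the closing tag first, then the children in reverse, so the
            // children pop in document order and the closing tag pops last.
            work.append(.close(name, depth))
            for child in children.reversed() {
                work.append(.visit(child, depth + 1))
            }
        case let .close(name, depth):
            lines.append(String(repeating: "    ", count: depth) + "</\(name)>")
        }
    }
    return lines.joined(separator: "\n")
}

// Build a 100-level nesting iteratively, then serialize it.
var tree = Node.text("leaf")
for _ in 0..<100 {
    tree = .element(name: "g", children: [tree])
}
let output = serialize(tree)
print(output.split(separator: "\n").count, "lines")
```

The same idea applies to the parser delegate itself: keeping an explicit array of open elements, as the sample program earlier in the thread does, avoids nesting one call per level of the document.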

Hopefully the contributions to wasmtime will be useful to others. If anything specific to the SwiftWasm project can be learned from this, it is that the issue was not immediately apparent as a stack overflow. It was surprisingly easy to have a non-recursive method overflow the stack, and having that manifest as stack corruption (and even executable code corruption in our case!) instead of a page fault is a foot-gun.

Wrapping from memory location zero to 0xFFFFFFFF, and causing a page fault, is one solution, if the stack is situated there. I tried using the stack-first option that carton uses to get this functionality, but did not find that the option worked in my case, and it is not the default for the SwiftWasm toolchain, so it would only help a user if they already suspected stack overflow anyway.
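
For reference, here is a sketch of how the stack size and stack-first settings might be applied from Package.swift rather than on the command line. The flags are wasm-ld options passed through with -Xlinker. The 1 MiB value is an arbitrary example, and the target name and path assume the xml-test package from the original post:

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "xml-test",
    targets: [
        .executableTarget(
            name: "xml-test",
            // Assumes the flat Sources/main.swift layout of the test project
            path: "Sources",
            linkerSettings: [
                .unsafeFlags([
                    // Reserve a larger stack; 1 MiB is an example value only
                    "-Xlinker", "-z", "-Xlinker", "stack-size=1048576",
                    // Place the stack at the start of linear memory
                    "-Xlinker", "--stack-first",
                ], .when(platforms: [.wasi]))
            ]
        )
    ]
)
```

As noted above, stack-first did not help in this case, so treat this as something to experiment with rather than a fix.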

The carton library is a great getting-started tool, but ultimately the magic of Wasm is that it can be executed in many different environments, so more tooling to support debugging those execution environments when using Swift instead of Rust is key to maturing the toolchain. Maybe those tools exist, and I just didn't find them. If anyone has Wasm/Wasi tooling for debugging production Swift code that is compiled to Wasm, please do tell!

Thank you for writing this up! That's a really interesting story.
Also wmemcheck improvements are definitely helpful for me, thanks for your upstreaming work!

Agree. Given that other language drivers pass -stack-first by default, it might be worth considering doing the same in swift-driver too.