Xylem: A Pure Swift XML Parser

I wrote an XML parser in Swift. In 2026. On purpose. I named it after the tissue in plants that moves water through a tree — this one moves data through one.

Xylem targets UTF-8 XML 1.0 well-formedness with namespace-aware SAX/DOM parsing and XPath 1.0. It has tests. It has benchmarks. It even has a W3C conformance suite plugin. This is what I do for fun, apparently.

What you get

  • SAXParser, DOMParser, and XPath as separate modules on a shared XMLCore — use only what you need
  • Zero dependencies — no Foundation, no C, no iconv, no system libraries
  • No external entity resolution, no user-defined entity expansion — XXE-style resolution paths are out of scope entirely

The DOM parser uses a flat arena with no per-node heap allocation. The SAX parser is zero-copy and streaming. In informal testing, Xylem outperforms both libxml2 and the xmloxide Rust crate on most workloads.
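For readers unfamiliar with the term, here is a minimal sketch of what a flat arena can look like. It is an illustration under my own assumptions, not Xylem's actual layout: `Arena`, `Node`, `NodeID`, `append` and `childNames` are hypothetical names. All nodes live in one contiguous array, refer to each other by index, and store names as byte ranges into the input rather than as separately allocated strings.

```swift
// Illustrative only: hypothetical types, not Xylem's real data layout.
struct NodeID: Equatable {
    let index: Int
}

struct Node {
    var name: Range<Int>          // byte range of the element name in the input
    var parent: NodeID?
    var firstChild: NodeID?
    var lastChild: NodeID?
    var nextSibling: NodeID?
}

struct Arena {
    let source: [UInt8]           // the UTF-8 input the name ranges point into
    var nodes: [Node] = []        // every node lives in this single array

    // Appends a node and links it as the last child of `parent`, if any.
    @discardableResult
    mutating func append(name: Range<Int>, parent: NodeID?) -> NodeID {
        let id = NodeID(index: nodes.count)
        nodes.append(Node(name: name, parent: parent))
        if let parent {
            if let last = nodes[parent.index].lastChild {
                nodes[last.index].nextSibling = id
            } else {
                nodes[parent.index].firstChild = id
            }
            nodes[parent.index].lastChild = id
        }
        return id
    }

    // Walks the sibling chain and decodes each child's name on demand.
    func childNames(of id: NodeID) -> [String] {
        var result: [String] = []
        var cursor = nodes[id.index].firstChild
        while let child = cursor {
            let range = nodes[child.index].name
            result.append(String(decoding: source[range], as: UTF8.self))
            cursor = nodes[child.index].nextSibling
        }
        return result
    }
}
```

In this sketch the tree costs one growable array rather than one allocation per node, and a `NodeID` stays valid even if the array reallocates, because it is an index and not a pointer.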

It deliberately does not do DTD validation semantics, external entities, XML 1.1, non-UTF-8 encodings, XInclude, XSLT, XSD, RelaxNG, XQuery, or XPath 2.0+.

The repo is at github.com/compnerd/xylem. BSD-3-Clause licensed. The library requires Swift 6.2 for Span and non-copying types.

Feedback, issues, and incredulous stares all welcome.

Great, I will try to use it in my own Swift-based XML transformation library, which currently uses a parser of my own that has some special features but is not the fastest. Most things my parser can do can be done in subsequent steps. Resolving internal parsed entities (aka named character entities) during parsing, with a callback, is a feature I would find hard to do without, though, and I would need the definitions in the internal subset (most importantly the entity declarations) to be parsed. Thanks for publishing this package.

Awesome! It would be interesting to expand your benchmarking to do some head-to-head libxml2 comparisons.

Can this be our alternative solution to "Upgrade libxml2 dependency" (Issue #5390 in swiftlang/swift-corelibs-foundation)? :smile:

Yes, I'd love to see a better benchmarking solution. The current solution is not very portable either (e.g. Windows).

I'd be happy to see that :slight_smile:

I wanted to try out the package but get an error, which @marcprux has already reported as an issue in the repository. Once this error is fixed, would you mind adding 0.0.x version tags? Thanks.

Concerning the internal subset and entities, it is not clear to me whether the parser (a) just "parses over" the internal subset without doing anything with it, or (b) fails if an internal subset is present. If (b), it is maybe not too complicated to implement (a) and then:

  • Add a function to protocol Handler to handle the internal subset, so at least a user could then parse the internal subset on her own?
  • Add a function to protocol Handler to handle entities (if they are not numerical character references which of course should be resolved during parsing)?

I think it would then be OK if the corresponding stubs in Handler+Defaults.swift just threw an error.
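To make the suggestion concrete, here is a rough sketch of the shape I have in mind. All names (`EntityAwareHandler`, `internalSubset(raw:)`, `entityReference(name:)`, `UnhandledConstruct`) are made up for illustration and are not taken from Xylem's actual Handler protocol: two extra callbacks whose default implementations throw, so only users who need entities have to implement them.

```swift
// Hypothetical sketch: none of these names come from Xylem's real API.
struct UnhandledConstruct: Error, Equatable {
    let description: String
}

protocol EntityAwareHandler {
    // Would receive the raw text of the internal subset of the DOCTYPE.
    mutating func internalSubset(raw: String) throws
    // Would be called for a named entity reference such as &name;
    // and return its replacement text.
    mutating func entityReference(name: String) throws -> String
}

// Default stubs: a handler that does not opt in simply fails.
extension EntityAwareHandler {
    mutating func internalSubset(raw: String) throws {
        throw UnhandledConstruct(description: "internal subset")
    }
    mutating func entityReference(name: String) throws -> String {
        throw UnhandledConstruct(description: "entity &\(name);")
    }
}

// A handler that opts in: it keeps the raw subset for its own parsing
// and resolves references from a table supplied by the user.
struct TableHandler: EntityAwareHandler {
    var entities: [String: String]
    var rawSubset: String? = nil

    mutating func internalSubset(raw: String) throws {
        rawSubset = raw
    }
    mutating func entityReference(name: String) throws -> String {
        guard let value = entities[name] else {
            throw UnhandledConstruct(description: "entity &\(name);")
        }
        return value
    }
}

// A handler that relies on the throwing defaults.
struct StrictHandler: EntityAwareHandler {}
```

With something like this, the parser itself would not need to understand entity declarations at all; it would only hand the raw material to whoever wants to deal with it.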

And when the user then wants to read an external parsed entity:

  • I suppose SAXParser throws an error if there is not a single root element? It would be very convenient if the parser could also be used to read external parsed entities, which generally do not have a single root element. Could the SAXParser have a mode where the single-root condition is dropped, so that non-whitespace text or several elements are allowed at the top level?

So yes, we all dream of a reduced version of XML, but unfortunately there are applications out there which still use entity declarations. I would not expect an XML parser for modern times to handle all this in detail, but if the parser at least gives the user the opportunity to handle those specific issues on her own when really needed, then the parser would be very well applicable in those cases, too. As a backdoor for traditionalists, so to speak :wink:.

Concerning CDATA:

  • So func character(data:) handles CDATA sections; maybe the name of the function should be more explicit, e.g. func cdataSection(data:)?
  • I very much appreciate that CDATA sections can be handled separately from other text, but I suppose in most cases what the user wants is for the content of CDATA sections to be reported simply as text. So I think the default should be to report it as text, and CDATA sections should only be reported as such via a special setting.
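As a sketch of what such a setting could look like (all names here, `CDATAReporting`, `Dispatcher`, `Event`, are hypothetical and not Xylem's API), the parser could route CDATA content through the ordinary text callback unless the user asks for separate reporting:

```swift
// Hypothetical sketch: not Xylem's actual API.
enum CDATAReporting {
    case asText      // CDATA content is reported like any other text
    case separately  // CDATA content is reported as a distinct event
}

enum Event: Equatable {
    case text(String)
    case cdata(String)
}

struct Dispatcher {
    // Default: users who do not care about CDATA just see text.
    var cdataReporting: CDATAReporting = .asText
    var events: [Event] = []

    // Called for ordinary character data.
    mutating func text(_ content: String) {
        events.append(.text(content))
    }

    // Called for the content of a CDATA section.
    mutating func cdataSection(_ content: String) {
        switch cdataReporting {
        case .asText:
            events.append(.text(content))
        case .separately:
            events.append(.cdata(content))
        }
    }
}
```

A user who needs to preserve CDATA sections, e.g. for round-tripping, would create `Dispatcher(cdataReporting: .separately)`; everyone else gets plain text without doing anything.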

BTW very nice that comments can be handled.

Yeah, sorry about that. I’m not sure how feasible it’d be to support Windows; I would assume it’d be a non-trivial exercise. How do people who are based on macOS and don’t have real machines actually validate Swift code iteratively in a Windows environment today? Does it require purchasing a Windows license and running it in a VM, or what do people do in practice?

I’ve found profiling on VMs to be pretty unstable, and so don’t expect benchmarking to prove much better. Generally, running on physical hardware is needed so you can get access to hardware counters reliably. But, the licensing thing is just a normal requirement, just like you need Apple hardware if you want to test on macOS. One could similarly ask, what do you do if you want to test for macOS and only have Linux?

I asked sincerely, to understand how people do it in practice. The VM question was just about whether we could port the benchmark to Windows too, if there were a reasonable way to get a working environment; we are a zero-Windows company, so we don’t have any machines available. But now we know. Better that someone invested in the platform has a look then, thanks.

In terms of porting software, you could use a VM, but I don't know if you can use that to benchmark. If you need access to the hardware counters, the emulated CPU often doesn't provide the proper counters (or at least not the most modern ones). Another thing that is going to complicate matters here is the architecture. ARM64 and x64 do not have the same architectural counters, so depending on how you do the benchmarking, you might require an x64 host or an ARM64 host to get the right architecture for the VM.

Microsoft used to provide VMs, but I can't seem to find the link. They still provide an ISO for evaluation purposes that can be used for development.

UTM has some info on installing Windows and other operating systems on macOS.

Sort of off-topic, but I feel compelled to commend you on what is, in my view, a masterstroke in name selection. Not only is that a great botanical analogy, but visually the word bears a resemblance to the abbreviation (all the letters are there in almost the same order), and it literally is an anagram of "XMLey"[1]. Good stuff!

I've not had reason to think about or reach for tools for parsing XML in many years, but this makes me wish I had an excuse to do so. Congrats on releasing it!


  1. Okay maybe that's a stretch, but still great IMO.

I hope that you were often in the phloem when you wrote the package.

(I’ll get my hat)….

I keep falling back to libxml2 for lenient HTML parsing. Could Xylem be relaxed to also parse HTML?