Parsing performance and exclusivity checks

I am working on a parser that reads large property list files (OpenStep/old-style plists; the larger files are around 20 MB to 200 MB). Speed is of paramount importance. I am only interested in certain top-level key–value pairs, so I don’t want to allocate strings/arrays/dictionaries for subtrees I am not interested in, which is why I don’t use existing parsers like PropertyListSerialization.

Inspecting the process (Release profile) in Instruments, I see that almost half of the time is spent in swift_beginAccess and swift_endAccess, with many calls to swift::runtime::SwiftTLSContext::get() in between:


My data type is an enum:

public enum PropertyList {
	case string(String)
	case data(ContiguousArray<UInt8>)
	case array([PropertyList])
	case dictionary([String: PropertyList])
}

and the parser is a non-copyable struct (a separate package of mine):

public struct Parser<Subject: Collection>: ~Copyable {
	// [...]
	
	/// The collection that is parsed by the parser.
	public let subject: Subject
	/// The index of the current element, which is the next element to be parsed.
	public var position: Subject.Index
	
	// [...]
}

I can verify that the runtime exclusivity checks are causing a lot of this slowdown by disabling them (SWIFT_ENFORCE_EXCLUSIVE_ACCESS = off). An example run takes 0.352 seconds with run-time checks and 0.188 seconds without (measured with XCTestCase.measure). Test setup: 35 MB file, filtering for a single key – that is, only the value of a single top-level key–value pair was parsed and allocated, while the other top-level key–value pairs were only parsed syntactically, without allocating arrays/dictionaries.
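As a quick sanity check, the two timings are consistent with the Instruments observation that almost half of the time goes to the access checks (the fraction of run time saved, and the resulting speedup):

$$
\frac{0.352\,\text{s} - 0.188\,\text{s}}{0.352\,\text{s}} = \frac{0.164}{0.352} \approx 0.47,
\qquad
\frac{0.352\,\text{s}}{0.188\,\text{s}} \approx 1.87
$$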

Are there ways to opt out of these run-time checks in favor of more compile-time checks? I am not sure which type is responsible for these access checks and how best to address them.

Standalone code and sample file:

https://formkunft.com/b/share/PropertyListParsing-File.plist.zip

The thing that immediately stands out to me is that there is no obvious need for exclusivity checks in the code you linked at all—at a glance, I see no global variables, escaping closures, or classes involved (but I could have missed one). Do these access markers come from predictable backtraces? It would be interesting to see where they're coming from.


Thanks for having a look! Your mention of escaping closures was the hint I needed as it made me look at the parsePropertyList function that I have defined inside the init. Moving it out of the init and passing the parser object as an inout parameter solves the issue.
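A minimal sketch of that change, under heavy simplification: the `Parser` below is a hypothetical copyable, non-generic stand-in (the real one is a generic `~Copyable` struct), and `maxDepthCapturing`, `maxDepthInout`, and `parseGroup` are illustrative names, not the original code. Instead of building a `PropertyList`, the recursion just computes the maximum parenthesis nesting depth.

```swift
// Hypothetical, heavily simplified stand-in for the real Parser type.
struct Parser {
    let subject: [UInt8]
    var position = 0

    init(_ text: String) { subject = Array(text.utf8) }

    var current: UInt8? { position < subject.count ? subject[position] : nil }
}

// Before: a recursive local function that captures the local `parser` variable.
// Per the discussion in this thread, such a capture can end up boxed, with each
// access going through swift_beginAccess/swift_endAccess.
func maxDepthCapturing(_ text: String) -> Int {
    var parser = Parser(text)
    func parseNested(depth: Int) -> Int {
        var deepest = depth
        while let byte = parser.current {
            parser.position += 1
            if byte == UInt8(ascii: "(") {
                deepest = max(deepest, parseNested(depth: depth + 1))
            } else if byte == UInt8(ascii: ")") {
                break
            }
        }
        return deepest
    }
    return parseNested(depth: 0)
}

// After: the same recursion as a free function taking the parser `inout`.
// Nothing is captured, so exclusivity can be checked at compile time.
func parseGroup(_ parser: inout Parser, depth: Int) -> Int {
    var deepest = depth
    while let byte = parser.current {
        parser.position += 1
        if byte == UInt8(ascii: "(") {
            deepest = max(deepest, parseGroup(&parser, depth: depth + 1))
        } else if byte == UInt8(ascii: ")") {
            break
        }
    }
    return deepest
}

func maxDepthInout(_ text: String) -> Int {
    var parser = Parser(text)
    return parseGroup(&parser, depth: 0)
}
```

Both versions compute the same result; only the way `parser` reaches the recursive function differs.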

Before:

After:

The updated code is here, if anyone is interested:


At line 130 there is a local function that is accessing a local var. This currently results in exclusivity checks, which could be the cause of this. You can try adding @inline(__always) and see if it goes away, or turn the function into a method in an extension on Parser.

You can get some diagnostics on this by adding the @_assemblyVision attribute to the local function. You will see a diagnostic emitted each time the local parser variable is accessed, saying:

Begin exclusive access to value of type 'Parser<UnsafeBufferPointer>' of 'parser'
End exclusive access to value of type 'Parser<UnsafeBufferPointer>' of 'parser'

Here is also a reduced version that shows this problem: Compiler Explorer
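For readers without access to that link, here is a hypothetical reduction of the same pattern (`Cursor`, `countSpaces`, and `nextIsSpace` are made-up names). `@_assemblyVision` is an underscored, unofficial attribute; as far as I know it only produces remarks when compiling with optimizations (e.g. `swiftc -O`), and the exact remark text may vary between compiler versions.

```swift
// Hypothetical minimal reduction: a local function mutating a captured local var.
struct Cursor {
    let bytes: [UInt8]
    var position = 0
}

func countSpaces(in text: String) -> Int {
    var cursor = Cursor(bytes: Array(text.utf8))

    // Underscored attribute: asks the optimizer to emit remarks for this function,
    // which should include begin/end exclusive-access remarks for `cursor`.
    @_assemblyVision
    func nextIsSpace() -> Bool {
        defer { cursor.position += 1 }  // advance after reading the current byte
        return cursor.bytes[cursor.position] == UInt8(ascii: " ")
    }

    var count = 0
    while cursor.position < cursor.bytes.count {
        if nextIsSpace() { count += 1 }
    }
    return count
}
```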


Good catch. The local function is only used in direct invocations, so it would be a nice compiler improvement if the compiler could eliminate the boxing and exclusivity checking overhead.


Do you mean like this?

@inline(__always)
func parsePropertyList(
	keySubset: Set<String>?,
	isSkipping: Bool,
) throws(PropertyList.ContentError) -> PropertyList {
	// ...

That does not improve the runtime and still has swift_beginAccess in the heaviest stack trace. Could be due to the recursive nature of parsePropertyList, but I don’t fully understand this attribute and whether it is also applicable to recursive functions.

I’ll look at using @_assemblyVision for diagnosing these kinds of issues. Thanks for sharing the Compiler Explorer sample, I will play around with that.

You could use -c release -Xswiftc -enforce-exclusivity=unchecked to take care of the problem. A few years ago, this reportedly improved performance by about 30% in a gRPC benchmark, IIRC. I frequently see swift_begin/endAccess showing up very prominently, and honestly I don't think it holds its weight for performance-sensitive production code.
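If you go that route with SwiftPM, the flag can also live in the manifest instead of on the command line (in Xcode, the SWIFT_ENFORCE_EXCLUSIVE_ACCESS build setting from the question is the equivalent). A sketch, assuming a single target; the package and target name are hypothetical. Note that `unsafeFlags` is, to my understanding, not allowed in packages consumed as versioned dependencies, so this is mainly suitable for a root package or app.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "PropertyListParsing", // hypothetical package name
    targets: [
        .target(
            name: "PropertyListParsing", // hypothetical target name
            swiftSettings: [
                // Turns off only the *dynamic* exclusivity checks, and only in
                // release builds; static (compile-time) diagnostics still apply.
                .unsafeFlags(["-enforce-exclusivity=unchecked"], .when(configuration: .release)),
            ]
        ),
    ]
)
```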


Yes, exactly. The recursive nature could explain why this still isn't inlined.

My understanding of the issue is that these local functions can, in theory, escape. That causes the local variables they capture to be boxed and heap-allocated, and accesses to a boxed variable in turn require dynamic exclusivity checks.
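To illustrate why a captured `var` may need a box: a local function is an ordinary function value, so nothing stops it from being returned and called long after the enclosing scope has ended. The example below (`makeCounter` is a hypothetical name) shows such an escape; the captured `count` has to outlive the stack frame, and the compiler cannot see statically who will access it later.

```swift
// A local function capturing a local `var` can escape, e.g. by being returned.
// The captured variable must then outlive the stack frame, so it lives in a box.
func makeCounter() -> () -> Int {
    var count = 0
    func next() -> Int {
        count += 1  // mutates the captured (boxed) `count`
        return count
    }
    return next     // `next` escapes, keeping the box alive
}

let counter = makeCounter()
_ = counter()  // returns 1
_ = counter()  // returns 2
```

In the parser case the local function never actually escapes, which is why eliminating the box there would be a possible compiler improvement.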

Moving that function to an extension as a mutating method should solve it, though. Exclusivity is still enforced, but it can usually be verified statically at compile time.
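A sketch of the mutating-method variant, again on a hypothetical simplified `Parser` (copyable, non-generic, unlike the real one). `skipGroup` is an illustrative name: it skips one balanced `( ... )` group without allocating, which loosely mirrors skipping an uninteresting subtree. Because `self` is passed `inout` to the mutating method and nothing is captured, the recursive calls should not need a box.

```swift
// Hypothetical simplified parser; the real type is a generic ~Copyable struct.
struct Parser {
    let subject: [UInt8]
    var position = 0

    init(_ text: String) { subject = Array(text.utf8) }
}

extension Parser {
    /// Skips one balanced `( ... )` group starting at `position`, recursing into
    /// nested groups. Requires `subject[position] == "("`.
    mutating func skipGroup() {
        precondition(subject[position] == UInt8(ascii: "("), "expected '('")
        position += 1
        while position < subject.count {
            let byte = subject[position]
            if byte == UInt8(ascii: "(") {
                skipGroup()  // recursive call on the same `self`
            } else {
                position += 1
                if byte == UInt8(ascii: ")") { return }
            }
        }
    }
}
```

For example, after `var p = Parser("(a(b)c)x")` and `p.skipGroup()`, `p.position` is 7, pointing at `x`.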