Performance annotations

You might want to lock an inout?

We really don't have a lot of ideas here, so would love concrete suggestions. One idea we've had is to automatically convert some globals to be eagerly initialized to avoid the possibility of a lock in annotated code. But I suspect once people start trying to use these annotations, we'll find much more pressing things that we should be exploring instead. I just don't know what those might be yet.

The big question seems to be what kind of container types such code should use. Is it possible to make the standard library types safe for such use (probably not), or do we need to be exploring alternatives with more limited functionality?

1 Like

Honestly, it's not clear to me that there really are that many more such possible options, so I'd love to hear more detail about what additional possibilities you think there are and how you think those would behave and/or interact.

Re: noFatalErrors. This seems impractical to me. For example, would that have to prohibit any kind of array access, since it might go out of bounds?

Do you think there's a need for code sections that prohibit stack and/or arena allocations? Local variables and function calls technically allocate on the stack, which seems unusably restrictive. If there's really "no way to prove things like arena allocations won't happen", then it would seem impractical to try to provide such an annotation.

Do you have other ideas?

Joe has put together a road map that pulls together a number of related ideas and how they fit together:

Something like "@performance(attribute1, attribute2)" might work here, though I wouldn't add I/O (disk or network) at this point, as I can't think of any real-world example where I'd want to use it. Note that a mere memory access (no locks, no allocations, no file reads/writes, etc.) can still cause I/O, due to virtual memory paging.

Please rephrase these two; it's not clear what you mean here.

No.

But... you can implement a busy-waiting loop that will act like a normal lock (performance aspects aside), and you can do this (at least in principle) even without using atomics. So there will be some loopholes the annotation system won't be able to catch (unless you prohibit loops, etc.; see below).
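A minimal sketch of that loophole, assuming the pitched @noLocks attribute and an assumed shared Bool flag (spinning on a non-atomic flag is formally a data race, which is part of the point):

```swift
var sharedFlag = false // assumed shared state; deliberately non-atomic

@noLocks // hypothetical pitched attribute
func enterCriticalSection() {
    // No lock API is ever called, so the checker has nothing to reject,
    // yet this blocks exactly like acquiring a contended lock.
    while sharedFlag { /* spin */ }
    sharedFlag = true
}
```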

Sure.

What are "once like flags"?

Honestly, no idea. On the surface it shouldn't need locks, but a particular implementation might use them for some reason; one way or another, that implementation will carry a proper annotation matching what it does. Or vice versa: once you settle on the annotation you want (the annotation is part of the API), the implementer will ensure the implementation adheres to it.

Don't think so.

The practical way to determine correct attributes could be the following: during development, define the opposite (positive) attributes, "__locks" and "__allocations", and mark the relevant "leaf" functions like malloc, pthread_mutex_lock, etc. appropriately. Anything that calls them will now error out, so you'll mark the callers appropriately as well; then anything that calls those callers, and so on. At the end of the day this will "infect" the whole source base and you'll have all functions annotated, just the reverse of what we want (¹), so as the final step, reverse all annotations.
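A sketch of how that bootstrapping might propagate, assuming the hypothetical development-time @__locks/@__allocations attributes spelled as above (myMalloc and the other names are illustrative):

```swift
@__locks @__allocations          // "leaf" function, marked by hand
func myMalloc(_ size: Int) -> UnsafeMutableRawPointer? { malloc(size) }

@__allocations                   // forced: calling myMalloc errors out until marked
func makeBuffer() -> UnsafeMutableRawPointer? { myMalloc(1024) }

@__allocations                   // and so on, up the call graph
func setUpPipeline() { _ = makeBuffer() }

// Final step: every function left unannotated gets @noAllocations/@noLocks.
```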

(¹ The purist in me would actually prefer these "positive" annotations -- "locks", "allocations" -- but that's harder to pitch.)

I think the original phrase in pitch is correct based on this boolean logic:

"allocations imply locks"  ==> "no locks imply no allocations"

I'd say we should keep it simple and not consider this case. (If we encounter them in practice, we'd probably mark them with the "stronger" noLocks.)

From realtime audio programming perspective noLocks (which also implies noAllocations) would be enough for practical purposes. Prohibiting loops, etc would be too restrictive. (If we were considering life support systems or avionics, etc - that would be a different story.)

All IMHO above.

I recently watched a talk about realtime programming in C++ for audio applications, and I remember them mentioning exactly this (at 30:29 in the video), and that in some high-performance realtime cases there are two solutions: either have a dedicated thread whose only job is to keep touching the memory used by the audio-processing routine to keep it from being paged out, or have the memory needed by the routine locked into RAM using OS-provided APIs. Is there something Swift could do to facilitate this usage, or are the OS-provided APIs the best option?

1 Like

Normally what happens is that in the audio thread itself you are accessing the same memory locations again and again (even if you are changing parameters like filter coefficients, etc.) and thus keep those areas in RAM (and in most cases in cache), so there is nothing else to do. It's only when you switch the pipeline and start accessing different areas that paging (and most likely a glitch) will occur. (1)

We can get similar behaviour with "escape hatch":

@noLocks
func audioProc() {
    buffer = malloc() // error, can't do this
    if firstTimeOrDoneInfrequently {
        unsafePerformance { // be responsible, do not abuse
            buffer = malloc() // "ok" to do this, say on "pipeline change"
        }
    }
    ...
}

Edit: (1) Having said that, it is not unimaginable to have a use case where you can't tolerate glitches even on pipeline changes. In those cases, yes, the mentioned workarounds are probably the only way to go, and for memory pinning you'd have to go to the system.

1 Like

I would expect OS APIs are always the best option for this. As far as I know, all OSes already maintain the fiction that memory access is synchronous. Most CPUs even maintain this fiction. Trying to build another fiction of a platform-agnostic paging management scheme atop this existing shared fiction seems fraught with peril.
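For completeness, here's a minimal sketch of the OS-provided route using POSIX mlock/munlock from Swift (the buffer size and error handling are illustrative; on Linux, import Glibc instead of Darwin):

```swift
import Darwin // Glibc on Linux

let byteCount = 16 * 1024 * 1024
let samples = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: 4096)

// Pin the sample memory so the pager cannot evict it mid-performance.
if mlock(samples, byteCount) != 0 {
    perror("mlock") // usually ENOMEM/EPERM: raise RLIMIT_MEMLOCK or run privileged
}
// ... the realtime routine touches `samples` freely, without page faults ...
munlock(samples, byteCount)
samples.deallocate()
```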

2 Likes

Understood, makes sense. I hope the talk linked above is still relevant and helps this discussion.

Aside

Just to clarify where this might be absolutely necessary: the kind of applications that require this are those where audio glitches cannot be tolerated and where the required memory might be quite large, and thus very prone to being paged out. An example is an audio sampler application for live performances, such as a drum kit or keyboard, where memory usage can be quite high due to the large number of audio samples that need to play very responsively when the user presses a key. In this case the samples need to be available as soon as possible, and there cannot be any audio glitches whatsoever, or you risk making everyone at your live performance deaf.

1 Like

No huge note from me except that in general I think this looks great. As a library developer, the API breakage impacts make me nervous, but I suspect that mostly I'd be using these annotations within a codebase, not exposing them across an API boundary.

A common use-case we have is to have a "hot-path" that should not allocate, and cold paths where it may. The two options here seem to be either to mark the cold path as unchecked (a bit weird) or to section off the hot path into a function. Has thought been put into having a scope-based checking option as well? That is, in addition to having an unsafePerformance (or whatever name there is here) block, we might also want a noAllocations block that we can use within a function that is not @noAllocations.

6 Likes

Actually, I think I can just write one myself:

@noAllocations
@inline(__always)
func noAllocations<ReturnValue>(_ body: () throws -> ReturnValue) rethrows -> ReturnValue {
    try body()
}

The other problem we have here is about supporting older versions. It's very hard for us to add annotations to functions because Swift does not allow us to use #if compiler around annotations: we have to do it around entire functions. Has any thought been given to adding that support here?

6 Likes

Perhaps this will lead to Swift finally getting a fixed-length array.

4 Likes

Thank you @Erik_Eckstein. This is awesome and exactly what I've wanted to see in Swift for a number of years, and I'm still convinced this is the right way to go. It doesn't magically solve all the performance-predictability problems, but it's a very important first step.

A few notes:

  • I don't think unsafePerformance is a good name, because *unsafe* usually means it may introduce memory unsafety or undefined behaviour. For the performance annotations this is not the case; they merely make a few compile-time-only annotations potentially untrue. Maybe uncheckedPerformance or something could work?

  • I'd probably suggest the annotations to be something like @performance(noAllocations, noLocks) because it feels that this will scale better when new annotations will be added.

  • It'd be awesome if we could find a syntax that too-old compilers just ignore. The problem with introducing new attributes is that they'll produce compile errors on older Swifts. Together with the huge mess we're in, with #if compiler only being able to #if-out whole declarations, I think that's fairly bad, because many codebases have to support Swift compilers that are quite old.
    One compromise could be to introduce @performance(...) with the next Swift compiler in a way that just ignores any unknown attribute in there. So @performance(someBadAttribute) would just work without causing issues.
    A probably better alternative would be that we fix #if to work anywhere, like in C. Then we could actually write

    #if compiler(>=6.0)
    @performance(noAllocations)
    #endif
    func foo() { ... }
    

    without causing trouble. This will compile just fine in all existing Swift versions (because they ignore anything inside #if compiler(>=6.0) that looks vaguely sensible), and newer compilers could do what we all think this code does anyway.

  • It'd be cool to support lightweight (without named funcs) scoped performance annotations for closures so we can write

    func foo() {
        doSomethingThatAllocates()
    
        @noAllocations // or @performance(noAllocations)
        {
            thisDoesNotAllocate()
            neitherDoesThis()
        }()
    
        doSomethingThatAllocates()
    }
5 Likes

Isn't it supposed to fail to compile, because it's unsafe to call an arbitrary closure in a function annotated as no-allocation?

I assume that in this pitch unannotated functions are implicitly annotated as "withAllocations" and "withLocks" (in other words, they have neither "noAllocations" nor "noLocks"); is that a correct assumption?

Probably, but it doesn't meaningfully affect the outcome. Imagine the closure argument was appropriately annotated as well.
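In the pitched syntax, the closure-taking helper above might then look like this (hypothetical; annotating a function-type parameter is my assumption about how the pitch would spell it):

```swift
@noAllocations
@inline(__always)
func noAllocations<ReturnValue>(
    _ body: @noAllocations () throws -> ReturnValue // only provably non-allocating closures accepted
) rethrows -> ReturnValue {
    try body()
}
```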

1 Like

Wonder if noDenormals option would make sense and/or could be compiler enforced.

noDenormals is an interesting idea. I don't see how the compiler could verify this (apart from checking that FP constants are not denormals), but an environment in which denormals were flushed to zero might be useful for some applications. That would probably make more sense as part of a floating-point environment control of some sort:

@fp(roundUp, flushToZero) {
   ... calculations that need a special FP environment ...
}
3 Likes

We've talked about having a separate set of floating-point operators or functions with different semantics in order to address such "fast math" sorts of use cases.

4 Likes

Would it perhaps be possible to leverage an approach similar to result builders (the idea of connecting the AST to function calls) to get a higher-order, language-level scope definition? For example, in the case of no runtime allocations, the compile pass would see one of these attributes and then execute parts of the code at compile time as a pseudo-extension to the compiler. That way we could build specializations like these for all sorts of things: not just one-off paint-bucket effects like no allocations or no locks, but full control over a subset of the AST.

Ruby has some functionality that does wild stuff like that, where it can even control the parser. What I am suggesting is not as widely scoped as that, but something more in the spirit of Swift's strong type system.

This might be too lofty an idea to pursue for something so specifically targeted, but the fact that we can easily think of numerous useful cases may be an indication that a longer-term goal is worth considering here.