How do I optimize compile times?

Lua · May 19, 2024, 1:12pm

I have a relatively small project, maybe 20 files or so, mostly 400-700 lines each, but the compile times are already consistently over 5 seconds, and the LSP is completely frozen more often than not.

Is there a guide on which Swift features will exponentially hurt compile times? Or at least a way to make Debug mode (and sourcekit-lsp) usable?

It's not completely unreasonable: I used Zig for this project before, and it also takes about 4 seconds to compile in Release mode (the two versions are at a similar level of complexity now).

However, Zig compiles in 0.7s in Debug mode and the result is usable, whereas Swift's Debug mode is useless: it doesn't optimize away some Range existentials in for loops, which takes the framerate from a stable 120fps to a slideshow.

vns · May 19, 2024, 1:54pm

The LSP can stop working on its own eventually; at least in nvim I see that quite often. In Xcode, autocomplete is more likely to freeze when there is complex type inference involved, so I’d suggest looking into that. But in general, a compile time of a bit over 5 seconds isn’t huge for Swift.

Lua · May 19, 2024, 2:49pm

With this little code? At the rate it's slowing down I'll be waiting half a minute and have no code suggestions in a week or two

So I can't do anything without adding explicit types everywhere?

Lua · May 19, 2024, 3:21pm

I tried moving a large chunk of my code to another module hoping it would help, but it didn't, and oh my the code is extremely slow.

I heard that moving things into modules was bad, but I didn't think it was this bad. Before, it was able to sort all the vertices within a few seconds of starting, but now it takes many, many times longer.

vns · May 19, 2024, 3:26pm

It depends on many details, but that might be the case as the project grows, at least for the build-time part. I am biased here: most of my experience with really large projects is in Apple’s ecosystem or with other build systems that have their own complexities, so I cannot say how this compares with other languages. Rust, by the way, has the same issue with compile times. I suppose that is a trade-off of such languages in some way.

As for autosuggestions, the LSP sometimes just dies for Swift and comes back after a restart. Xcode lives on its own and has a different history of annoying issues. But in general, it gives suggestions more often than not.

Not everywhere, but the more the better here. Annotations help the compiler resolve complex cases faster. Array and dictionary literals are one example where an explicit type is helpful. Closures also tend to benefit from explicit types (for example, those passed to map or compactMap). There is no need to write them everywhere, but in certain cases they make a difference.
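A minimal sketch of the kind of annotations described above. The data and names are made up for illustration, and how much each annotation saves depends on the surrounding code:

```swift
// Hypothetical data, used only to illustrate the annotations.

// Explicit types on literals: the type checker does not have to work out
// the collection type from the elements.
let offsets: [Double] = [0, 0.5, 1, 1.5, 2]
let weights: [String: Double] = ["grass": 1, "stone": 2.5, "water": 0.75]

// Explicit parameter and return types on a closure passed to map.
let scaled: [Double] = offsets.map { (value: Double) -> Double in
    value * 2 + 1
}

// The same for compactMap: the optional return type is spelled out.
let parsed: [Int] = ["1", "x", "3"].compactMap { (text: String) -> Int? in
    Int(text)
}

print(scaled, parsed, weights.count)
```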

wadetregaskis · May 19, 2024, 9:26pm

You can mitigate this somewhat by annotating anything performance-sensitive as @inlinable (although in practice I find I have to use @inline(__always) quite a lot to actually get the correct behaviour from the compiler, as it seems far too hesitant to inline things).

But that will be counter-productive to reducing your compile times, since now you're basically pulling all the code back into your original module(s), just in a more round-about way.
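For readers unfamiliar with these attributes, a rough sketch of what such annotations look like. Vec2 is a hypothetical type imagined as living in a separate library module; @frozen is added on the assumption that the type's layout is meant to be fixed, and none of this is taken from the project in question:

```swift
// Hypothetical vector type, imagined as part of a separate library module.
@frozen
public struct Vec2 {
    public var x: Double
    public var y: Double

    // @inlinable exposes the body to client modules, so the optimizer can
    // inline and specialize it across the module boundary.
    @inlinable
    public init(x: Double, y: Double) {
        self.x = x
        self.y = y
    }

    @inlinable
    public static func + (lhs: Vec2, rhs: Vec2) -> Vec2 {
        Vec2(x: lhs.x + rhs.x, y: lhs.y + rhs.y)
    }

    // @inline(__always) additionally asks the compiler to always inline.
    @inlinable @inline(__always)
    public func dot(_ other: Vec2) -> Double {
        x * other.x + y * other.y
    }
}

let sum = Vec2(x: 1, y: 2) + Vec2(x: 3, y: 4)
print(sum.x, sum.y, sum.dot(Vec2(x: 1, y: 1)))
```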

Your best bet is probably not to pull things out into separate modules or otherwise do major refactors, unless you have code that'd already benefit from that, but rather to profile the compilation and address the specific hot spots. There are various guides available on how to do that.
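One way to profile compilation, sketched as a SwiftPM manifest. The package and target names are placeholders; the two frontend flags ask the compiler to warn about function bodies and expressions that take longer than the given number of milliseconds to type-check. A related frontend flag, -debug-time-function-bodies, prints per-function timings instead of warnings.

```swift
// swift-tools-version:5.9
import PackageDescription

// Hypothetical package and target names; only the flags matter here.
let package = Package(
    name: "MyGame",
    targets: [
        .executableTarget(
            name: "MyGame",
            swiftSettings: [
                .unsafeFlags([
                    // Warn about function bodies that take over 100 ms to type-check.
                    "-Xfrontend", "-warn-long-function-bodies=100",
                    // Warn about single expressions that take over 100 ms to type-check.
                    "-Xfrontend", "-warn-long-expression-type-checking=100",
                ])
            ]
        )
    ]
)
```

Note that a package using unsafeFlags cannot normally be consumed as a versioned dependency, so this is best kept to a local profiling setup.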

Lua · May 19, 2024, 10:31pm

Thanks for the function time command, very useful
For the most part there's nothing that stands out, but then there's these:

public extension BinaryInteger {
    func normalized(from: ClosedRange<Self>, to: ClosedRange<Self>) -> Self {
        (to.upperBound - to.lowerBound) / (from.upperBound - from.lowerBound) * (self - from.lowerBound) + to.upperBound
    }
}

public extension FloatingPoint {
    func normalized(from: ClosedRange<Self>, to: ClosedRange<Self>) -> Self {
        (to.upperBound - to.lowerBound) / (from.upperBound - from.lowerBound) * (self - from.lowerBound) + to.upperBound
    }
}

I am really curious why these simple functions would take half a second each, over 10 times longer than the next slowest, a much longer function for generating terrain vertices...

Nevin · May 19, 2024, 10:57pm

The Swift type-checker is notoriously slow for expressions that involve many operators (and also many literals, though that’s not the case here). If you re-write them to use a sequence of simple expressions, it should compile much faster:

public extension BinaryInteger {
  func normalized(from: ClosedRange<Self>, to: ClosedRange<Self>) -> Self {
    let toSpan = to.upperBound - to.lowerBound
    let fromSpan = from.upperBound - from.lowerBound
    let ratio = toSpan / fromSpan
    let position = self - from.lowerBound
    return ratio * position + to.upperBound
  }
}

(Should it be “+ to.lowerBound” at the end though? That seems like it would make more sense…)
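For comparison, a sketch of the FloatingPoint version with both changes applied: the expression split into simple steps, and the offset changed to to.lowerBound on the assumption that the function is meant to be a plain linear remap.

```swift
public extension FloatingPoint {
    /// Linearly maps `self` from the range `from` into the range `to`.
    func normalized(from: ClosedRange<Self>, to: ClosedRange<Self>) -> Self {
        let toSpan = to.upperBound - to.lowerBound
        let fromSpan = from.upperBound - from.lowerBound
        let ratio = toSpan / fromSpan
        let position = self - from.lowerBound
        // Offset by the lower bound so that from.lowerBound maps to to.lowerBound.
        return ratio * position + to.lowerBound
    }
}

// The midpoint of 0...10 should land on the midpoint of 100...200.
let mid = 5.0.normalized(from: 0...10, to: 100...200)
print(mid)
```

For the BinaryInteger version, dividing before multiplying truncates the ratio (an integer 5 / 10 is 0), so multiplying first may be preferable there.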

Jumhyn · May 19, 2024, 11:00pm

The issues with operator type-checking aside, another variable worth being aware of is that because the compiler does a lot of caching, ‘time to compile function’ may also be nontrivially influenced by order of compilation. Not sure if there are metrics out there which illustrate the magnitude of this effect.

dima_kozhinov · May 19, 2024, 11:06pm

Does Swift Package Manager allow changing the order of compilation?

Lua · May 20, 2024, 8:55am

I think you're right, thanks...
I have not actually used these functions yet, I just copied the expression (and converted it to use ranges) from some pico-8 code where I used it for scrollbars. I must have been very tired when writing this

That’s interesting, not where I expected it to be slow.

Could Swift avoid compiling unused code every time somehow? I am implementing a lot of things that aren’t immediately used anywhere like various lazy operations on 2d color collections, and so it's a little unfortunate that they are all contributing to compile times.

It seems like commenting out half my code for the time being would save me time, which is a very weird workflow. Without these functions my compile times seem evenly distributed so I don't think anything else would help.

Lua · May 20, 2024, 6:02pm

Ok, apparently I get the same result just running in debug mode. I decided to measure it; it may be off by half a second or so, I just ran it with time and waited for the window to appear, but the difference is bad enough it doesn't matter:

swift run  142.94s user 0.49s system 99% cpu 2:24.44 total
swift run -c release  1.46s user 0.43s system 68% cpu 2.751 total

wadetregaskis · May 20, 2024, 7:26pm

Are you able to share your code? It'll be easier for others to investigate if they can reproduce the issue locally.

stackotter · May 21, 2024, 2:09am

I'd be fascinated to have a go profiling this if the code is available anywhere (or if you make a small reproducer for this massive performance difference). I'm interested where most of the time is spent and what release mode optimisation is making such a big difference.

stackotter · May 21, 2024, 2:22am

I've been working on Delta Client for quite a while now (a Minecraft Java Edition rewrite in Swift) and I can say that I've faced very similar issues. I eventually gave in to only using debug builds when I'm working on the UI, not the renderer (because if I try to actually use the game part in debug mode everything slows to a crawl). On my old Intel Mac, clean release builds used to take almost 10 minutes, and cached release builds took close to 2 minutes, making working on the renderer extremely annoying. On my M2 MacBook Air clean release builds now take 45 seconds and cached release builds take around 15 seconds, but it's still not ideal.

This is obviously less than ideal because it makes debugging a pain, especially when debugging deadlocks (because things slow down so much in debug builds that low-probability deadlocks just stop occurring). Debug builds are definitely an area that needs some work in Swift (perhaps a performance debug mode where the compiler is allowed to do more optimisations at the expense of compile time, but only optimisations that don't negatively affect debugging, e.g. it could probably do specialisation, but inlining would be disabled unless Swift makes some new system for tagging inlined functions or something).

It's at the point where I often opt to use release builds even when developing the UI, because loading resources and models takes 8-10 seconds in debug mode even though it's closer to 200ms in release mode. This could probably be combatted through hot reloading (which avoids reloading resources between UI iterations), but that still wouldn't help if I was working on the resource loading code...

Tl;dr debug mode is often unusable for medium-sized performance sensitive applications where algorithms are implemented in-house/in-swiftpm (and can't benefit from being in pre-built optimised dylibs like algorithms in Apple's frameworks do)

Lua · May 21, 2024, 7:30pm

I can do that. At the moment it's not very far in development, and I was planning on doing that eventually anyway.

Here is the GitHub link

I also have a screenshot of the profiler from yesterday:

It's very much just a long chain of Swift not optimizing away or inlining generics. Unoptimized vector code is never going to compare favorably to release mode, where (when I checked using Compiler Explorer) the compiler even figures out how to use vector instructions without being told to, which is neat.

I could avoid using custom vector and matrix types, but that would make everything harder to port to other platforms.
I'm really happy that it's finally (almost) possible to write genuinely portable programs with Embedded Swift, but if I depended on any libraries it wouldn't be the case for a while.

I will face similar issues because I am very much inspired by Minecraft and Dwarf Fortress; at the moment it's literally just a very unfinished Minecraft.

I started with an ascii roguelike, then I added 2d graphics, and then 3d, and before I knew it I was writing Minecraft, which seems to be what all my attempts at making games lead to for some reason.

Nobody1707 · May 21, 2024, 10:03pm

Yeah, I feel like all languages that rely as much on abstractions being optimized away as C++, Rust, and Swift do should have an -Odebug mode, where it still optimizes but with an eye towards being easy to debug.

-Onone should really only be used as a baseline to sanity check the results of optimization against. Which shouldn't happen too often unless you're tracking down a compiler bug.

stackotter · May 22, 2024, 3:18am

Hmm yeah, looks like the only difference is really just generic specialisation. I was hoping there'd be some unnecessary copies that didn't get optimised away in debug mode but seems like that's not the case.

Optimisations to try

It's not ideal that debug mode is so much slower, but these optimisations could help the situation a bit:

Precompute your average vertex positions for each triangle when generating the meshes (won't help much on the first sort though).

Implement scalar division for vectors so that you can just do (tri.0 + tri.1 + tri.2) / 3 (should be slightly faster, but no clue whether it'd actually be measurable or not).

Make a dedicated magnitude or length computed property for your vector types implemented as (self ** self).sqrt (to avoid going through sequence generics). For anyone else reading this, note that ** is the codebase's custom dot product operator.

Also, I think triDistance is incorrect, it returns the square root of the Manhattan distance (unless it's some odd custom reduce) instead of the Euclidean distance; i.e. it adds the vector components and takes the square root, whereas it should be squaring the vector components first.
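To make the difference concrete, a small sketch with a hypothetical Vec3 type. The project's own vector type and its ** operator are not reproduced here; a plain dot method stands in for them:

```swift
// Hypothetical 3D vector; `dot` stands in for the codebase's `**` operator.
struct Vec3 {
    var x, y, z: Double

    static func - (lhs: Vec3, rhs: Vec3) -> Vec3 {
        Vec3(x: lhs.x - rhs.x, y: lhs.y - rhs.y, z: lhs.z - rhs.z)
    }

    func dot(_ other: Vec3) -> Double {
        x * other.x + y * other.y + z * other.z
    }

    // Euclidean length: square the components, sum them, then take the root.
    var magnitude: Double { dot(self).squareRoot() }
}

let a = Vec3(x: 1, y: 2, z: 2)
let b = Vec3(x: 0, y: 0, z: 0)
let delta = a - b

// Correct Euclidean distance: sqrt(1 + 4 + 4).
let euclidean = delta.magnitude
// The suspected bug: components summed without squaring, sqrt(1 + 2 + 2).
let unsquared = (delta.x + delta.y + delta.z).squareRoot()
print(euclidean, unsquared)
```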

Technically you can also sort by the square of the Euclidean distance to save a square root operation, but I've got a feeling that the extra square root is the least of your issues; that'd probably be on the order of a hundred-millisecond optimisation, if that.

As always, these speculations need to be backed up by measurements; I have no clue if any of these will make any difference at all.
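A rough sketch of how the suggestions above might fit together. All types and names here are hypothetical rather than taken from the project, and the back-to-front sort order is an assumption about how the triangles are drawn:

```swift
// Hypothetical types; names are illustrative, not taken from the project.
struct Vec3 {
    var x, y, z: Double

    static func + (l: Vec3, r: Vec3) -> Vec3 { Vec3(x: l.x + r.x, y: l.y + r.y, z: l.z + r.z) }
    static func - (l: Vec3, r: Vec3) -> Vec3 { Vec3(x: l.x - r.x, y: l.y - r.y, z: l.z - r.z) }
    // Scalar division, so a centroid is just (a + b + c) / 3.
    static func / (v: Vec3, s: Double) -> Vec3 { Vec3(x: v.x / s, y: v.y / s, z: v.z / s) }

    var lengthSquared: Double { x * x + y * y + z * z }
}

struct Triangle {
    var vertices: (Vec3, Vec3, Vec3)
    // Computed once when the mesh is generated, not on every sort.
    let centroid: Vec3

    init(_ a: Vec3, _ b: Vec3, _ c: Vec3) {
        vertices = (a, b, c)
        let sum = a + b + c
        centroid = sum / 3.0
    }
}

// Sorts farthest first by squared distance. The square root is monotonic,
// so skipping it does not change the order.
func sortedBackToFront(_ triangles: [Triangle], from camera: Vec3) -> [Triangle] {
    triangles.sorted { (lhs: Triangle, rhs: Triangle) -> Bool in
        (lhs.centroid - camera).lengthSquared > (rhs.centroid - camera).lengthSquared
    }
}

let camera = Vec3(x: 0, y: 0, z: 0)
let near = Triangle(Vec3(x: 1, y: 0, z: 0), Vec3(x: 1, y: 3, z: 0), Vec3(x: 1, y: 0, z: 3))
let far = Triangle(Vec3(x: 9, y: 0, z: 0), Vec3(x: 9, y: 3, z: 0), Vec3(x: 9, y: 0, z: 3))
let ordered = sortedBackToFront([near, far], from: camera)
print(ordered.map { $0.centroid.x })
```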

wadetregaskis · May 22, 2024, 3:40am

You might be able to improve things a little with the @_optimize(…) annotation. Either by explicitly turning on optimisations for performance-critical code even in debug builds, or by turning off optimisations for non-performance-critical code in release builds (in order to accelerate compilation).

Practically, however, it hinges on whether you have a relatively small number of functions that are performance problems (in either the runtime sense or the compilation time sense).

Another option is to use binary packages, to split out code that you don't need to modify often.
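For reference, a sketch of what the @_optimize annotation mentioned above looks like in use. It is an underscored (unofficial) attribute, the functions are made up for illustration, and whether it makes a measurable difference for a given piece of code would need to be checked:

```swift
// Hypothetical hot function: ask for speed optimisation even in debug builds.
@_optimize(speed)
func sumOfSquares(_ values: [Double]) -> Double {
    var total = 0.0
    for value in values {
        total += value * value
    }
    return total
}

// Hypothetical cold function: skip optimisation even in release builds.
@_optimize(none)
func describe(_ value: Double) -> String {
    "value = \(value)"
}

print(describe(sumOfSquares([1, 2, 3])))
```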

Lua · May 22, 2024, 11:35am

I do try to avoid copying arrays as much as possible

Oh, that explains why everything appeared so odd without the depth buffer
I was copying this function from Zig and probably forgot to square it when I had to stop for a moment to implement reduce.

At the moment it's so early that most of the code will change and not enough is happening to really profile, but I will keep these in mind for the future

I was only ever able to turn optimizations off; @_optimize(speed) never improves anything for me. Maybe it does something, but it's usually not enough when I'm experiencing performance issues (which is usually due to generics, like now).

Now that I don't add invisible faces to the mesh it's at least not as immediately bad, but it's still enough to freeze the application for a moment every time I move to a different block and vertices have to be sorted, which will definitely get worse with more complicated terrain than a flat plane