Consistency of the ObjectOutliner SILOptimizer pass

allevato · November 29, 2021, 5:17pm

I've moved this discussion to a more appropriate subforum from the original thread ([Pitch] Compile-Time Constant Values). I'm hoping that someone who works heavily on SILOptimizer like @Erik_Eckstein might have some more insight here.

I've been looking into taking advantage of swift_allocStaticObject -based values for other projects but if they're sensitive to small perturbations in the input like this, then that gets a lot trickier. I guess to frame the findings below, I have a couple questions:

Should we be able to rely on the optimizer to be able to outline certain forms of global/static data without any undocumented or hard-to-understand limitations around the size of that data? Should hitting such a limitation be considered a bug?
Could this outlining be made to work in debug builds as well? Even though performance isn't as critical in those cases, it would also be great if we could avoid penalizing debug builds in situations where we want to use static data.

If we could solve these two questions (or at least the first one), it would be a great way to handle simple static data in Swift without having to introduce new keywords or attributes into the language.

(My initial reply to @Karl's comment about the cost of initializing effectively static data at runtime)

There definitely are situations where the compiler can outline complex values directly into the data segment of the binary, but I don't know what the limitations are. For example, this example (godbolt) with a simple Int and String struct gets converted to a data blob:

output.staticArray : [output.Foo]:
        .zero   8

mainTv_:
        .zero   8
        .zero   16
        .quad   6
        .quad   12
        .quad   10
        .quad   7234932
        .quad   -2089670227099910144
        .quad   20
        .quad   133540975310708
        .quad   -1873497444986126336
        .quad   30
        .quad   133541042677876
        .quad   -1873497444986126336
        .quad   100
        .quad   7236850741311532655
        .quad   -1513209474789907086
        .quad   1000
        .quad   8462097072821464687
        .quad   -1441151879073603213
        .quad   100000
        .quad   -3458764513820540908
        .quad   .L__unnamed_1+9223372036854775776

.L__unnamed_1:
        .asciz  "one hundred thousand"

...where the code to initialize it consists only of a type metadata lookup and then a call to swift_initStaticObject, which appears to be fast—it just populates the runtime metadata pointer at the beginning of the inlined data.

One cool thing about this is that I was able to use Strings instead of StaticStrings and it still worked—and in fact, the strings that could fit into the small string representation used that instead of using a pointer to separate character data.

I'm not sure what it is about OptionSets that prevents this from working, though. I think this happens as part of the ObjectOutliner SILOptimizer pass in the compiler, so that would be the place to improve.

The other drawback about this is that since it's an optimization pass, unoptimized builds will still get the slow code path that initializes everything at runtime. I wonder if this transformation could happen as a mandatory pass, so that debug builds could still rely on static data.

(A few days later)

I poked at this a little bit more and the outcome was interesting (and beyond my understanding of how the optimizer works).

It turns out OptionSet s by themselves aren't a problem; some of them are able to outline into static objects fine. Your WebURL case does if we stop the array after element 0x67 . But once we add element 0x68 , or anything after that, it stops outlining.

I looked more closely at the generated post-opt SIL and it looks like in the 0x00-0x67 case, the ObjectOutliner pass outlines the whole array into a single SIL value mainTv_ , which is what we want.

In the 0x00-0x68 case, it looks like the individual arrays inside the larger array (the option set unions) are outlined separately ( mainTv_...mainTv7_ ), and the overall array is never outlined.

Maybe the inner array outlining is an intermediate step to outlining the whole thing, and the 0x68 th element pushes the number of basic blocks or instructions past some limit that's coded into an optimizer pass that makes it claim it's too complex to analyze, causing it to break down after that?

Karl · November 30, 2021, 6:14pm

Ping @Erik_Eckstein (who was pinged in the original before it was moved). I'd love to know if anybody has any ideas about this.

Thanks so much for investigating this. The findings are really fascinating. It's also interesting to see how much better the compiler has done with each release; there's a clear improvement from 5.3 -> 5.4 -> 5.5 -> nightly.

Karl · April 14, 2022, 9:13pm

It would still be really nice to get some insight in to this (and some improvements, if possible).

It is relatively rare these days that you find something which is literally impossible to do in pure Swift -- which if you accepted some amount of pain as a library developer, you could not overcome somehow for the library's clients. But this is one. For some kinds of tables, they are so big that runtime initialisation is unacceptable, and pure Swift libraries can't do anything about it.

When I asked @Alejandro, he said the reason the standard library's Unicode data had to be written in C was that Swift can't handle the static data tables. I don't know if there were any other considerations, but it's true, Swift can't handle these tables (as shown by @allevato, it's not predictable), so it isn't even an option.

For WebURL, I'm also looking at including some Unicode data tables. Again, I'm hitting the problem that runtime initialisation would be unacceptable, so for the first time I am actually finding no road forward with pure Swift, and will have to accept some C. It's sad because clearly the compiler can do this to some extent, we just need some control over the heuristic which makes it bail out.

Or perhaps the heuristic needs some adjustment - I mean, shouldn't the optimiser get more excited by the idea of turning a bigger array in to static data? Why is it bailing out if the array is too large and every value it has seen can be constant-folded?

Karl · April 21, 2022, 8:35pm

(This was originally an update of the previous post, but I feel that editing a 7-day old post is a bit unfair and I know that some people don't like it, so now it's separate):

Or... perhaps I should take that back. I just tried to generate a very basic IDNA mapping table in Swift:

internal enum IDNAMappingStatus {
  case valid
  case ignored
  case mapped
  case deviation
  case disallowed
  case disallowed_STD3_valid
  case disallowed_STD3_mapped
}

internal let idna_mapping_data: [(codePoints: ClosedRange<UInt32>, status: IDNAMappingStatus, mapping: [UInt32]?)] = [
  (0x0...0x2C, IDNAMappingStatus.disallowed_STD3_valid, nil),
  (0x2D...0x2E, IDNAMappingStatus.valid, nil),
  (0x2F...0x2F, IDNAMappingStatus.disallowed_STD3_valid, nil),
  (0x30...0x39, IDNAMappingStatus.valid, nil),
  (0x3A...0x40, IDNAMappingStatus.disallowed_STD3_valid, nil),
  (0x41...0x41, IDNAMappingStatus.mapped, [0x61]),
  (0x42...0x42, IDNAMappingStatus.mapped, [0x62]),
  (0x43...0x43, IDNAMappingStatus.mapped, [0x63]),
  (0x44...0x44, IDNAMappingStatus.mapped, [0x64]),
<snip>
  (0xB8...0xB8, IDNAMappingStatus.disallowed_STD3_mapped, [0x20, 0x327]),
  (0xB9...0xB9, IDNAMappingStatus.mapped, [0x31]),
  (0xBA...0xBA, IDNAMappingStatus.mapped, [0x6F]),
  (0xBB...0xBB, IDNAMappingStatus.valid, nil),
  (0xBC...0xBC, IDNAMappingStatus.mapped, [0x31, 0x2044, 0x34]),
  (0xBD...0xBD, IDNAMappingStatus.mapped, [0x31, 0x2044, 0x32]),
  (0xBE...0xBE, IDNAMappingStatus.mapped, [0x33, 0x2044, 0x34]),
  (0xBF...0xBF, IDNAMappingStatus.valid, nil),
  (0xC0...0xC0, IDNAMappingStatus.mapped, [0xE0]),
  (0xC1...0xC1, IDNAMappingStatus.mapped, [0xE1]),
  (0xC2...0xC2, IDNAMappingStatus.mapped, [0xE2]),
<snip>
]

Obviously it could be optimised a lot - this is just straight from the data files, with no considerations about faster representations for contiguous ranges or splitting off the "mapping" (replacement) scalars.

But even so, it's enough to make me doubt it'll be realistic to ship any time soon. Even if I replace the Array with a scalar, it takes >75GB memory for a debug build of this thing (I also tried a nightly; same). At that point, questions about static initialisation are kind of academic. There's no way this can ship.

Removing the table and building again, I see a couple of instances of swift-frontend spin up, each of which take 60-100MB of memory and are gone in a flash as the project is built. With the table? It eats the universe.

There are 8908 entries in the table, which is a lot... but also not that much. I don't really think that's a proportionate level of memory use, to be honest.

I think we just need to admit that Swift can't do this kind of thing, and since C can do it, and we can interop with C, there's probably not a huge appetite to fix it (at least, the reaction to this thread doesn't suggest there is). It means that libraries who want to be pure Swift and showcase what the language can do are left in a bit of an embarrassing spot though. It's hard to claim that Swift is suitable for systems programming when we can't handle this kind of task.

allevato · April 21, 2022, 8:41pm

FWIW, this is exactly the situation that SwiftProtobuf is in. There are a lot of performance improvements that I think we could make by switching to a table-based approach. But we do code generation, and having to generate a mix of both Swift as well as C code for the low-level bits would be an absolute non-starter.

If there's no interest in addressing this right now and the answer is just "wait for compile-time-evaluated values", then I think that would be a shame because I think there is still a lot of room for improvements that doesn't involve waiting for whole features to be proposed, implemented, and land in Xcode—C/C++ were doing this decades before constexpr was a thing.