[Pitch] Improved Compiler Support for Large Homogenous Tuples

My thought is that we would not want to expose such functionality outside of the stdlib if possible. I am not against it in principle. But if there is a problem with it, I don't think it is necessary.

So my thoughts here is that we really have two possibilities here, remembering that we know if a tuple type was originally a homogenous tuple:

  1. We could just decide to pick some N that is relatively large and just say
    that we are going to break ABI there. The assumption behind the decision
    would be that at a certain point (as mentioned by Tony here:
    [Pitch] Improved Compiler Support for Large Homogenous Tuples - #40 by allevato)
    the compile time becomes so large and the runtime so poor that no one really
    would use such a type. That being said, just saying that no one uses such a
    large tuple is (IMO) not caring about the experience of the developer who
    runs into this. With that in mind, I think we would need mitigations in place
    to do this and also would need to develop some tooling/apply it to the source
    compatibility suite and other user software to get an idea of what N would be
    appropriate. I'll lay out my thoughts on the mitigations below.

  2. We could change the way we lowered homogenous tuples such that if we are
    calling a function with a homogenous tuple argument, we use the more
    efficient ABI. We also probably could for new enough deployment target maybe
    have a way for protocols to opt into it. We would need thunks to allow for
    backwards deployment and I am not sure if core protocols could use the new
    ABI. Instead I think the only thing we could hope for is that if we can
    devirtualize, we might be able to inline away the different convention by
    marking the thunks as transparent. That being said, if we were calling into
    generic code, we would still hit this. That being said, this avoids the ABI
    break but is unfortunate in its complexity and limitations.

Implementation/Mitigations for Option 1 ABI Break

In terms of the ABI break, my thought is that we can do it a reasonable way by:

  • Introducing a flag that would control whether or not we use the new or old
    tuple ABI. This flag would be serialized into the swift interface files of all
    stabilized swift libraries. The lack of such an option specified in the
    interface file will signal to the compiler that a library was compiled with
    the old tuple ABI. In the case, assuming that the current compilation does
    have the flag set, we could:

    • Emit a hard error and tell the user that the library needs to be recompiled.
    • Infer that we should use the old ABI instead of the new ABI for the current
      compilation. In this case, we would emit a warning that we are doing this
      stating that in a subsequent swift version this will be a hard error. If we
      find dependencies with a mix of old/new ABIs we would just error.
  • If the new ABI flag is passed in (or given a new enough deployment target), we
    would still allow for specific functions to be marked with an attribute that
    forces the function to use the old tuple ABI for compatibility reasons to help
    upgrade code.

  • In order to not force the optimizer to support multiple representations, we
    would move the implementation of the destructuring to a late IRGen pass or to
    IRGen and would have the flag control whether or not the tuple destructuring
    occurs at that point.

I think it might be interesting to consider allowing that since it would be possible to type-check based on the structure of the passed tuples, it would be great if that was called out in the proposal explicitly.

@xedin my question is if we flip it around... are there any downsides to doing it. I do think it could be useful. I just don't want to get any poo on my shoe = p.

If it won't be possible to declare a custom conformance to that protocol then I don't really see any problems, it might be useful to be able to specialize overloading behavior to homogeneous tuples with custom conformance requirements for stdlib but I'm no expert.

On another note, in terms of syntax I am thinking of choosing for arguments sake the (5 * Type). I am still open to further changes, but I want something to use in the 2nd version of the proposal and it seems based off feedback that people are open to that syntax.

1 Like

Just small thought on syntax.

Will it be too hard for parser to read C-like syntax (Int[5]) instead of (5 x Int)?
I would agree that x looks a bit odd for Swift.
btw, C-like syntax would also be good for multidimensional arrays.

1 Like

I wonder if this syntax is possible: (5 Type)

It reads a bit odd because Type is not pluralized, but I think it reads better than a multiplication operator.

I think it is important to have something visual that makes it very clear that the user is reading something with different semantics. That being said, given the amount of options, my gut tells me that I should just pick a syntax for the purpose of writing the proposal and then show the options to the core team and let the core team pick. The functionality here/how to handle the ABI break is what I think is the most important thing semantically here.

12 Likes

So I went and gathered some more data on the scalability problems with the old tuple ABI using scale_test and gyb. My test was this file and I used a near master compiler (that full disclosure did have asserts enabled):

struct Foo {                                                                                                                  
  var value: (                                                                                                                
% for i in range(0, N*100):                                                                                                   
  Int,                                                                                                                        
% end                                                                                                                          
  Int                                                                                                                         
  ) = (                                                                                                                       
% for i in range(0, N*100):                                                                                                   
   0,                                                                                                                         
% end                                                                                                                          
   0)                                                                                                                         
}                                                                                                                             
                                                                                                                              
@inline(never)                                                                                                                
public func trampoline<T>(_ t: T) -> T { t }                                                                                  
                                                                                                                              
func main() {                                                                                                                 
    let f = Foo()                                                                                                             
    trampoline(f.value)                                                                                                       
}                                                                                                                             
                                                                                                                              
main() 

and basically I was able to use the tool to show that as I increased the tuple size (and thus effected trampoline)

  1. -Onone,-O wall time is quadratic [O(n^2)].
  2. -Onone,-O type checker wall time is also quadratic.
  3. -Onone,-O code size is linear [O(n)]. I would not expect increasing a single argument's size by N to be linear (if we passed by address, it would be basically constant).

Also you may be interested that just this one file in terms of compile time at -Onone when N is 40 (and thus we have an array of ~4000 elements) my machine (a 2019 Mac book Pro) takes ~21s, which is way too long for an example this simple [it should be instant].

7 Likes

So I just went and gathered some information using the current Xcode that swift compiles against. I used swift-ast-tool with changes+some scripts to gather this data by parsing swift interface files. The specific changes to my swift repo is in this branch: GitHub - gottesmm/swift at ast-script-for-tuples.

The overall data is:

3839 Found tuple with eltcount: 2
 289 Found tuple with eltcount: 3
 156 Found tuple with eltcount: 4
 190 Found tuple with eltcount: 5
 188 Found tuple with eltcount: 6
  46 Found tuple with eltcount: 14
  52 Found tuple with eltcount: 16
  52 Found tuple with eltcount: 32

If I just take the max/swift interface file, I get the following results:

 253 Max Seen Tuple Size: 2
  11 Max Seen Tuple Size: 3
  39 Max Seen Tuple Size: 6
  26 Max Seen Tuple Size: 32

That suggests that everything with tuple size > 6 are in the same dylibs. I looked at this and all of the instances that I am seeing are in Foundation where they are using as inline buffers. I validated using the tbd files in that Xcode that we are exposing getters/setters to these as public symbols.

An example of a type with this problem is Foundation.Data.InlineData.bytes

    '_$s10Foundation4DataV06InlineB0V5bytess5UInt8V_A13HtvM', 
    '_$s10Foundation4DataV06InlineB0V5bytess5UInt8V_A13Htvg', 
    '_$s10Foundation4DataV06InlineB0V5bytess5UInt8V_A13HtvpMV', 
    '_$s10Foundation4DataV06InlineB0V5bytess5UInt8V_A13Htvs', 
==> Demangled
    'Foundation.Data.InlineData.bytes.modify : (Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8)', 
    'Foundation.Data.InlineData.bytes.getter : (Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8)', 
    'property descriptor for Foundation.Data.InlineData.bytes : (Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8)', 
    'Foundation.Data.InlineData.bytes.setter : (Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8)', 

In source one can see this in the Foundation interface file in the following form:

  @usableFromInline
  @frozen internal struct InlineData {
    @usableFromInline
    internal typealias Buffer = (Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8, Swift.UInt8)
    @usableFromInline
    internal var bytes: Foundation.Data.InlineData.Buffer // <==== THIS
    ...
  }

This suggests that at least we can fix the ABI with a size of 32 without issue but we may want to consider also adding a new attribute that says use the old ABI and apply it to Foundation and then update the ABI for a smaller N.

3 Likes

This syntax feels good to me.
So that (Int, 1) equivalent to (Int) equivalent to Int

The following bugs may be related (even if they are about large homogeneous arrays):
https://bugs.swift.org/browse/SR-7702
https://bugs.swift.org/browse/SR-7703
https://bugs.swift.org/browse/SR-9223
https://bugs.swift.org/browse/SR-9235
https://bugs.swift.org/browse/SR-9291

7703 has a Makefile that compares compile times of Swift, C++ (GCC and Clang) and Python.
9235 has a diagram of quadratic complexity: Swift Compile Performance Bugs - Google Sheets.
Changing 7703 from array to tuple makes no difference (14 seconds for 10.000 elements).

A use case is low-discrepancy sampling: pbrt-v4/sobolmatrices.cpp at master · mmp/pbrt-v4 · GitHub

1 Like