Integer generic parameters

Karl · August 29, 2024, 11:24pm

I don't see this proposal as "special-casing" Int. It introduces the concept of types parameterised by values, and starts with Int (have to start somewhere, right?). But there's really nothing in this proposal precluding any other kind of value.

It doesn't even expose the numerical nature of Int to the type system - it doesn't include constraints like where N + 1 == M or where N < M or anything like that. They're kind of opaque values at the type level.

Joe_Groff · August 29, 2024, 11:31pm

One issue I can see with value generic parameters in their potential full generality vs special-case array syntax is the need to deal with the syntactic ambiguities of angle brackets and </> operators in expressions. Most special-case syntax ideas for fixed-size arrays put the dimension in a less syntactically precarious position that could handle embedding an arbitrary expression as a type parameter, whereas terms in angle brackets are going to have trouble.

I suppose we could put value generic parameters in their own separate brackets, so you'd have something like struct Vector[N: Int]<T: ~Copyable> instead of struct Vector<let N: Int, T: ~Copyable>, and instantiations could look like Vector[n < 0 ? -n : n]<Double> instead of the hard-to-parse Vector<n < 0 ? -n : n, Double>.

j-f1 · August 30, 2024, 3:24am

This wouldn’t work for types with only value generic parameters and no type generic parameters, though. Vector[n < 0 ? -n : n].self is already a valid expression (accessing a static subscript, and then the always-present .self member of the return value).

tera · August 30, 2024, 6:36am

Could you please remind us why is that a problem? (Could well be, I just can't remember the real issue with it).

This IMHO could be one of the practical approaches – limit homogenous tuples to those whose elements' sizes == strides (if the above tail padding is a real problem). Alternatively make (T x 2) to have tail padding even if (T, T) doesn't have it, making the two slightly different and non-interchangeable types.

Sure, but at the same those are not truly fixed sized collections... and the requirement of having all elements initialised don't look to cause practical issues.

Joe_Groff · August 30, 2024, 4:27pm

It's not ideal, but we do also have this syntactic issue with pretty much every "sugar" form of type syntax: [Foo] is also a potential array literal, [Foo: Bar] a dictionary literal, T?.AssocType an optional chain, and so on. As long as the overall structure of the parse tree is the same between the type and expression parse, we can do name lookup to disambiguate. What we absolutely don't want to have to do is rely on name lookup during parsing like C++ does. (We ideally don't want to have to use name lookup after either, since that also has some problems, but it is at least a bridge we've already crossed.) An angle-bracket-delimited generic parameter list has a totally different parse tree from an expression sequence involving binary operators, which is why we have the parse-time heuristic to disambiguate < early.

dmt · August 31, 2024, 4:59am

Not relevant. Already discussed above.

That's a great feature and a long-awaited extension to the generics system. Happy to see another feature from the generics manifesto gets an implementation.

It's probably beyond the scope of the proposal, but do you have any picture of how the actual backing storage here will be defined?
Later in the Future Direction you mention this syntax

Is this just a potential extension for parameter packs or do you suggest using it in Vector?
Because if it's the latter, I'm a bit concerned about this syntax.

Do we want to express a homogenous indexed collection as a tuple defined via parameter packs? Tuples are heterogeneous by nature and not indexed. Also, we'd have to guarantee that such tuples have the same memory layout as a proper array would.
I'm worried about this asterisk sign. It could be a source of ambiguity because it now means both "parameter pack has size lhs from *" and "multiply". And both of these can be in a single expression - repeat each M * N * T.

Maybe for starters, we're better off with some Builtin defined C-array-like type exposed to the type system aka Builtin.StaticArray<T, N>.
Or make @_rawLayout work with generic value parameters (and values evaluatable from generic parameters).

@_rawLayout(likeArrayOf: T, count: N)
struct Vector<T, let N: Int> { ... }

jrose · August 31, 2024, 10:46pm

Seconding this, it’s been talked through before both here and in Rust (which is currently C-like, but has considered what it would take to go Swift-like with separate size and stride). As far as I know, there’s actually nothing that requires the size of [T * N] to be MemoryLayout<T>.stride * N, rather than MemoryLayout<T>.stride * (N-1) + MemoryLayout<T>.size (yes, I know that’s wrong for 0, don’t worry about it). Anything that reads or writes the entire value in a typed way will use the correct size, whatever we decide that is; anything that only goes element-by-element will never read any of the tail padding at all. The only problem comes if someone reads or writes the “full” N * stride when only the unpadded length was allocated. And that can certainly happen! But it implies someone is manually calculating sizes that they hopefully don’t need to anymore. (Unfortunately, if they’re doing it at all, it’s already in unsafe-pointer land, so there’s not an obvious way for the compiler to catch it.)

There is a performance cost, just as there is with any size vs stride difference: care must be taken to preserve the elements in the tail padding, which may mean more instructions to avoid touching extra memory that doesn’t belong to you. But that isn’t really new in Swift, though it should probably go into a performance guide for fixed-sized arrays.

dmt · September 1, 2024, 12:37am

Slava_Pestov:

The answer might be slightly disappointing:

@_someMagicAttributeYouCanOnlyUseInTheStdlib
struct Vector<let N: Int, T> {}

or

struct Vector<let N: Int, T> {
  // Builtin is our magic module that can only be used in the stdlib
  var contents: Builtin.Vector<N, T>
}

It's like asking about the implementation of Int.+

It's more like Int128. There's a difference Int.+ can be marked with @backDeployed, but a type can't. And when a type is only available in a recent version of the stdlib, many people would want to reimplement it in their codebase to use on, for example, older versions of Apple OSes.
I think it's totally fine with an underscored attribute or a Builtin backing type, as long as the compiler doesn't prohibit their usage outside of the stdlib (or allow them behind a flag as it's implemented for the Builtin module).

stuchlej · September 1, 2024, 5:43pm

I was wondering, could this feature be used to improve handling of static C arrays by the Clang Importer? I know this would be source-breaking, but having to deal with hundreds-long-touples (for example like this data | Apple Developer Documentation) is a little bit silly (or unsafe ).

fclout · September 1, 2024, 6:53pm

The optics on this one are still not completely convincing to me. We got away without fixed-size arrays until Embedded Swift made them very hard to ignore. The people who will be using Embedded Swift the most, at least in the short term, have a very C background. Swift is already a much larger language than C, and now we're going to tell C developers that in order to use fixed-size arrays, they should understand that certain integers are special and go in angle brackets. Meanwhile, since the dawn of fixed-size arrays, C been able to abstract over them using pointer+length pairs; and we're suggesting that this is what it will actually compile to in Swift because generic arguments are not always known at compile-time anyways.

We have facilities in the language today that, at first glance, seem sufficient to abstract over fixed-size collections. We can make a fixed-size collection protocol where there is also a static count property, or (soon) we can use Span and not use generics at all. These are, of course, all things that we can encourage people to use over value generic parameters. But would that mean that we need value generic parameters only because we need a syntax to define the maximum capacity of fixed-size collections?

scanon · September 1, 2024, 7:12pm

Integer constant expressions--very roughly as close as C gets to an analogous concept--get like thirty paragraphs of discussion scattered throughout the standard. Pointer-length pairs are equally available in Swift already (and always have been). This is about providing a better abstraction than that, and it's not much more complex that similar mechanisms in C are, while providing much stronger guarantees.

fclout · September 1, 2024, 8:32pm

That seems like a very xkcd 2501 thing to say. There’s no shortage of C developers, we can just find a handful and ask them what they think of ICEs vs. value generic arguments.

scanon · September 1, 2024, 8:35pm

You'll have to find some who actually understand the ICE rules first.

(FWIW, I am a C developer, and I understand the ICE rules, and really the generics rules proposed here are simpler. They may eventually grow to be more complex than the ICE rules, but they aren't yet.)

Slava_Pestov · September 1, 2024, 8:38pm

Yeah, it would be awesome to finally get rid of that.

I don't have a good sense of how source breaking it would be in practice. Perhaps if fixed-size arrays (or even just fixed-size arrays imported from Clang) supported a transitional .N syntax where N is an integer literal to just mean [N] it would get us most of the way there.

We could also stage it in with a language mode, or import the same property twice as "foo" vs "foo_vector" or something, as @Joe_Groff suggested earlier.

fclout · September 1, 2024, 8:55pm

I’m not disputing the respective specification quality, I’m more saying that the fact so many people are able to use C ICEs without knowing the standard chapters on a first-name basis is a good indication that, for all the specification problems, the resulting implementations meet needs intuitively. People shoot themselves in the foot with VLAs, but there are diagnostics for that (and it’s perhaps more a VLA problem than an ICE problem as BitInt doesn’t have that problem), and ICEs are used successfully for global initialization, enum values, static asserts, etc.

Slava_Pestov · September 1, 2024, 8:59pm

Yeah, ultimately you only really need type-level integers of some kind to express structs and enums that contain fixed-size arrays as part of their inline storage. (Either that, or a magic hard-coded homogeneous tuple syntax that expands at some point, but that's just much messier for so many reasons. Inside a compiler, it's really nice to have a "tuple type" where you can operate on the fields in an O(n) fashion at compile-time. This is a problem for int foo[4096].)

If you can arrange it so that the fixed-size arrays are stored elsewhere and not directly inside your structs and enums, then Spans probably gets you most of the way there as far as 'generic algorithms' over fixed size arrays go.

One potential downside is there's no easy way to produce a new value of the same size with this idiom, whereas you can write a map that's generic over N and produces a new array value. Whether that's a common-enough requirement when working with fixed size arrays, I'm not entirely sure.

fclout · September 1, 2024, 9:10pm

What does it help when the storage is out-of-line? Is it because ownership rules might prevent you from working with the struct while the Span is live, or something else?

Nobody1707 · September 1, 2024, 9:20pm

If storage is out of line, you can use a ManagedBuffer or an UnsafeMutableBufferPointer to hold the underlying storage.

Joe_Groff · September 3, 2024, 3:26pm

With out-of-line storage, the type's own inline storage size doesn't need to be dependent on the storage capacity; a Span consisting of a start pointer and length is probably two words in size no matter how large the buffer it references is. For a type that stores N elements inline inside of itself, its size depends on the number of elements, and because Swift requires all values of a type to have the same basic layout, that means types for different sizes need to be distinguished.

As Slava (and the proposal) noted, it's theoretically possible we could introduce a ~FixedSize suppressible type property, and have it so that values or references to values need to be passed around with size information for that specific type. That might make some things cleaner, but I don't think it's ultimately simplifying. If you want to compose a ~FixedSize thing into something that is even less fixed-size (a Matrix<N, M> made out of Vector<N>s, for instance), you need some way to compose the inner and outer size information. Some APIs on fixed-size data structures really do want to take two values with the same size, or produce a return value with the same size as the inputs, but if the size information isn't directly part of the type system already, then you need some parallel mechanism to propagate size information through functions in addition to types (and it would be subject to many of the same design considerations as integer generic parameters). And even if the natural representation of a borrowing VariablySizedArray parameter is a start pointer and length like Span, I still think there would be reasons to want a Span type, since borrow/inout references are not rebindable (like T& in C++) whereas, as a first class type, Span can be rebound separate from the referenced data (like T*).

allevato · September 3, 2024, 3:33pm

I've encountered cases where fixed-length C arrays of a certain size would just cause compiler performance to collapse. We had something like this:

typedef struct {
  int array[2048][2048];
} S;

And either ClangImporter or the type checker just fell to its knees trying to work with a 2048-tuple of 2048-tuples.

I think wrapping the importer change in a language mode and upcoming feature flag would be the best approach here, so that we avoid generating the old broken declaration entirely but people can opt-in to using the better version sooner.