Confusion with tuples using named elements - and possible compiler bug?

I encountered some strange behaviour of tuples with specified element names in combination with the reduce operator.

The following code (a simplified version of a real-world section of my code) causes a runtime crash, with no compiler errors or warnings.

// Define some data using tuples with element names

let myData: [[(id: Int, data: String)]] = [[(100,"Apple"),(200,"Orange")], [(300,"Lemon")]]

// I use the reduce function to flatten the array, but have misspelled one element name.

// This compiles just fine, but causes a runtime crash

let myFlatData1: [(id: Int, daata: String)] = myData.reduce([]) {$0 + $1}

//The same code with type annotations to make it clear what happens.

// The compiler should detect that “item” has the wrong type - i.e. (id: Int, data: String)

let myFlatData2: [(id: Int, daata: String)] = myData.reduce([(id: Int, daata: String)]()) {
    (accumulator: [(id: Int, daata: String)], item: [(id: Int, daata: String)]) in
    accumulator + item
}

// I could of course have used flatMap (which I probably should have), but I didn’t. flatMap behaves as expected: the following code will not compile.

let myFlatData3: [(id: Int, daata: String)] = myData.flatMap{$0}

If I’m not completely mistaken, I think it is a pretty serious bug since a simple misspelling could lead to unexpected crashes. I’m looking forward to hearing any thoughts on this from the community.

Kind regards,

Anders

[Fixed some formatting, sorry...]

I encountered some strange behaviour of tuples with specified element names in combination with the reduce operator.

The following code (a simplified version of a real-world section of my code) causes a runtime crash, with no compiler errors or warnings.

Define some data using tuples with element names

let myData: [[(id: Int, data: String)]] = [[(100,"Apple"),(200,"Orange")], [(300,"Lemon")]]

I use the reduce function to flatten the array, but have misspelled one element name.
This compiles just fine, but causes a runtime crash

let myFlatData1: [(id: Int, daata: String)] = myData.reduce([]) {$0 + $1}

The same code with type annotations to make it clear what happens.
The compiler should detect that “item” has the wrong type - i.e. (id: Int, data: String)

let myFlatData2: [(id: Int, daata: String)] = myData.reduce([(id: Int, daata: String)]()) {
    (accumulator: [(id: Int, daata: String)], item: [(id: Int, daata: String)]) in
    accumulator + item
}

I could of course have used flatMap (which I probably should have), but I didn’t. flatMap behaves as expected: the following code will not compile.

let myFlatData3: [(id: Int, daata: String)] = myData.flatMap{$0}

If I’m not completely mistaken, I think it is a pretty serious bug since a simple misspelling could lead to unexpected crashes. I’m looking forward to hearing any thoughts on this from the community.

Hi!

(Note that you can edit your original post (to fix the formatting) instead of reposting it as a comment.)

I agree that it looks like a compiler bug. Whether the error is caught at compile time or missed until runtime depends on the exact formulation and shorthands used.

Here's a somewhat reduced demonstration program:

let myData: [[(id: Int, data: String)]] = [
    [(100,"Apple"),(200,"Orange")],
    [(300,"Lemon")]
]

let myFlatData1: [(id: Int, daata: String)] = myData.reduce([]) {
    // let _ = ()  // <--- Uncommenting this line helps compiler
    return $0 + $1 //      detect error at compile time here ---.
    //          ^-----------------------------------------------'
}

The error (which is detected either at compile time or runtime) is this:

Cannot convert value of type '[(id: Int, data: String)]' to expected argument type 'Array<(id: Int, daata: String)>'

(That is, data was misspelled as daata, which caused mismatched array types.)

Thanks @Jens for the quick response. It seems even stranger that an arbitrary line of code before the return statement helps the compiler detect the error. I tried

    print("")
    return $0 + $1 

and

    enum A {case a}
    return $0 + $1 

and both trigger the compiler to detect the error.

Yes, I used let _ = () just as an example (it was the simplest and most meaningless statement I could come up with).


Here's a further reduced demonstration program:

func foo<T, R>(_ a: T, _ fn: (T) -> R) -> R { return fn(a) }

let a: [(id: Int,  data: String)] = [(123, "abc")]
let b: [(id: Int, daata: String)] = foo(a) {
    // _ = () // <-- Uncomment this line to detect error on
    return $0 // <-- this line at compile- instead of run time.
}

So, for Array, this is an error that shows either at runtime or compile time (depending on the above workaround).

But for other simple generic types like G here:

struct G<T> { var value: T }
func foo<T, R>(_ a: T, _ fn: (T) -> R) -> R { return fn(a) }

let a: G<(id: Int,  data: String)> = G(value: (123, "abc"))
let b: G<(id: Int, daata: String)> = foo(a) {
    return $0 // <-- Error here at compile time without the above workaround.
}

the error will always be detected at compile time, i.e. even without the meaningless-statement workaround.


But for Optional it is different again: the error is detected at compile time with the meaningless-statement workaround, but without the workaround the program compiles and there is no runtime error either. So the type conversion from
Optional<(id: Int, data: String)> to
Optional<(id: Int, daata: String)>
will just work:

func foo<T, R>(_ a: T, _ fn: (T) -> R) -> R { return fn(a) }

let a: (id: Int,  data: String)? = (123, "abc")
let b: (id: Int, daata: String)? = foo(a) {
    // _ = () // <-- Uncomment for compile time error on the next line, but note that it will compile and run as long as this line is commented away.
    return $0 // <-- No runtime error here when it's `Optional<Tuple>` instead of `Array<Tuple>`.
}
print(b ?? "") // Prints `(id: 123, daata: "abc")`

I guess it all boils down to type inference resulting in (what at least we perceive as) inconsistent behavior. That it sometimes succeeds is probably because it then manages to infer the types so that something along the lines of the following example happens (which is valid and compiling code):

let a: (id: Int,  data: String) = (123, "abc")
let b: (id: Int, daata: String) = { (x: (Int, String)) in x }(a)

But that can't be the whole explanation, since it would mean something like this for your original example:

let myData: [[(id: Int, data: String)]] = [
    [(100,"Apple"),(200,"Orange")],
    [(300,"Lemon")]
]
let myFlatData1: [(id: Int, daata: String)] = myData.reduce([]) {
    (a: [(Int, String)], b: [(Int, String)]) in return a + b
}
print(myFlatData1) // Prints `[(id: 100, daata: "Apple"), (id: 200, daata: "Orange"), (id: 300, daata: "Lemon")]`

which compiles and runs without a runtime error ...
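For comparison, here is a minimal sketch of what the flattening could look like when the relabeling is written out explicitly rather than left to inference. This is only an illustration: the names `flattened` and `relabeled` are made up for this sketch, and it assumes the different label really is intended. The first step keeps the original labels, so a misspelled annotation there should be rejected in the same way as the flatMap variant from the original post.

```swift
let myData: [[(id: Int, data: String)]] = [
    [(100, "Apple"), (200, "Orange")],
    [(300, "Lemon")]
]

// Step 1: flatten while keeping the original element labels.
let flattened: [(id: Int, data: String)] = myData.flatMap { $0 }

// Step 2: build each new tuple by hand, so the label change is spelled out
// at the point where it happens (hypothetical names, not from the thread).
let relabeled: [(id: Int, daata: String)] = flattened.map { (id: $0.id, daata: $0.data) }

print(relabeled.map { $0.daata })
```

Because every element is rebuilt in the second step, no conversion between differently labeled array types is involved.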

Anyway, I filed SR-12261.

Perhaps @Slava_Pestov can explain what's going on?

Thank you for the bug report! I've reduced the problem further:

let tup: (id: Int, data: String) = (1, "hi")
let tup2: (id: Int, daata: String) = tup

let fn: ([(id: Int, data: String)]) -> () = { _ = $0 }
let fn2: ([(id: Int, daata: String)]) -> () = fn

We correctly reject the initializer expression for tup2. However fn2 is accepted. This should not be the case. It's a bug in the expression checker's matchType() or related code.

I wrote up an explanation in the bug report, but I believe this behaves "correctly", even if it's deeply unintuitive. We should revisit these inference rules.

Quoting myself:

Definitely a quirk of the closure inference rules. Let's take program B:

let b: [(id: Int, daata: String)] = foo(a) {
    return $0 // <-- this line at compile- instead of run time.
}

The single-expression closure causes the body to be treated as one big expression. That expression's type is bound to the function type (T) -> R. While the labels do not match, there's an applicable conversion: the argument labels don't matter for closures. So we insert a function conversion node and everything goes off without a hitch.

Now, for the multi-statement closure.

let b: [(id: Int, daata: String)] = foo(a) {
    _ = () // <-- Uncomment this line to detect error on
    return $0 // <-- this line at compile- instead of run time.
}

We cannot treat the body as before, so we must instead make do with what we do have: T is contextually bound to (id: Int, data: String)? via the argument a, and the return type is contextually bound via the user-provided type annotation (id: Int, daata: String)?. As there is no valid conversion rule between tuples with different sets of labels, the return type constraint fails.

The takeaways: Despite being unintuitive, this behaves correctly. Now, does it behave intuitively? Absolutely not. The inference rules for multi-statement closures really ought to be revisited to make these cases consistent one way or the other.

But, as @Slava_Pestov said, surely the following program still demonstrates a bug:

// This program compiles (even though it shouldn't):
let fn: ([(id: Int, data: String)]) -> () = { _ = $0 }
let fn2: ([(id: Int, daata: String)]) -> () = fn
fn2([(123, "hello")]) // <-- and crashes at runtime here

(fn2 is impossible to call with any nonempty array (whatever the labels) without crashing.)

and program B demonstrates the same bug:

func foo<T, R>(_ a: T, _ fn: (T) -> R) -> R { return fn(a) }
let a: [(id: Int,  data: String)] = [(123, "abc")]
let b: [(id: Int, daata: String)] = foo(a) {
    return $0 // <-- crashes at runtime
}

no?

I agree with @Jens and @Slava_Pestov, this is definitely a bug. The compiler lets something through that is known to cause a runtime crash. And I've encountered this in a real life project by simply misspelling one of the tuple item labels.
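Until this is resolved, one possible way to reduce the risk of this kind of misspelling is to write the tuple type once, behind a typealias, and refer to the alias everywhere else. The sketch below assumes that approach; the alias name `Item` is hypothetical and not from my real project.

```swift
// Hypothetical alias: the labels are spelled in exactly one place.
typealias Item = (id: Int, data: String)

let myData: [[Item]] = [
    [(100, "Apple"), (200, "Orange")],
    [(300, "Lemon")]
]

// Both the source and the result refer to the same alias, so there is no
// second spelling of the labels that could drift out of sync.
let myFlatData: [Item] = myData.reduce([]) { $0 + $1 }

print(myFlatData.map { $0.data })
```

A small struct with `id` and `data` properties would give the same protection and is probably the more conventional choice once the tuple is used in more than a couple of places.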

I've prepared a tentative fix for this issue. It is source breaking though -- because we now disallow some weird conversions that we used to allow. The problem has been there since the beginning, so in case there's fallout we might have to downgrade this to a warning. I hope not, though.

The new rule is that when solving a subtype constraint, the tuple labels at a given index must either match, or one of the two labels must be empty. This matches the conversion constraint, which is used in more places (and I suspect that is the reason nobody reported this issue sooner).
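As a rough illustration of the "labels must match, or one side must be empty" rule, here is a small sketch using plain tuple values, where the conversion constraint already applies. The variable names are made up for the example, and the rejected line is left commented out so that the snippet still compiles.

```swift
let labeled: (id: Int, data: String) = (1, "hi")

// Allowed: the destination labels are empty at every index.
let unlabeled: (Int, String) = labeled

// Allowed: the source labels are empty at every index.
let relabeled: (id: Int, daata: String) = unlabeled

// Rejected: both sides are labeled at index 1, and the labels differ.
// let rejected: (id: Int, daata: String) = labeled

print(relabeled.daata)
```

The commented-out line corresponds to the `tup2` initializer in the reduced example earlier in the thread, which the compiler already rejects.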

Here is the fix: "Sema: Stricter 'Subtype' constraint on tuple types" (apple/swift pull request #30191).

It looks like both our benchmark suite and source compat suite exhibit examples of the old behavior, which probably means other cases exist in the field. So I'm going to hold off for now, and either downgrade this to a warning or if we ever do a new language mode, hang it off of that.