Inconsistent SE-0299 behavior if closure parameters are unnamed?

taylorswift · February 1, 2023, 10:23pm

i’ve been struggling to create a basic BSON encoding DSL in swift for several days now, and i just can’t shake the feeling that this is impossible and that swift is just really bad at DSLs.

here’s my problem:

i have a protocol BSONEncodable, which has some of the usual conformers:

protocol BSONEncodable
{
}

extension Int:BSONEncodable
{
}
extension String:BSONEncodable
{
}
extension Optional:BSONEncodable where Wrapped:BSONEncodable
{
}

i also have an encoding container, UniversalBSONDSL, which for the purposes of this example, just looks like:

struct UniversalBSONDSL
{
    init()
    {
    }

    init(with populate:(inout Self) throws -> ()) rethrows
    {
        self.init()
        try populate(&self)
    }
}

UniversalBSONDSL models a document, and documents can contain other documents, so UniversalBSONDSL is itself BSONEncodable.

extension UniversalBSONDSL:BSONEncodable
{
}

finally, UniversalBSONDSL vends a key-value pair-based encoding interface through some instance subscripts. here’s a simplified schematic:

extension UniversalBSONDSL
{
    subscript<First, Second>(key:String) -> (First, Second)?
        where First:BSONEncodable, Second:BSONEncodable
    {
        get { nil }
        set { fatalError() }
    }
    subscript<Encodable>(key:String) -> Encodable?
        where Encodable:BSONEncodable
    {
        get { nil }
        set { fatalError() }
    }
}

it can be used like this:

let _:UniversalBSONDSL = .init
{
    $0["$abs"] = 1
}
let _:UniversalBSONDSL = .init
{
    $0["$add"] = (1, "$field")
}

where this all falls apart is the recursive case. you see, aggregation expressions (e.g. $abs) can contain other expressions, so there needs to be an easy way to nest these expressions. for example, we want to be able to encode something like the following JSON:

{
    $add: [ 1, { $abs: 1 } ]
}

and this is where swift completely falls flat, because even with SE-0299, this use case just doesn’t work.

to start, SE-0299 doesn’t work with inits, so even if you move the definition of init(with:) to a protocol extension block, it still doesn’t compile:

extension BSONEncodable where Self == UniversalBSONDSL
{
    init(with populate:(inout Self) throws -> ()) rethrows
    {
        self.init()
        try populate(&self)
    }
}

let bson:UniversalBSONDSL = .init
{
    $0["$add"] = (0, .init
    {
        _ in
    })
}

encodable.swift:77:9: error: generic parameter 'Encodable' could not be inferred
        $0["$add"] = (0, .init
        ^
encodable.swift:39:5: note: in call to 'subscript(_:)'
    subscript<Encodable>(key:String) -> Encodable?
    ^
encodable.swift:77:22: error: cannot assign value of type '(Int, _)' to subscript of type 'Encodable'
        $0["$add"] = (0, .init
                     ^~~~~~~~~
encodable.swift:77:27: error: cannot infer contextual base in reference to member 'init'
        $0["$add"] = (0, .init

but SE-0299 does work with static methods, at least superficially, probably because it is implemented in terms of things that return Self (which apparently init is not one of). so this miraculously does compile:

extension BSONEncodable where Self == UniversalBSONDSL
{
    static
    func document(with populate:(inout Self) throws -> ()) rethrows -> Self
    {
        try .init(with: populate)
    }
}

let bson:UniversalBSONDSL = .init
{
    $0["$add"] = (1, .document
    {
        _ in
    })
}

but now this is where i run into a lot of weirdness, because the minute i try to actually do something with the closure parameter, it stops compiling:

let bson:UniversalBSONDSL = .init
{
    $0["$add"] = (1, .document
    {
        $0["$abs"] = 1
    })
}

encodable.swift:86:9: error: generic parameter 'Encodable' could not be inferred
        $0["$add"] = (1, .document
        ^
encodable.swift:39:5: note: in call to 'subscript(_:)'
    subscript<Encodable>(key:String) -> Encodable?
    ^
encodable.swift:86:22: error: cannot assign value of type '(Int, _)' to subscript of type 'Encodable'
        $0["$add"] = (1, .document
                     ^~~~~~~~~~~~~
encodable.swift:86:27: error: cannot infer contextual base in reference to member 'document'
        $0["$add"] = (1, .document
                         ~^~~~~~~~

the closure parameter needs a type annotation:

let bson:UniversalBSONDSL = .init
{
    $0["$add"] = (1, .document
    {
        (bson:inout UniversalBSONDSL) in

        bson["$abs"] = 1
    })
}

but this syntax is just awful. and it’s really strange that using the $0 parameter breaks type inference, because when i hover over the _-bound one in VSCode i can see that the compiler really did choose the correct overload.

ideally, i would be able to do something like the following:

let bson:UniversalBSONDSL = .init
{
    // A
    $0["$abs"] = "$field"

    // B
    $0["$abs"] = .init
    {
        $0["abs"] = "$field"
    }

    // C
    $0["$add"] = (1, "$field")

    // D
    $0["$add"] = (1, .init
    {
        $0["$abs"] = 1
    })
}

A works. B doesn’t work out of the box, but can be made to work in a scalable fashion by vending init(with:) on Optional<UniversalBSONDSL>.

C works, but D doesn’t, and the usual workaround (vending a concretely-typed subscript overload) isn’t effective because aggregation operators can take up to four arguments, and every argument position that could hold a special-cased concrete type multiplies the number of subscript overloads required, so the count grows exponentially with the number of arguments.

in the absence of a consistently-behaving SE-0299, it seems the best we can do is either

    $0["$add"] = (1, UniversalBSONDSL.init
    {
        $0["$abs"] = 1
    })

or

    $0["$add"] = (1, .document
    {
        (bson:inout UniversalBSONDSL) in

        bson["$abs"] = 1
    })

and both of these syntaxes are awful when all we wanted to write was

{
    $add: [ 1, { $abs: 1 } ]
}
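for anyone who wants to reproduce this, here is a minimal sketch of the explicit `.init` spelling, with setters that record something instead of calling fatalError(). SketchEncodable, Document, and the rendered property are made-up names for illustration (not the real library API), and the output is just a JSON-ish string. the subscript signatures are the same shape as the ones above, minus the Optional conformance.

```swift
// hypothetical stand-in for BSONEncodable: renders a JSON-ish string
protocol SketchEncodable
{
    var rendered:String { get }
}
extension Int:SketchEncodable
{
    var rendered:String { "\(self)" }
}
extension String:SketchEncodable
{
    var rendered:String { "\"" + self + "\"" }
}

// hypothetical stand-in for UniversalBSONDSL
struct Document:SketchEncodable
{
    private(set)
    var fields:[String] = []

    init()
    {
    }
    init(with populate:(inout Self) throws -> ()) rethrows
    {
        self.init()
        try populate(&self)
    }

    var rendered:String
    {
        "{ " + self.fields.joined(separator: ", ") + " }"
    }

    // two-argument operators, encoded as a two-element array
    subscript<First, Second>(key:String) -> (First, Second)?
        where First:SketchEncodable, Second:SketchEncodable
    {
        get { nil }
        set
        {
            guard let pair:(First, Second) = newValue
            else
            {
                return
            }
            self.fields.append("\(key): [ \(pair.0.rendered), \(pair.1.rendered) ]")
        }
    }
    // single-argument operators
    subscript<Value>(key:String) -> Value?
        where Value:SketchEncodable
    {
        get { nil }
        set
        {
            guard let value:Value = newValue
            else
            {
                return
            }
            self.fields.append("\(key): \(value.rendered)")
        }
    }
}

// the nested document is spelled with its full type name, as in the
// first workaround above
let bson:Document = .init
{
    $0["$add"] = (1, Document.init
    {
        $0["$abs"] = 1
    })
}
print(bson.rendered)
```

under these assumptions the sketch should render `{ $add: [ 1, { $abs: 1 } ] }`, which is the shape we wanted, just with a noisier spelling at the call site.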

taylorswift · February 1, 2023, 11:07pm

as if this couldn’t get weirder, i have just discovered that i can make something similar to D compile, if only by using the closure parameter more than once:

$0["$add"] = (1, .document
{
    $0["x"] = 1
    $0["x"] = 1
})

but deleting either clause breaks the type inference. it doesn’t seem to be affected by the presence of a mutating get on the subscript.

i can also make a single-clause version compile by coercing the integer literal to an optional:

$0["$add"] = (1, .document
{
    $0["x"] = 1 as _?
})

jrose · February 1, 2023, 11:11pm

I feel like you could be using either literals or result builders for this? SwiftyJSON uses the former, and the original result builders proposal has an example of the latter for HTML (slightly different because the nodes have names, but not terribly).

EDIT: Sorry, you did ask a specific question about SE-0299 and this is ignoring that; it's mostly in response to "Swift is really bad at DSLs". Which, well, I don't think Swift is The Best at DSLs, but you aren't using the tools meant for DSLs at the moment.

taylorswift · February 1, 2023, 11:32pm

a long time ago, the API actually did use literals, but i moved away from that model because literals really don’t coexist peacefully with {Domain}Encodable. in particular, i did not like that every usage of that API ended up looking like:

let parameter1:Int32
let parameter2:Int
let parameter3:String
let parameter4:[Substring]

[
    "x": .int32(parameter1),
    "y": .int64(.init(parameter2)),
    "z": parameter3.map(AnyBSON<[UInt8]>.string(_:)),
    "w": .array(parameter4.map(\.anyBSONValue))
]

adding convenience API to get rid of some of the noise somewhere just requires a ton of boilerplate elsewhere, with the end result being that the BSON library just became a landfill of uncomposable convenience API.
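to make the trade-off concrete, here is a minimal sketch of a literal-based model. AnyBSONSketch and its cases are hypothetical and much simpler than the AnyBSON type mentioned above; the point is only that pure literals read nicely, while anything held in a variable has to be wrapped by hand.

```swift
// hypothetical, simplified value enum (not the real AnyBSON)
enum AnyBSONSketch
{
    case int64(Int64)
    case string(String)
    case array([AnyBSONSketch])
    case document([(String, AnyBSONSketch)])
}
extension AnyBSONSketch:ExpressibleByIntegerLiteral
{
    init(integerLiteral value:Int64) { self = .int64(value) }
}
extension AnyBSONSketch:ExpressibleByStringLiteral
{
    init(stringLiteral value:String) { self = .string(value) }
}
extension AnyBSONSketch:ExpressibleByArrayLiteral
{
    init(arrayLiteral elements:AnyBSONSketch...) { self = .array(elements) }
}
extension AnyBSONSketch:ExpressibleByDictionaryLiteral
{
    init(dictionaryLiteral elements:(String, AnyBSONSketch)...)
    {
        self = .document(elements)
    }
}
extension AnyBSONSketch
{
    // JSON-ish rendering, only here so the result can be inspected
    var rendered:String
    {
        switch self
        {
        case .int64(let value):
            return "\(value)"
        case .string(let value):
            return "\"" + value + "\""
        case .array(let elements):
            return "[ " + elements.map { $0.rendered }.joined(separator: ", ") + " ]"
        case .document(let fields):
            return "{ " + fields.map { "\($0.0): \($0.1.rendered)" }
                .joined(separator: ", ") + " }"
        }
    }
}

// pure literals nest without any ceremony
let literal:AnyBSONSketch = ["$add": [1, ["$abs": 1]]]

// but a non-literal value needs an explicit case and conversion
let count:Int = 5
let wrapped:AnyBSONSketch = ["$add": [.int64(Int64(count)), "$field"]]
```

the second declaration is where the noise comes from: every variable needs its own wrapping, and that is the part {Domain}Encodable takes care of.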

{Domain}Encodable just made everything so much more readable and maintainable.

another reason is that literals cannot handle field elision, which the subscript-assignment model is really good at, because subscript setters have a natural mechanism for eliding fields where the value is nil or empty.

{
    $0["filter", elide: true] = filter
}
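a minimal sketch of how such an eliding setter could look. ElidingDocument is a hypothetical type, the value is fixed to String? to keep it short, and the choice to write an explicit null when elide is false is an assumption for illustration, not necessarily what the real API does.

```swift
// hypothetical container that records fields as "key: value" strings
struct ElidingDocument
{
    private(set)
    var fields:[String] = []

    subscript(key:String, elide elide:Bool) -> String?
    {
        get { nil }
        set
        {
            // with elide, a nil or empty value leaves no trace in the output
            if  elide && (newValue?.isEmpty ?? true)
            {
                return
            }
            // assumption: without elide, nil is written as an explicit null
            let value:String = newValue.map { "\"" + $0 + "\"" } ?? "null"
            self.fields.append("\(key): \(value)")
        }
    }
}

let filter:String? = nil

var command:ElidingDocument = .init()
command["filter", elide: true] = filter
command["sort", elide: true] = "name"
command["hint", elide: false] = filter
```

here only `sort` and `hint` end up in `command.fields`; the `filter` assignment is a no-op, which is something a dictionary literal has no way to express.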

although i have not really discussed decoders in this topic, a third reason is that the corresponding decoding API uses subscripts very effectively (they are very good at generating diagnostics and handling things like missing fields vs explicit null, etc.), so it just made sense to use subscripts to encode as well.

a really long time ago, before the API used literals, it used result builders. but result builders aren’t great at expressing “key-value”-like things, they are really only useful for treelike things like HTML. because when you want to yield something from a result builder, it looks like

("$operator", UniversalBSONDSL.tuple
{
    "argument1"
    UniversalBSONDSL.document
    {
        ("a", y)
        ("b", z)
    }
})

so, yes, i have used the tools meant for DSLs, and they have so far not worked for my use case.
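for reference, a minimal sketch of what a key-value result builder looks like. FieldListBuilder and document(_:) are hypothetical names, and values are pre-rendered strings to keep it short. it works, but every field has to be yielded as a tuple, which is the noise i was complaining about.

```swift
// hypothetical builder that collects (key, value) pairs
@resultBuilder
enum FieldListBuilder
{
    static
    func buildBlock(_ fields:(String, String)...) -> [(String, String)]
    {
        fields
    }
}

// hypothetical entry point, renders the collected pairs as a JSON-ish string
func document(@FieldListBuilder _ fields:() -> [(String, String)]) -> String
{
    "{ " + fields().map { "\($0.0): \($0.1)" }.joined(separator: ", ") + " }"
}

let y:Int = 1
let z:Int = 2

let built:String = document
{
    ("a", "\(y)")
    ("b", "\(z)")
}
```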

taylorswift · February 2, 2023, 1:58am

sometimes the solution is so obvious you can’t even see it.

subscripts were so useful for encoding because they speak Optional, and the way to get them working with generic tuple values is to just make the tuple elements themselves Optional, instead of making the whole tuple optional.

@inlinable public
subscript<Divisor, Remainder>(key:Mod) -> (by:Divisor?, is:Remainder?)
    where Divisor:BSONEncodable, Remainder:BSONEncodable
{
    get
    {
        (nil, nil)
    }
    set(value)
    {
        ...
    }
}

and this also works great for DSLs with variadic arguments:

extension MongoExpressionDSL.Operator
{
    @frozen public
    enum Variadic:String, Hashable, Sendable
    {
        case add = "$add"
    }
}

    subscript<T0, T1>(key:Variadic) -> (T0?, T1?)
        where   T0:MongoExpressionEncodable,
                T1:MongoExpressionEncodable

    subscript<T0, T1, T2>(key:Variadic) -> (T0?, T1?, T2?)
        where   T0:MongoExpressionEncodable,
                T1:MongoExpressionEncodable,
                T2:MongoExpressionEncodable
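
a minimal sketch of how the bodies of such subscripts might look. all the names here (ExpressionEncodableSketch, VariadicSketch, ExpressionSketch) are hypothetical stand-ins, and dropping nil operands from the encoded array is an assumption made for illustration.

```swift
// hypothetical stand-in for MongoExpressionEncodable
protocol ExpressionEncodableSketch
{
    var rendered:String { get }
}
extension Int:ExpressionEncodableSketch
{
    var rendered:String { "\(self)" }
}
extension String:ExpressionEncodableSketch
{
    var rendered:String { "\"" + self + "\"" }
}

enum VariadicSketch:String
{
    case add = "$add"
}

struct ExpressionSketch
{
    private(set)
    var fields:[String] = []

    // assumption: nil operands are simply left out of the array
    private mutating
    func append(_ key:VariadicSketch, _ operands:[String?])
    {
        let present:[String] = operands.compactMap { $0 }
        self.fields.append("\(key.rawValue): [ " + present.joined(separator: ", ") + " ]")
    }

    subscript<T0, T1>(key:VariadicSketch) -> (T0?, T1?)
        where   T0:ExpressionEncodableSketch,
                T1:ExpressionEncodableSketch
    {
        get { (nil, nil) }
        set { self.append(key, [newValue.0?.rendered, newValue.1?.rendered]) }
    }
    subscript<T0, T1, T2>(key:VariadicSketch) -> (T0?, T1?, T2?)
        where   T0:ExpressionEncodableSketch,
                T1:ExpressionEncodableSketch,
                T2:ExpressionEncodableSketch
    {
        get { (nil, nil, nil) }
        set
        {
            self.append(key,
                [newValue.0?.rendered, newValue.1?.rendered, newValue.2?.rendered])
        }
    }
}

// the arity of the tuple picks the overload
var expression:ExpressionSketch = .init()
expression[.add] = ("$ordinal", 5)
expression[.add] = (1, 2, 3)
```

one overload per arity is linear in the number of arguments, because each element stays generic and nothing needs a concrete special case.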

this is of course a bit off topic, since we still haven’t gotten to the bottom of why SE-0299 behaves so weirdly with subscript assignments. but maybe this helps someone else…

filter:
{
    $0["ordinal"] = .init
    {
        $0[.mod] = (by: 3, is: 0)
    }
}
projection:
{
    $0["_id"] = 1
    $0["ordinal"] = .init
    {
        $0[.add] = ("$ordinal", 5)
    }
}