Support repeating initializers with closures not just values

beccadax · July 23, 2018, 5:30am

Set indices aren't Ints, though, and you can't even know what an element's index will be without knowing the value.

If by "index" you actually mean "integer between 0 and count", I think this is drifting away from operations that feel natural for Collection.
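A minimal sketch of the point about Set indices, using only standard library calls (the variable names are illustrative). A set's index is an opaque `Set<Element>.Index`, and an integer can only be recovered as an offset whose value depends on the set's internal ordering.

```swift
let letters: Set<String> = ["a", "b", "c"]

// firstIndex(of:) returns a Set<String>.Index, not an Int.
let index: Set<String>.Index = letters.firstIndex(of: "b")!

// The index is only meaningful with the set that produced it.
let element = letters[index]

// An integer offset can be computed, but its value depends on the
// set's internal ordering, which may differ from run to run.
let offset = letters.distance(from: letters.startIndex, to: index)
```

Note that `letters[1]` would not compile, since `Int` is not the set's index type.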

hlovatt · July 23, 2018, 5:38am

It's got enough use cases to be included in other languages, e.g. Array - Kotlin Programming Language.

jawbroken · July 23, 2018, 5:44am

I personally think that form of the initialiser would be best just on Array, where it is most valuable and least confusing, and perhaps on some semi-related lazy collection which could just store the closure and produce elements at the given index on command. We don't need to repeat the mistake of enumerated() here by confusing the index types.
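For readers unfamiliar with the enumerated() issue mentioned above, a short sketch (the array contents are arbitrary): enumerated() yields offsets counted from zero, which only coincide with indices for some collections.

```swift
let numbers = [10, 20, 30, 40]
let slice = numbers[2...]   // ArraySlice whose indices are 2 and 3

// enumerated() pairs each element with an offset counted from zero...
let offsets = slice.enumerated().map { $0.offset }

// ...which is not the same thing as the slice's own indices.
let indices = Array(slice.indices)
```

Here `offsets` is `[0, 1]` while `indices` is `[2, 3]`, so subscripting the slice with an offset would be a mistake.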

beccadax · July 23, 2018, 5:58am

Sure, but Kotlin appears to always use integer indices. And even so, it still only seems to have that initializer on Array, not MutableList or any of its other collection interfaces.

Karl · July 23, 2018, 1:52pm

The interesting thing about this code is that I actually quite like it. You're right that it isn't very discoverable; it's pretty unusual that a developer new to Swift would immediately jump for this solution. On the other hand, basically everybody knows what map does (or if they spend any time using Swift, they'll quickly learn). I like this code because it's unambiguous - nobody is going to read it and be confused as to what is going on. I would argue that it's very readable.

So you're right that a count + closure initialiser is really a map operation. Of course it is, but that's beside the point - we're talking about discoverability, not minimalism.

To be a bit clearer, the Index parameter is only really useful for integer offsets. Set indexes, for example, are meaningless outside of the Set which created them. My idea was that operations like Array(count: 10) { myDictionary[$0, default: -1] } are extremely common; and yes - it is not lost on me that it is also a map operation, but again, we're not talking about minimalism; we're talking about making the language more direct and obvious. So I think we should make the parameter an Int.

The parameter-less version is extremely limited in usefulness and I would not support its inclusion into the standard library; it's trivial to do yourself. A version with a parameter would be a more practical replacement for the (0..<n).map dance, and I agree that part of the language could be more obvious.
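A rough sketch of what such an initializer could look like. The `init(count:_:)` spelling is an assumption taken from the `Array(count: 10) { ... }` example above; it is not an existing standard library API, and this version simply forwards to map.

```swift
extension Array {
    /// Hypothetical initializer (not in the standard library):
    /// calls `generator` with 0, 1, ..., count - 1 and collects the results.
    init(count: Int, _ generator: (Int) throws -> Element) rethrows {
        self = try (0..<count).map(generator)
    }
}

let squares = Array(count: 4) { $0 * $0 }
let viaMap = (0..<4).map { $0 * $0 }
```

Both spellings produce the same array, which is why the disagreement in this thread is about discoverability rather than capability.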

davedelong · July 23, 2018, 2:35pm

I am 10000% on board with this proposal. I have my own version of this that exists in my personal set of libraries.

I do want to throw out the alternative though of a free function. We already have sequence(state:next:) and sequence(first:next:) for creating sequences of values, so what do you think about:

func sequence<T>(count: Int, next: @escaping () -> T) -> AnySequence<T>
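One possible body for that signature, as a sketch only. Building it from a lazy map over a range is an assumption about how it might be implemented, not something specified above.

```swift
// Hypothetical free function matching the signature proposed above.
func sequence<T>(count: Int, next: @escaping () -> T) -> AnySequence<T> {
    // Lazily call `next` once for each of the `count` positions.
    return AnySequence((0..<count).lazy.map { _ in next() })
}

// Usage: three die rolls, produced when the sequence is iterated.
let rolls = Array(sequence(count: 3) { Int.random(in: 1...6) })
```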

Ben_Cohen · July 23, 2018, 5:08pm

I really don't see how:

Array(count: 10) { myDictionary[$0, default: -1] }

is more direct/obvious than:

(0..<10).map { myDictionary[$0, default: -1] }

The latter seems way clearer to me. For the array version, you must reason about what on earth it does: "well, I guess what it actually does is count up from zero to count then pass that in to the function, which uses it".

The initializer that just takes a ()->Element does not have this problem, so works as a readable shorthand. Anything more complex, sacrificing that concision, is defeating the point.

nuclearace · July 23, 2018, 5:35pm

I tend to agree. Most times I've reached for this functionality, it has been to populate an array with random elements. If you need the indices, you're no longer really doing generation, you're mapping Int -> Element, and I've always done (1...3).map(...).

However, I also see where those saying an Array init taking a closure with an Index might be more apparent/discoverable are coming from. People new to the language and not familiar with FP concepts might not realize they could express this as a map operation.
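The random-elements case, written with the existing map approach, looks roughly like this (the digit range is just an example). The closure ignores its argument, which is the sign that no index is really needed.

```swift
// Five random digits; `_ in` discards the range element entirely.
let randomDigits = (0..<5).map { _ in Int.random(in: 0...9) }
```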

Tino · July 23, 2018, 5:38pm

Imho it isn't obvious that map returns an Array - and maybe it won't be that way forever.

Also, when you want an array, you probably start by looking at the documentation for Array to see what your options are.
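A small illustration of the result-type point, using current standard library behavior (the values are arbitrary):

```swift
// map on a range produces an Array...
let fromRange = (0..<3).map { $0 * 2 }

// ...and so does map on a Set; the result is [Int], not Set<Int>.
let fromSet = Set([1, 2, 3]).map { $0 * 2 }

// With .lazy, map returns a lazy wrapper rather than an Array.
let lazyResult = (0..<3).lazy.map { $0 * 2 }
```

None of these call sites names the result type, so a reader has to know the rules to tell which ones are arrays.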

Erica_Sadun · July 23, 2018, 6:03pm

In the first line, the goal (Array creation in this example) appears leftmost. It is a direct action, establishing a new array using conventional Type-parens initialization.

In the second, the goal is indirect, mapping from the domain of 0 ..< 10 to ten instances, ~~producing an~~ to initialize array content is ~~as a side effect~~ an appropriate application of map but is by no means its primary mission statement.

I'd argue that ten dictionary lookups is just a bad example and my reasons why the former is more readable than the latter stand as stated above. Better examples would create ten views, ten buttons, pull ten items from a file or a database or a web service (although then you start going asynchronous, so that becomes a bad example), add ten random numbers or ten random colors or some such. Nearly every time I use this map construct, it is with random elements or views but that's only in my particular work domain.

Ben_Cohen · July 23, 2018, 9:14pm

I want to push back on the idea that map produces an Array "as a side-effect". It is very much not a side-effect – it is the entire goal of the function. Yes, the fact that the result is an array is not explicit. This is a feature not a bug. In the majority of cases, it isn't important to you what the result type is. Swift as a language specifically has features to avoid cluttering your code with the names of types unnecessarily.

I agree that an initializer that takes a count and a ()->Element closure would be a good addition. It's a common operation and more readable than composing it with map, even if that composition is fairly trivial. My point is, leave it there with that readability/usability win. But that win is on the edge of being worthwhile, and as soon as you start to push it more towards flexibility at the expense of readability you lose those gains and are just duplicating existing functionality.

Ben_Cohen · July 23, 2018, 9:40pm

davedelong:

I do want to throw out the alternative though of a free function. We already have sequence(state:next:) and sequence(first:next:) for creating sequences of values, so what do you think about:
func sequence<T>(count: Int, next: @escaping () -> T) -> AnySequence<T>

I don't think those are the same as this. The unfolding sequence functions repeatedly feed data into the closure. They are inherently "sequence-y" because of their unfolding nature. That's why the name works.

Whereas the repeated closure execution does not unfold like that – it just calls the same closure over and over again. Which means if you did want to model it as a type, like Repeated, it would be a random-access collection with startIndex = 0 and endIndex = count.
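A sketch of what such a type might look like. The name `RepeatedCall` and its members are hypothetical, chosen for illustration; only the shape (random access, startIndex = 0, endIndex = count) comes from the description above.

```swift
// Hypothetical type: a collection that calls the same closure for every position.
struct RepeatedCall<Element>: RandomAccessCollection {
    let count: Int
    let generator: () -> Element

    var startIndex: Int { return 0 }
    var endIndex: Int { return count }

    subscript(position: Int) -> Element {
        precondition(position >= 0 && position < count, "Index out of range")
        // The position is ignored; the closure is simply called again.
        return generator()
    }
}

let zeros = RepeatedCall(count: 3) { 0 }
let asArray = Array(zeros)
```

Because nothing is threaded from one call to the next, any position can be produced independently, which is what makes random access possible here and not for the unfolding sequence functions.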

Erica_Sadun · July 24, 2018, 1:15am

I stated this badly. I mean to say that in the goal of initializing array content, using map produces that initialization as...hmmm...as a properly functioning incantation but not as its primary purpose. I think that's closer, if still not a good way of explaining the "mission statement" versus the "within expected parameters of proper operation of the function".

hlovatt · July 24, 2018, 6:36am

Would something along these lines be acceptable to people:

protocol Generatable {
    associatedtype GeneratedElement
    init()
    mutating func reserveCapacity(_ minimumCapacity: Int)
    mutating func insert(_ newElement: GeneratedElement, at: Int)
}
extension Generatable {
    init(_ count: Int, _ generator: () throws -> GeneratedElement) rethrows {
        self.init()
        reserveCapacity(count)
        for index in 0 ..< count {
            insert(try generator(), at: index)
        }
    }
}
protocol GeneratableFromIntIndex: Generatable {}
extension GeneratableFromIntIndex {
    init(_ count: Int, _ generator: (Int) throws -> GeneratedElement) rethrows {
        self.init()
        reserveCapacity(count)
        for index in 0 ..< count {
            insert(try generator(index), at: index)
        }
    }
}

extension Array: GeneratableFromIntIndex {
    typealias GeneratedElement = Element
}
Array(5) {
    0
} // [0, 0, 0, 0, 0]
Array(5) { index in
    index
} // [0, 1, 2, 3, 4]

extension Set: Generatable {
    typealias GeneratedElement = Element
    /// The `at` index argument is *not* used for a set (because sets do not have indexes).
    /// The insertion will always insert the new element even if the set already contains an element that equates to the new element (because the two elements may have different identities).
    mutating func insert(_ newElement: Element, at _: Int) {
        update(with: newElement)
    }
}
Set(5) {
    Int.random(in: 0 ... 9)
} // {2, 1, 6} - up to 5 random decimal digits

extension Dictionary: Generatable, GeneratableFromIntIndex where Key == Int { // You have to add `Generatable` separately to `GeneratableFromIntIndex` for `Dictionary` but not for `Array`!
    typealias GeneratedElement = Value
    mutating func insert(_ newElement: Value, at: Int) {
        self[at] = newElement
    }
}
Dictionary(5) {
    0
} // [3: 0, 1: 0, 2: 0, 0: 0, 4: 0] - in some order.
Dictionary(5) { index in
    index
} // [3: 3, 1: 1, 2: 2, 0: 0, 4: 4] - in some order.

It is:

Short to use.
Highly discoverable in Xcode.
You can't get confused about the index type (it must be an Int).
Can be applied to other collection types as well as Array.
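The choice of update(with:) over insert(_:) in the Set extension above matters only when equal elements are distinguishable. A sketch with a hypothetical `Tagged` type whose equality deliberately ignores one field:

```swift
// Hypothetical element type: equality and hashing ignore `tag`.
struct Tagged: Hashable {
    let id: Int
    let tag: String

    static func == (lhs: Tagged, rhs: Tagged) -> Bool { return lhs.id == rhs.id }
    func hash(into hasher: inout Hasher) { hasher.combine(id) }
}

var tags: Set<Tagged> = [Tagged(id: 1, tag: "old")]

// insert(_:) keeps the existing member when an equal one is already present.
let insertion = tags.insert(Tagged(id: 1, tag: "new"))
let tagAfterInsert = tags.first!.tag

// update(with:) replaces the existing member with the new one.
tags.update(with: Tagged(id: 1, tag: "new"))
let tagAfterUpdate = tags.first!.tag
```

Either way the set ends up with one element; the two calls differ only in which of the equal values survives.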

Ben_Cohen · July 24, 2018, 4:42pm

Same question as before: what new generic algorithms will you now be able to write across Set, Array and Dictionary once this protocol is added? New protocols need a very strong justification – much more than just adding new methods. So the use case needs to be very common.

hlovatt · July 25, 2018, 8:22am

I was using the protocols for code re-use not for generic algorithms, but in this case the code is short - so here is a version that repeats the code in Set and Dictionary:

import Foundation

extension RangeReplaceableCollection {
    init(_ count: Int, _ generator: () throws -> Element) rethrows {
        self.init()
        reserveCapacity(count)
        for _ in 0 ..< count {
            append(try generator())
        }
    }
}
Array(5) {
    0
} // [0, 0, 0, 0, 0]
ArraySlice(5) {
    1
} // [1, 1, 1, 1, 1]
ContiguousArray(5) {
    2
} // [2, 2, 2, 2, 2]
Data(5) {
    3
} // [3, 3, 3, 3, 3]
String(5) {
    "A"
} // "AAAAA"
String.UnicodeScalarView(5) {
    "B"
} // "BBBBB"
Substring(5) {
    "C"
} // "CCCCC"
Substring.UnicodeScalarView(5) {
    "D"
} // {... "DDDDD"}

extension Set {
    init(_ count: Int, _ generator: () throws -> Element) rethrows {
        self.init()
        reserveCapacity(count)
        for _ in 0 ..< count {
            update(with: try generator())
        }
    }
}
Set(5) {
    Int.random(in: 0 ... 9)
} // {2, 1, 6} - up to 5 random decimal digits

extension RangeReplaceableCollection where Index == Int {
    init(_ count: Int, _ generator: (Int) throws -> Element) rethrows {
        self.init()
        reserveCapacity(count)
        for index in 0 ..< count {
            insert(try generator(index), at: index)
        }
    }
}
Array(5) { index in
    index
} // [0, 1, 2, 3, 4]
ArraySlice(5) { index in
    index + 1
} // [1, 2, 3, 4, 5]
ContiguousArray(5) { index in
    index + 2
} // [2, 3, 4, 5, 6]

extension Dictionary where Key == Int {
    init(_ count: Int, _ generator: (Int) throws -> Value) rethrows {
        self.init()
        reserveCapacity(count)
        for index in 0 ..< count {
            self[index] = try generator(index)
        }
    }
}
Dictionary(5) { index in
    index
} // [3: 3, 1: 1, 2: 2, 0: 0, 4: 4] - in some order.

It still retains the desirable characteristics of:

Short to use.
Highly discoverable in Xcode.
You can't get confused about the index type (it must be an Int).
Can be applied to other collection types as well as Array.

Joe_Groff · July 26, 2018, 4:35pm

I think there's a useful concept of an "output stream" underlying all these use cases. RangeReplaceableCollection covers this use case for collections that maintain their insertion order, but for other sinks like inserting into a dictionary or set, or a lazy sink that operates on values as they're submitted, there could be a more general interface. We already have TextOutputStream as a bit of an afterthought as part of the print system, but with no underlying BinaryOutputStream or OutputStream where Element == _ that'd make it appropriate to apply to anything other than Strings.
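For reference, this is roughly how the existing TextOutputStream is used today. The `ChunkLog` type is a hypothetical sink written for this sketch; the protocol's single requirement only accepts String, which is the limitation described above.

```swift
// Hypothetical sink that records every chunk of text written to it.
struct ChunkLog: TextOutputStream {
    var chunks: [String] = []

    // The protocol's only requirement, and it is String-only.
    mutating func write(_ string: String) {
        chunks.append(string)
    }
}

var log = ChunkLog()
print("hello", to: &log)   // print may call write(_:) more than once
```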

Ben_Cohen · July 26, 2018, 5:37pm

I'm not saying there aren't good generic algorithms that would be enabled – just that any new protocol suggestion needs to be accompanied by them.

I'm pretty skeptical of a protocol that allows generic streaming into an RRC, a Set, and a Dictionary like this though. Those are all radically different: respectively appending, coalescing, and not coalescing but inserting with an arbitrary key. I'm not sure an algorithm that generalized over each one would make much sense given how differently they behave. But some compelling examples could show how it could.
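To illustrate how differently the three behave, here is the same value streamed three times into each kind of collection, using only existing APIs (the value 7 is arbitrary):

```swift
var array: [Int] = []
var set: Set<Int> = []
var dictionary: [Int: Int] = [:]

for index in 0..<3 {
    array.append(7)          // appending: every value is kept, in order
    set.insert(7)            // coalescing: equal values collapse into one
    dictionary[index] = 7    // keyed: a key has to be invented for each value
}
```

The array ends with three elements, the set with one, and the dictionary with three entries under keys the values themselves never supplied.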

Joe_Groff · July 26, 2018, 5:39pm

The C++ STL has "output iterators", including inserters for set and map, as prior art at least (though the STL is also a master course in violating YAGNI)

Ben_Cohen · July 26, 2018, 5:55pm

The C++ inserter for map takes a pair. That at least would make it consistent with Set in coalescing equal elements. It's the arbitrary choice of an increasing integer key that I'm most skeptical of.
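Swift's closest existing analogue to a pair-taking inserter is probably the Dictionary initializer that accepts key-value pairs; that comparison is an editorial assumption, not something claimed in the thread. A sketch with made-up data:

```swift
let pairs = [(1, "a"), (2, "b"), (1, "c")]

// Keys come from the data itself; the closure decides what happens when a
// key repeats. Keeping the first value resembles std::map::insert, which
// does not overwrite an existing key.
let merged = Dictionary(pairs, uniquingKeysWith: { first, _ in first })
```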