How does "Sequence.joined" work?

I was looking at random items at SwiftDoc.org <http://swiftdoc.org/&gt;, and noticed the “FlattenBidirectionalCollection” structure. It helps implement some versions of “joined” from Sequence (probably when the Sequence is also a BidirectionalCollection). The directions for the type state that “joined” does not create new storage. Then wouldn’t it have to refer to the source objects by reference? How; especially how does it work without requiring a “&” with “inout” or how it works with “let”-mode objects? Or am I misunderstanding how it works behind the covers?

(If there is a secret sauce to have one object refer to another without “&”/“inout”/“UnsafeWhateverPointer”, I like to know. It may help with implementing an idea. The idea involves extending the language, so “compiler magic” that the user can’t access is OK; I’d just claim to use the same sauce in my proposal.)

···


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

All this means is that `joined()` does not create an array that contains the new result. It's only as magic as the COW semantics on arrays.

···

Le 7 août 2017 à 21:12, Daryle Walker via swift-evolution <swift-evolution@swift.org> a écrit :

I was looking at random items at SwiftDoc.org <http://swiftdoc.org/&gt;, and noticed the “FlattenBidirectionalCollection” structure. It helps implement some versions of “joined” from Sequence (probably when the Sequence is also a BidirectionalCollection). The directions for the type state that “joined” does not create new storage. Then wouldn’t it have to refer to the source objects by reference? How; especially how does it work without requiring a “&” with “inout” or how it works with “let”-mode objects? Or am I misunderstanding how it works behind the covers?

(If there is a secret sauce to have one object refer to another without “&”/“inout”/“UnsafeWhateverPointer”, I like to know. It may help with implementing an idea. The idea involves extending the language, so “compiler magic” that the user can’t access is OK; I’d just claim to use the same sauce in my proposal.)


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Moving this thread to the correct mailing list.

···

Am 8. August 2017 um 06:36:18, Félix Cloutier via swift-evolution (swift-evolution@swift.org) schrieb:

All this means is that `joined()` does not create an array that contains the new result. It's only as magic as the COW semantics on arrays.

Le 7 août 2017 à 21:12, Daryle Walker via swift-evolution <swift-evolution@swift.org> a écrit :

I was looking at random items at SwiftDoc.org, and noticed the “FlattenBidirectionalCollection” structure. It helps implement some versions of “joined” from Sequence (probably when the Sequence is also a BidirectionalCollection). The directions for the type state that “joined” does not create new storage. Then wouldn’t it have to refer to the source objects by reference? How; especially how does it work without requiring a “&” with “inout” or how it works with “let”-mode objects? Or am I misunderstanding how it works behind the covers?

(If there is a secret sauce to have one object refer to another without “&”/“inout”/“UnsafeWhateverPointer”, I like to know. It may help with implementing an idea. The idea involves extending the language, so “compiler magic” that the user can’t access is OK; I’d just claim to use the same sauce in my proposal.)


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

All this means is that `joined()` does not create an array that contains the new result. It's only as magic as the COW semantics on arrays.

So you’re saying the COW semantics for Array and other standard library types have secret references/pointers that work even for “let”-mode objects, and the Sequence variants the various forms of “joined” need use a Sequence/Collection of those secret references?

···

On Aug 8, 2017, at 12:35 AM, Félix Cloutier <felixcloutier@icloud.com> wrote:

Le 7 août 2017 à 21:12, Daryle Walker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> a écrit :

I was looking at random items at SwiftDoc.org <http://swiftdoc.org/&gt;, and noticed the “FlattenBidirectionalCollection” structure. It helps implement some versions of “joined” from Sequence (probably when the Sequence is also a BidirectionalCollection). The directions for the type state that “joined” does not create new storage. Then wouldn’t it have to refer to the source objects by reference? How; especially how does it work without requiring a “&” with “inout” or how it works with “let”-mode objects? Or am I misunderstanding how it works behind the covers?

(If there is a secret sauce to have one object refer to another without “&”/“inout”/“UnsafeWhateverPointer”, I like to know. It may help with implementing an idea. The idea involves extending the language, so “compiler magic” that the user can’t access is OK; I’d just claim to use the same sauce in my proposal.)


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

All this means is that `joined()` does not create an array that contains the new result. It's only as magic as the COW semantics on arrays.

So you’re saying the COW semantics for Array and other standard library types have secret references/pointers that work even for “let”-mode objects, and the Sequence variants the various forms of “joined” need use a Sequence/Collection of those secret references?

Why would .joined() need some sort of special reference instead of just storing the collection values as normal values?

John.

···

On Aug 8, 2017, at 3:24 PM, Daryle Walker via swift-evolution <swift-evolution@swift.org> wrote:

On Aug 8, 2017, at 12:35 AM, Félix Cloutier <felixcloutier@icloud.com <mailto:felixcloutier@icloud.com>> wrote:

Le 7 août 2017 à 21:12, Daryle Walker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> a écrit :

I was looking at random items at SwiftDoc.org <http://swiftdoc.org/&gt;, and noticed the “FlattenBidirectionalCollection” structure. It helps implement some versions of “joined” from Sequence (probably when the Sequence is also a BidirectionalCollection). The directions for the type state that “joined” does not create new storage. Then wouldn’t it have to refer to the source objects by reference? How; especially how does it work without requiring a “&” with “inout” or how it works with “let”-mode objects? Or am I misunderstanding how it works behind the covers?

(If there is a secret sauce to have one object refer to another without “&”/“inout”/“UnsafeWhateverPointer”, I like to know. It may help with implementing an idea. The idea involves extending the language, so “compiler magic” that the user can’t access is OK; I’d just claim to use the same sauce in my proposal.)


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Daryle Walker via swift-evolution <swift-evolution@swift.org> schrieb am
Di. 8. Aug. 2017 um 21:25:

All this means is that `joined()` does not create an array that contains
the new result. It's only as magic as the COW semantics on arrays.

So you’re saying the COW semantics for Array and other standard library
types have secret references/pointers that work even for “let”-mode
objects, and the Sequence variants the various forms of “joined” need use a
Sequence/Collection of those secret references?

I know nothing about this specific type under the hood and your question
stumped me when I first saw it as well, so take this with a grain of salt:

I think it's basically just storing the arrays internally (as let) and when
you iterate through the collection it just goes through the subsequences
one by one, when the last index of the first is reached it begins with the
next subsequence.

As for how it avoids creating new storage, simple. As someone else
mentioned, this is no more magic than Copy On Write for normal arrays.

let a = [1,2,3]
let b = a // this doesn't produce a copy of the underlying buffer.. I.e.
value semantics but only one buffer needed

^^^ This is the take-home message. And your intuition about COW is correct:
its internal storage is a reference type containing a buffer pointer. When
(and only when) a mutation occurs, the buffer is copied and the new storage
becomes the backing for the resulting struct. Any existing copies remain
unchanged (and truly immutable) because they keep their original storage.

So with that understanding of COW, it becomes easy to imagine all sorts of
containers that don't require additional storage for their contents:

struct JoinedSequenceOfThreeArrays<T> {
  let array1: [T]
  let array2: [T]
  let array3: [T]
}

// still only one array buffer storage is required for all of this:
let c = JoinedSequenceOfThreeArrays(array1: a, array2: a, array3: b)

Does that make sense?

Geordie

···

On Aug 8, 2017, at 12:35 AM, Félix Cloutier <felixcloutier@icloud.com> > wrote:

Le 7 août 2017 à 21:12, Daryle Walker via swift-evolution < > swift-evolution@swift.org> a écrit :

I was looking at random items at SwiftDoc.org <http://swiftdoc.org/&gt;, and
noticed the “FlattenBidirectionalCollection” structure. It helps implement
some versions of “joined” from Sequence (probably when the Sequence is also
a BidirectionalCollection). The directions for the type state that “joined”
does not create new storage. Then wouldn’t it have to refer to the source
objects by reference? How; especially how does it work without requiring a
“&” with “inout” or how it works with “let”-mode objects? Or am I
misunderstanding how it works behind the covers?

(If there is a secret sauce to have one object refer to another without
“&”/“inout”/“UnsafeWhateverPointer”, I like to know. It may help with
implementing an idea. The idea involves extending the language, so
“compiler magic” that the user can’t access is OK; I’d just claim to use
the same sauce in my proposal.)


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Yes, exactly. An Array<T> is a struct wrapper for a reference type representing storage. Mutating functions first check if they own the only reference to the storage using isKnownUniquelyReferenced <https://developer.apple.com/documentation/swift/2429905-isknownuniquelyreferenced&gt;\. If not, they make a fresh copy before applying the mutating operation.

There's no difference for `let` arrays. Access control is enforced at compile-time through Array's design: the compiler will prevent you from calling `mutating` functions on `let` structs, and Array is careful to not expose functionality that could modify its storage outside of `mutating` functions.

There is no secret. Anyone could implement the same thing only using publicly available and documented compiler features. In fact, it's been done already for some very powerful collections <https://github.com/lorentey/BTree&gt;\.

···

Le 8 août 2017 à 15:45, Geordie Jay <geojay@gmail.com> a écrit :

Daryle Walker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> schrieb am Di. 8. Aug. 2017 um 21:25: >> On Aug 8, 2017, at 12:35 AM, Félix Cloutier <felixcloutier@icloud.com <mailto:felixcloutier@icloud.com>> wrote:

All this means is that `joined()` does not create an array that contains the new result. It's only as magic as the COW semantics on arrays.

So you’re saying the COW semantics for Array and other standard library types have secret references/pointers that work even for “let”-mode objects, and the Sequence variants the various forms of “joined” need use a Sequence/Collection of those secret references?

I know nothing about this specific type under the hood and your question stumped me when I first saw it as well, so take this with a grain of salt:

I think it's basically just storing the arrays internally (as let) and when you iterate through the collection it just goes through the subsequences one by one, when the last index of the first is reached it begins with the next subsequence.

As for how it avoids creating new storage, simple. As someone else mentioned, this is no more magic than Copy On Write for normal arrays.

let a = [1,2,3]
let b = a // this doesn't produce a copy of the underlying buffer.. I.e. value semantics but only one buffer needed

^^^ This is the take-home message. And your intuition about COW is correct: its internal storage is a reference type containing a buffer pointer. When (and only when) a mutation occurs, the buffer is copied and the new storage becomes the backing for the resulting struct. Any existing copies remain unchanged (and truly immutable) because they keep their original storage.

https://github.com/apple/swift/blob/master/stdlib/public/core/ContiguousArrayBuffer.swift

So with that understanding of COW, it becomes easy to imagine all sorts of containers that don't require additional storage for their contents:

struct JoinedSequenceOfThreeArrays<T> {
  let array1: [T]
  let array2: [T]
  let array3: [T]
}

// still only one array buffer storage is required for all of this:
let c = JoinedSequenceOfThreeArrays(array1: a, array2: a, array3: b)

Does that make sense?

Geordie

Le 7 août 2017 à 21:12, Daryle Walker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> a écrit :

I was looking at random items at SwiftDoc.org <http://swiftdoc.org/&gt;, and noticed the “FlattenBidirectionalCollection” structure. It helps implement some versions of “joined” from Sequence (probably when the Sequence is also a BidirectionalCollection). The directions for the type state that “joined” does not create new storage. Then wouldn’t it have to refer to the source objects by reference? How; especially how does it work without requiring a “&” with “inout” or how it works with “let”-mode objects? Or am I misunderstanding how it works behind the covers?

(If there is a secret sauce to have one object refer to another without “&”/“inout”/“UnsafeWhateverPointer”, I like to know. It may help with implementing an idea. The idea involves extending the language, so “compiler magic” that the user can’t access is OK; I’d just claim to use the same sauce in my proposal.)


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

This isn’t entirely true. That BTree module readme seems to contain a lot
of unsubstantiated hyperbole. It’s possible to implement a classic
red-black tree in Swift that performs better than a sorted Array, down to
about *n* = 1,500 items, not *n* = *100,000* items as it claims. (Actually,
heap allocators these days are good enough that performance is on par with
Array all the way down to *n* = 1.) Red-Black trees are slow when
*distributed* as packages because of the crossmodule optimization boundary.
(This also means the BTree module is much slower than Array for most
reasonable *n*.) It’s possible to write modules using compiler attributes
that mitigate this slowdown (reclaiming over 50% of lost performance) but
it’s hacky and forces you to design your libraries like the standard
library (meaning: ugly underscored properties everywhere and everything is
public). And these features aren’t “publicly available” or documented at
all.

···

On Wed, Aug 9, 2017 at 12:29 AM, Félix Cloutier via swift-evolution < swift-evolution@swift.org> wrote:

Yes, exactly. An Array<T> is a struct wrapper for a reference type
representing storage. Mutating functions first check if they own the only
reference to the storage using isKnownUniquelyReferenced
<https://developer.apple.com/documentation/swift/2429905-isknownuniquelyreferenced&gt;\.
If not, they make a fresh copy before applying the mutating operation.

There's no difference for `let` arrays. Access control is enforced at
compile-time through Array's design: the compiler will prevent you from
calling `mutating` functions on `let` structs, and Array is careful to not
expose functionality that could modify its storage outside of `mutating`
functions.

There is no secret. Anyone could implement the same thing only using
publicly available and documented compiler features. In fact, it's been
done already for some very powerful collections
<https://github.com/lorentey/BTree&gt;\.

All this means is that `joined()` does not create an array that contains the new result. It's only as magic as the COW semantics on arrays.

So you’re saying the COW semantics for Array and other standard library types have secret references/pointers that work even for “let”-mode objects, and the Sequence variants the various forms of “joined” need use a Sequence/Collection of those secret references?

I know nothing about this specific type under the hood and your question stumped me when I first saw it as well, so take this with a grain of salt:

I think it's basically just storing the arrays internally (as let) and when you iterate through the collection it just goes through the subsequences one by one, when the last index of the first is reached it begins with the next subsequence.

As for how it avoids creating new storage, simple. As someone else mentioned, this is no more magic than Copy On Write for normal arrays.

let a = [1,2,3]
let b = a // this doesn't produce a copy of the underlying buffer.. I.e. value semantics but only one buffer needed

^^^ This is the take-home message. And your intuition about COW is correct: its internal storage is a reference type containing a buffer pointer. When (and only when) a mutation occurs, the buffer is copied and the new storage becomes the backing for the resulting struct. Any existing copies remain unchanged (and truly immutable) because they keep their original storage.

https://github.com/apple/swift/blob/master/stdlib/public/core/ContiguousArrayBuffer.swift

So with that understanding of COW, it becomes easy to imagine all sorts of containers that don't require additional storage for their contents:

struct JoinedSequenceOfThreeArrays<T> {
  let array1: [T]
  let array2: [T]
  let array3: [T]
}

// still only one array buffer storage is required for all of this:
let c = JoinedSequenceOfThreeArrays(array1: a, array2: a, array3: b)

Does that make sense?

Mostly, then I realized a flaw with this explanation. Your theory implies that “joined” avoids creating new storage by betraying that and actually copying the containers, but said containers use-COW/reference-remote-storage themselves. Then what happens when a Sequence uses scoped storage for its items? (Example: the mythical “just slap Collection on a tuple and call it an array” type.) Either “joined” uses a different no-copy technique or it’s lying about saving on storage (and the lack of scoped-storage Sequences means no one has called them on it yet).

···

On Aug 8, 2017, at 6:45 PM, Geordie Jay <geojay@gmail.com> wrote:
Daryle Walker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> schrieb am Di. 8. Aug. 2017 um 21:25: >> On Aug 8, 2017, at 12:35 AM, Félix Cloutier <felixcloutier@icloud.com <mailto:felixcloutier@icloud.com>> wrote:

Le 7 août 2017 à 21:12, Daryle Walker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> a écrit :

I was looking at random items at SwiftDoc.org <http://swiftdoc.org/&gt;, and noticed the “FlattenBidirectionalCollection” structure. It helps implement some versions of “joined” from Sequence (probably when the Sequence is also a BidirectionalCollection). The directions for the type state that “joined” does not create new storage. Then wouldn’t it have to refer to the source objects by reference? How; especially how does it work without requiring a “&” with “inout” or how it works with “let”-mode objects? Or am I misunderstanding how it works behind the covers?

(If there is a secret sauce to have one object refer to another without “&”/“inout”/“UnsafeWhateverPointer”, I like to know. It may help with implementing an idea. The idea involves extending the language, so “compiler magic” that the user can’t access is OK; I’d just claim to use the same sauce in my proposal.)


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

This seems harsh. I didn't notice Félix making any claims about BTree's
performance. The necessary API for implementing COW is indisputably public
and documented:

···

On Tue, Aug 8, 2017 at 11:51 PM, Taylor Swift via swift-users < swift-users@swift.org> wrote:

On Wed, Aug 9, 2017 at 12:29 AM, Félix Cloutier via swift-evolution < > swift-evolution@swift.org> wrote:

Yes, exactly. An Array<T> is a struct wrapper for a reference type
representing storage. Mutating functions first check if they own the only
reference to the storage using isKnownUniquelyReferenced
<https://developer.apple.com/documentation/swift/2429905-isknownuniquelyreferenced&gt;\.
If not, they make a fresh copy before applying the mutating operation.

There's no difference for `let` arrays. Access control is enforced at
compile-time through Array's design: the compiler will prevent you from
calling `mutating` functions on `let` structs, and Array is careful to not
expose functionality that could modify its storage outside of `mutating`
functions.

There is no secret. Anyone could implement the same thing only using
publicly available and documented compiler features. In fact, it's been
done already for some very powerful collections
<https://github.com/lorentey/BTree&gt;\.

This isn’t entirely true. That BTree module readme seems to contain a lot
of unsubstantiated hyperbole. It’s possible to implement a classic
red-black tree in Swift that performs better than a sorted Array, down to
about *n* = 1,500 items, not *n* = *100,000* items as it claims.
(Actually, heap allocators these days are good enough that performance is
on par with Array all the way down to *n* = 1.) Red-Black trees are slow
when *distributed* as packages because of the crossmodule optimization
boundary. (This also means the BTree module is much slower than Array for
most reasonable *n*.) It’s possible to write modules using compiler
attributes that mitigate this slowdown (reclaiming over 50% of lost
performance) but it’s hacky and forces you to design your libraries like
the standard library (meaning: ugly underscored properties everywhere and
everything is public). And these features aren’t “publicly available” or
documented at all.

I really didn’t mean it to be harsh (sorry if it sounded that way :slightly_frowning_face:), it’s
just that people tend to be overly optimistic about the performance that
can be achieved with custom Collection packages. It’s true that you can
imitate the *functionality* of stdlib Collection types with public and
documented Swift features, but such custom Collections (when distributed as
packages) are almost never effective at improving an application’s
performance due to the huge constant factor of cross module calls, unless
the library author was willing to make use of undocumented compiler
features.

···

On Wed, Aug 9, 2017 at 1:50 AM, Rob Mayoff <mayoff@dqd.com> wrote:

On Tue, Aug 8, 2017 at 11:51 PM, Taylor Swift via swift-users < > swift-users@swift.org> wrote:

On Wed, Aug 9, 2017 at 12:29 AM, Félix Cloutier via swift-evolution < >> swift-evolution@swift.org> wrote:

Yes, exactly. An Array<T> is a struct wrapper for a reference type
representing storage. Mutating functions first check if they own the only
reference to the storage using isKnownUniquelyReferenced
<https://developer.apple.com/documentation/swift/2429905-isknownuniquelyreferenced&gt;\.
If not, they make a fresh copy before applying the mutating operation.

There's no difference for `let` arrays. Access control is enforced at
compile-time through Array's design: the compiler will prevent you from
calling `mutating` functions on `let` structs, and Array is careful to not
expose functionality that could modify its storage outside of `mutating`
functions.

There is no secret. Anyone could implement the same thing only using
publicly available and documented compiler features. In fact, it's been
done already for some very powerful collections
<https://github.com/lorentey/BTree&gt;\.

This isn’t entirely true. That BTree module readme seems to contain a lot
of unsubstantiated hyperbole. It’s possible to implement a classic
red-black tree in Swift that performs better than a sorted Array, down
to about *n* = 1,500 items, not *n* = *100,000* items as it claims.
(Actually, heap allocators these days are good enough that performance is
on par with Array all the way down to *n* = 1.) Red-Black trees are slow
when *distributed* as packages because of the crossmodule optimization
boundary. (This also means the BTree module is much slower than Array
for most reasonable *n*.) It’s possible to write modules using compiler
attributes that mitigate this slowdown (reclaiming over 50% of lost
performance) but it’s hacky and forces you to design your libraries like
the standard library (meaning: ugly underscored properties everywhere and
everything is public). And these features aren’t “publicly available” or
documented at all.

This seems harsh. I didn't notice Félix making any claims about BTree's
performance. The necessary API for implementing COW is indisputably public
and documented:

https://developer.apple.com/documentation/swift/2429905-
isknownuniquelyreferenced

I’m not sure how that can be because you lose locality (but that probably depends on what operations “perform” includes). And IMO, heap allocations remain (and will remain) a bottleneck for small allocations.

  Daniel.

···

On 9. Aug 2017, at 06:51, Taylor Swift via swift-users <swift-users@swift.org> wrote:

It’s possible to implement a classic red-black tree in Swift that performs better than a sorted Array, down to about n = 1,500 items, not n = 100,000 items as it claims. (Actually, heap allocators these days are good enough that performance is on par with Array all the way down to n = 1.)

I don’t see any contradiction here. A tuple is a value type; it has no
concept of copy-on-write because it’s copy-on-read. So JoinedSequence will
store a copy of the tuple instead of a buffer reference

···

On Wed, Aug 9, 2017 at 10:43 AM, Daryle Walker via swift-evolution < swift-evolution@swift.org> wrote:

On Aug 8, 2017, at 6:45 PM, Geordie Jay <geojay@gmail.com> wrote:

Daryle Walker via swift-evolution <swift-evolution@swift.org> schrieb am
Di. 8. Aug. 2017 um 21:25:

On Aug 8, 2017, at 12:35 AM, Félix Cloutier <felixcloutier@icloud.com> >> wrote:

All this means is that `joined()` does not create an array that contains
the new result. It's only as magic as the COW semantics on arrays.

So you’re saying the COW semantics for Array and other standard library
types have secret references/pointers that work even for “let”-mode
objects, and the Sequence variants the various forms of “joined” need use a
Sequence/Collection of those secret references?

I know nothing about this specific type under the hood and your question
stumped me when I first saw it as well, so take this with a grain of salt:

I think it's basically just storing the arrays internally (as let) and
when you iterate through the collection it just goes through the
subsequences one by one, when the last index of the first is reached it
begins with the next subsequence.

As for how it avoids creating new storage, simple. As someone else
mentioned, this is no more magic than Copy On Write for normal arrays.

let a = [1,2,3]
let b = a // this doesn't produce a copy of the underlying buffer.. I.e.
value semantics but only one buffer needed

^^^ This is the take-home message. And your intuition about COW is
correct: its internal storage is a reference type containing a buffer
pointer. When (and only when) a mutation occurs, the buffer is copied and
the new storage becomes the backing for the resulting struct. Any existing
copies remain unchanged (and truly immutable) because they keep their
original storage.

https://github.com/apple/swift/blob/master/stdlib/public/core/
ContiguousArrayBuffer.swift

So with that understanding of COW, it becomes easy to imagine all sorts of
containers that don't require additional storage for their contents:

struct JoinedSequenceOfThreeArrays<T> {
  let array1: [T]
  let array2: [T]
  let array3: [T]
}

// still only one array buffer storage is required for all of this:
let c = JoinedSequenceOfThreeArrays(array1: a, array2: a, array3: b)

Does that make sense?

Mostly, then I realized a flaw with this explanation. Your theory implies
that “joined” avoids creating new storage by betraying that and actually
copying the containers, but said containers use-COW/reference-remote-storage
themselves. Then what happens when a Sequence uses scoped storage for its
items? (Example: the mythical “just slap Collection on a tuple and call it
an array” type.) Either “joined” uses a different no-copy technique or it’s
lying about saving on storage (and the lack of scoped-storage Sequences
means no one has called them on it yet).

The benefit that standard library containers have over containers from other modules is that they're optimized as if they were part of your own module, so you can get the same thing by including the collection classes in your own executable instead of linking with the module. It's my understanding that the @_transparent attribute could surface to the public side with module format stability.

Either way, the issue of cross-module optimizations is separate from COW mechanics, and it was certainly not my goal to distract anyone from the main topic by including an after-thought reference to an algorithmically impressive package. The important takeaway is that COW semantics don't rely on special features, joined() doesn't have to go out of its way to ensure that new storage isn't allocated, and the one important library feature behind these containers, isKnownUniquelyReferenced, lives in broad daylight.

···

Le 8 août 2017 à 22:56, Taylor Swift <kelvin13ma@gmail.com> a écrit :

On Wed, Aug 9, 2017 at 1:50 AM, Rob Mayoff <mayoff@dqd.com <mailto:mayoff@dqd.com>> wrote:
On Tue, Aug 8, 2017 at 11:51 PM, Taylor Swift via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

On Wed, Aug 9, 2017 at 12:29 AM, Félix Cloutier via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Yes, exactly. An Array<T> is a struct wrapper for a reference type representing storage. Mutating functions first check if they own the only reference to the storage using isKnownUniquelyReferenced <https://developer.apple.com/documentation/swift/2429905-isknownuniquelyreferenced&gt;\. If not, they make a fresh copy before applying the mutating operation.

There's no difference for `let` arrays. Access control is enforced at compile-time through Array's design: the compiler will prevent you from calling `mutating` functions on `let` structs, and Array is careful to not expose functionality that could modify its storage outside of `mutating` functions.

There is no secret. Anyone could implement the same thing only using publicly available and documented compiler features. In fact, it's been done already for some very powerful collections <https://github.com/lorentey/BTree&gt;\.

This isn’t entirely true. That BTree module readme seems to contain a lot of unsubstantiated hyperbole. It’s possible to implement a classic red-black tree in Swift that performs better than a sorted Array, down to about n = 1,500 items, not n = 100,000 items as it claims. (Actually, heap allocators these days are good enough that performance is on par with Array all the way down to n = 1.) Red-Black trees are slow when distributed as packages because of the crossmodule optimization boundary. (This also means the BTree module is much slower than Array for most reasonable n.) It’s possible to write modules using compiler attributes that mitigate this slowdown (reclaiming over 50% of lost performance) but it’s hacky and forces you to design your libraries like the standard library (meaning: ugly underscored properties everywhere and everything is public). And these features aren’t “publicly available” or documented at all.

This seems harsh. I didn't notice Félix making any claims about BTree's performance. The necessary API for implementing COW is indisputably public and documented:

isKnownUniquelyReferenced(_:) | Apple Developer Documentation

I really didn’t mean it to be harsh (sorry if it sounded that way :slightly_frowning_face:), it’s just that people tend to be overly optimistic about the performance that can be achieved with custom Collection packages. It’s true that you can imitate the functionality of stdlib Collection types with public and documented Swift features, but such custom Collections (when distributed as packages) are almost never effective at improving an application’s performance due to the huge constant factor of cross module calls, unless the library author was willing to make use of undocumented compiler features.

If the number of nodes is small, Swift’s heap allocator will actually place
then contiguously in memory, just like an Array, as long as you don’t wait
too long between calls to insert(_:). You can also write your own simple
memory allocator in Swift that’s backed by an Array which guarantees the
nodes will live near each other in memory. But that’s getting off topic lol

···

On Wed, Aug 9, 2017 at 5:54 AM, Daniel Vollmer via swift-evolution < swift-evolution@swift.org> wrote:

> On 9. Aug 2017, at 06:51, Taylor Swift via swift-users < > swift-users@swift.org> wrote:
>
> It’s possible to implement a classic red-black tree in Swift that
performs better than a sorted Array, down to about n = 1,500 items, not n =
100,000 items as it claims. (Actually, heap allocators these days are good
enough that performance is on par with Array all the way down to n = 1.)

I’m not sure how that can be because you lose locality (but that probably
depends on what operations “perform” includes). And IMO, heap allocations
remain (and will remain) a bottleneck for small allocations.

        Daniel.
_______________________________________________

Besides meaning the explanation for “joined” is lying, what if the value type is large enough that copies should be minimized?

Hmm, it seems that there’s a hole in the language in that value-type “let”-mode instances cannot be aliased/referenced.

···

On Aug 9, 2017, at 12:19 PM, Taylor Swift <kelvin13ma@gmail.com> wrote:

On Wed, Aug 9, 2017 at 10:43 AM, Daryle Walker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Aug 8, 2017, at 6:45 PM, Geordie Jay <geojay@gmail.com <mailto:geojay@gmail.com>> wrote:

Daryle Walker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> schrieb am Di. 8. Aug. 2017 um 21:25: >>> On Aug 8, 2017, at 12:35 AM, Félix Cloutier <felixcloutier@icloud.com <mailto:felixcloutier@icloud.com>> wrote:

All this means is that `joined()` does not create an array that contains the new result. It's only as magic as the COW semantics on arrays.

So you’re saying the COW semantics for Array and other standard library types have secret references/pointers that work even for “let”-mode objects, and the Sequence variants the various forms of “joined” need use a Sequence/Collection of those secret references?

I know nothing about this specific type under the hood and your question stumped me when I first saw it as well, so take this with a grain of salt:

I think it's basically just storing the arrays internally (as let) and when you iterate through the collection it just goes through the subsequences one by one, when the last index of the first is reached it begins with the next subsequence.

As for how it avoids creating new storage, simple. As someone else mentioned, this is no more magic than Copy On Write for normal arrays.

let a = [1,2,3]
let b = a // this doesn't produce a copy of the underlying buffer.. I.e. value semantics but only one buffer needed

^^^ This is the take-home message. And your intuition about COW is correct: its internal storage is a reference type containing a buffer pointer. When (and only when) a mutation occurs, the buffer is copied and the new storage becomes the backing for the resulting struct. Any existing copies remain unchanged (and truly immutable) because they keep their original storage.

https://github.com/apple/swift/blob/master/stdlib/public/core/ContiguousArrayBuffer.swift

So with that understanding of COW, it becomes easy to imagine all sorts of containers that don't require additional storage for their contents:

struct JoinedSequenceOfThreeArrays<T> {
  let array1: [T]
  let array2: [T]
  let array3: [T]
}

// still only one array buffer storage is required for all of this:
let c = JoinedSequenceOfThreeArrays(array1: a, array2: a, array3: b)

Does that make sense?

Mostly, then I realized a flaw with this explanation. Your theory implies that “joined” avoids creating new storage by betraying that and actually copying the containers, but said containers use-COW/reference-remote-storage themselves. Then what happens when a Sequence uses scoped storage for its items? (Example: the mythical “just slap Collection on a tuple and call it an array” type.) Either “joined” uses a different no-copy technique or it’s lying about saving on storage (and the lack of scoped-storage Sequences means no one has called them on it yet).

I don’t see any contradiction here. A tuple is a value type; it has no concept of copy-on-write because it’s copy-on-read. So JoinedSequence will store a copy of the tuple instead of a buffer reference


Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com