Initializers for Substring and ArraySlice

Howdy everybody,

I was looking at the documentation for the Substring and ArraySlice types, and I saw that they have initializers similar to those for String and Array respectively, i.e., from literals, no data, raw data, etc. This seems to go against the idea of these types, which is to offer a view into an existing data structure. I imagine that the init(_ content: String) initializer on Substring creates a String, and then a Substring that holds the only reference to that String. It seems the Substring returned in this case offers no additional functionality over just creating a String in the first place. Would it be reasonable to remove the initializers that don't reference an existing instance and a range within it?

Substring and ArraySlice have an often useful property, that Self.SubSequence == Self. There are many algorithms that rely on this, especially recursive algorithms. You basically have 2 choices:

  1. Make the algorithm operate with inputs of type String/Array. Every recursive call is forced to promote a Substring/ArraySlice to a String/Array, which causes a wasteful memory copy.
  2. Make the algorithm operate with inputs of type Substring/ArraySlice, and make a helper function that takes a String/Array and uses the initializers you mentioned to create a Substring/ArraySlice containing the entire input, which it then passes on to the algorithm.

Approach #1 is wasteful, and #2 is the only approach that can be generified (it could be generified to work on any T: Sequence where T.SubSequence == T).
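To make approach #2 concrete, here is a minimal sketch. The names sumSlice and sum are made up for illustration, and the constraint is written against Collection rather than Sequence, which is an assumption on my part. Whether ArraySlice(values) copies the storage is an implementation detail I'm not relying on here; values[...] would be another way to get a slice of the whole array.

```swift
// Hypothetical recursive algorithm: adds up the elements one slice at a time.
// The constraint C.SubSequence == C keeps every recursive call in the same type.
func sumSlice<C: Collection>(_ values: C) -> Int
    where C.SubSequence == C, C.Element == Int
{
    guard let first = values.first else { return 0 }
    // dropFirst() returns C.SubSequence, which is C itself here,
    // so the recursion narrows the view without promoting back to Array.
    return first + sumSlice(values.dropFirst())
}

// Hypothetical entry point: wraps the whole Array in an ArraySlice once.
func sum(_ values: [Int]) -> Int {
    return sumSlice(ArraySlice(values))
}

print(sum([1, 2, 3, 4]))  // prints 10
```

The same sumSlice would also accept any other type that is its own SubSequence, which is the "generified" part of the argument.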

I don’t see how your response relates to OP’s concern about initializers. His point is that slices should always be views, therefore constructed from an existing instance, not from its content.

Do the current initializer signatures come from some protocol requirements?

That's not at all how I understood it. In particular, I was responding to:

The ability to generically repeatedly slice a slice and stay within the same type is this "additional functionality" that justifies the existence of such initializers.

I think he offered context and then asked:

I think I understand what Alexander was talking about, and I think I agree. If I'm understanding it correctly, the need for these initializers comes from the fact that data used for these functions could be found anywhere, and it's best to have an initializer that can handle those situations. For example, if I had a section of a Data struct cast as UTF8 character data, I would be able to write a function that did the needed processing of that data without first creating an instance of a String. I'm not sure if this requires the creation of a String behind the scenes, but even if so, in certain cases it's a convenience to have these types of initializers.
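A rough sketch of that scenario, using a plain [UInt8] array in place of Data so it doesn't need Foundation. The byte contents and variable names are invented for the example, and I'm making no claim about whether a String is created internally:

```swift
// Made-up input: raw UTF-8 bytes with a payload in the middle.
let bytes = Array("id:hello;".utf8)

// Decode only the payload bytes (offsets 3..<8) directly into a Substring,
// using the init(decoding:as:) initializer that StringProtocol requires.
let text = Substring(decoding: bytes[3..<8], as: UTF8.self)

// The result can go straight into code written against Substring.
let shout = text.uppercased()
print(text, shout)  // prints "hello HELLO"
```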

One reason for this is that you can use a slice to modify a collection i.e.:

var a = Array(0..<10)
a[2...5] = [99,99]
a[6...] = []
print(a) // [0,1,99,99,6,7]

This works because of the definition of {MutableCollection,RangeReplaceableCollection}.subscript(bounds: Range<Index>) -> SubSequence { get set }, which can assign any subsequence from the rhs into a slice (in the case of MutableCollection it needs to be the same size). This wouldn't work as neatly if you couldn't create slices independently.
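To spell out what the right-hand side is in those assignments, here is a small sketch. The names replacement and other are just for illustration:

```swift
var a = Array(0..<10)

// The literal on the right-hand side is typed as ArraySlice<Int>,
// i.e. a slice created independently of any existing array.
let replacement: ArraySlice<Int> = [99, 99]
a[2...5] = replacement
print(a)  // [0, 1, 99, 99, 6, 7, 8, 9]

// A slice taken from a different array can be assigned the same way.
let other = [10, 20, 30, 40, 50]
a[0..<2] = other[3..<5]
print(a)  // [40, 50, 99, 99, 6, 7, 8, 9]
```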

Funny enough, just ran into this question, which perfectly exemplifies the utility of such an initializer: swift4 - Parsing array recursively in Swift 4 - Stack Overflow

This is a great point, but in my opinion this is the result of a sub-optimal API (and/or perhaps the lack of some language features). While the getter of that subscript should absolutely return a SubSequence, the setter would ideally take any Sequence that has the right Element, and then your code wouldn't require ArraySlice to be ExpressibleByArrayLiteral anymore.
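For comparison, RangeReplaceableCollection already has a method in this spirit: replaceSubrange(_:with:) accepts any Collection with the right Element (a Collection, not an arbitrary Sequence). A quick sketch of the earlier example written that way, with the replacement values chosen arbitrarily:

```swift
var a = Array(0..<10)

// The replacement here is an ordinary Array literal, not an ArraySlice.
a.replaceSubrange(2...5, with: [99, 99])

// Any Collection of Int works, e.g. an EmptyCollection or a range.
a.replaceSubrange(6..., with: EmptyCollection<Int>())
print(a)  // [0, 1, 99, 99, 6, 7]

a.replaceSubrange(0..<2, with: 100...102)
print(a)  // [100, 101, 102, 99, 99, 6, 7]
```

This gives up the subscript-assignment syntax, which is the trade-off being discussed.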

If I’m not mistaken, I think this would be an ideal case for using Substring or ArraySlice, since those types use the same indices as the original instance, and thus access elements in O(1) time. Sequence itself makes no such guarantee.

Wouldn't these benefits only apply to the get part and not to the set part?