Swift is still sometimes so cumbersome

I found myself wanting to get an optional string from an HTTP response, and to truncate it to some maximum length for debug logging. I would like to write this:

var body = inResponse.result.value as? String
body = body?.prefix(1024)

But that doesn't work because prefix() returns Substring. So I write:

var body = inResponse.result.value as? String
body = String(body?.prefix(1024))

But I can't, because I can't pass an optional Substring to String(). So I can write something like:

var body = inResponse.result.value as? String
if let b = body { body = String(b.prefix(1024)) }

But look how readability suffers. My goal is to get an optional String of known maximum length.

Obviously I can add an extension to String to wrap that mess, but that increases the API surface, and clashes somewhat, in that it's a “different kind” of substring operation.

I'm not sure what I'm hoping to accomplish with this post. Is there a more concise way I've overlooked? Is there a fundamental convention that could be better (e.g. String() accepts optional arguments and returns a String??)

And I’m sure it’s been hashed to death, but why is a Substring not a String? It sure seems to me to be one.

It's more compact, if somewhat less readable, to use Optional's map or flatMap:

let body = inResponse.result.value.flatMap { $0 as? String }
    .map { $0.prefix(1024) }
    .map(String.init)

Note I did this without a compiler, but you should be able to accomplish something similar.

And yes, Substring not being a String is very annoying.

Not sure that's much more compact than what I had, and I worry that a cursory reading of it might sow confusion (“why is this code mapping anything, let alone twice?”).

On why a Substring is not a String: to avoid a memory leak that you could well hit in this very use case. Imagine the string was 100 MB, and you only want the first 100 characters stored in body. If body shared storage with the original and was a long-lived variable, you would essentially be leaking 100 MB for the lifetime of body. There was recently a long thread investigating a major memory leak in a server-side NIO app that ended up being a variant of this problem (only with Data and ByteBuffer). This leak scenario was such a problem for Java that they changed String's substring operation to make a copy (it used to share storage). But this messed other people up, because it means substring operations go from being O(1) to being O(n).
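
A minimal sketch of the sharing described above, using made-up sizes. The names big, slice and copy are only for illustration. Substring exposes the string it was sliced from through its base property, which makes it visible that the slice still refers to the whole original:

```swift
// A large string standing in for the 100 MB response body.
let big = String(repeating: "x", count: 100_000)

// prefix(_:) returns a Substring, a view that refers to big's contents.
let slice: Substring = big.prefix(100)

// The slice still knows about the entire original string.
let originalLength = slice.base.count

// Converting to String makes an independent copy of just the slice.
let copy = String(slice)
```

Storing copy rather than slice in a long-lived variable is what lets the original storage be released.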

Could Swift determine that the original String is no longer being referenced, and just shrink the memory in place? I guess that's a pretty specialized use case. That is, I guess truncating a buffer isn't terribly common, is it? Not to mention that it might not be possible to reclaim the space used by such a truncation with certain memory allocators.

I suppose the best answer then is a parallel set of methods on String that return String, with the understanding that they're not O(1). Just a pity that we all have to reinvent them.

The alternative would be a String() constructor that takes and returns Optional. But again, seems like any class that has distinct slice types would need this.
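
For what it's worth, here is a rough sketch of what such an Optional-accepting constructor could look like as an extension. This is not part of the standard library, and the optional: argument label is an assumption chosen to keep it from colliding with the existing String initializers:

```swift
extension String {
    // Hypothetical convenience initializer: returns nil when given nil,
    // otherwise copies the substring into a new String.
    init?(optional substring: Substring?) {
        guard let substring = substring else { return nil }
        self.init(substring)
    }
}

// Usage: the optionality is carried through in one expression.
let source: String? = "abcdef"
let truncated = String(optional: source?.prefix(3))

let missing: String? = nil
let stillMissing = String(optional: missing?.prefix(3))
```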

I'm not sure there's a good way around the readability issues if you can't assume some familiarity with basic Swift types and their APIs. And in that case you're excluding large chunks of API from your codebase. Maps are just transforms and occur a lot in Swift code. You could also combine the last two map calls into one:

.map { String($0.prefix(1024)) }

One question to ask is, what do you plan to do if any of these operations return nil. If the answer is, they cannot possibly, then consider as! to kill the optionality early. Or as? String ?? "" if nil is possible but just means the same as an empty string for your purposes.

Of course, but it’s still much nicer to write:

s = s?.prefix(1024)

You can't deny that's more readable.

In my case, later code deals with it being nil. I already had to deal with it being nil, I just wanted to also limit the string length in case it was not nil. Yes, in this case, I could do other things to simplify the code, e.g.:

var body = inResponse.result.value as? String ?? "<no body>"
body = String(body.prefix(1024))
debugLog("Response: \(body)")

But it's still a bit cumbersome (for the reasons you mentioned about memory), don't you think?

Right. One downside of the number of affordances Swift gives to working with Optionals is that it does lead people away from the fact that the best approach is almost always to filter away the optionality — usually with a fold (like ??) or an early-exit — in some sensible way soon after getting the optional value.

In this case, you should probably look for ways to work with the body as a Substring rather than trying to appease the type-checker by turning it back into a String.

I find I like to preserve the optionality of a variable as long as possible, because I don't know if later I'll need to take advantage of the fact that it was nil earlier on.

I just find this comes up a lot: At a high level (not considering low-level implementation considerations like memory use), a substring is still a string.

At the risk of turning Swift into APL, what do you guys think of this notion: a way of promoting something like Substring back to String:

var body: String? = ...
body = body?.prefix(1024)↑

This would be equivalent to

var body: String? = ...
if let b = body { body = String(b.prefix(1024)) }

I think the operator can be created for specific type relationships like String and Substring with explicit definitions for those types. But if Swift embodied the notion of a subtype relationship, it could automatically do this for arbitrary subtypes.

(My terminology may be imprecise but I think the gist is clear.)

It's terrible style, but you could use .description to do exactly that.

In general, it's unfortunate that we use call notation for type conversions, since they're often very useful to chain as part of optional accesses.
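
A small sketch of the .description trick, with a made-up body value. Because description is a property rather than an initializer call, it chains through the optional access:

```swift
// Stand-in for the optional response body.
var body: String? = String(repeating: "a", count: 5000)

// Substring.description is a String, so the result is String? again.
body = body?.prefix(1024).description

// A nil body simply stays nil.
var missing: String? = nil
missing = missing?.prefix(1024).description
```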

You're right that this is a low-level distinction, but it's an important enough one that we felt we had no good choice but to raise it to programmers. A lot of glue code is not really sensitive to algorithmic-complexity problems unless it's put in an unusual situation, but if you're copying strings on every substring operation, that's pretty likely to affect you. Code that works with substrings a lot should almost always be starting by converting to Substring and then only converting back to String at the edges (and when necessary, which it's not in order to use e.g. string interpolation).
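
As a sketch of that approach for the logging case in this thread (debugLine is a hypothetical helper, not an existing API), the truncated body can stay a Substring all the way into the interpolation:

```swift
// Hypothetical helper: builds the log line without converting the
// truncated body back to String first.
func debugLine(for body: String?) -> String {
    // Substring is expressible by a string literal, so ?? works directly.
    let preview: Substring = body?.prefix(1024) ?? "<no body>"
    return "Response: \(preview)"
}

let shortLine = debugLine(for: "hello")
let emptyLine = debugLine(for: nil)
let longLine = debugLine(for: String(repeating: "z", count: 5000))
```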

What are you truncating the body for? Is it actually worth scanning for up to 1024 grapheme cluster boundaries?

Regardless of performance in this case, using a mutating API style makes the source‐level clumsiness go away in general:

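extension RangeReplaceableCollection {
    mutating func truncate(to count: Int) {
        // index(_:offsetBy:limitedBy:) returns nil when the collection is
        // shorter than count; fall back to endIndex so nothing is removed.
        // Note: this does a lot of work for String,
        // since String is not random access.
        let end = index(startIndex, offsetBy: count, limitedBy: endIndex) ?? endIndex
        removeSubrange(end...)
    }
}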

// Usage site:
var body = inResponse.result.value as? String
body?.truncate(to: 1024)

This is why StringProtocol exists - to be a higher-level "supertype" when you don't care if the thing is a String or Substring.
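
A minimal sketch of what that looks like in practice. The function name describe is made up for illustration; the point is that one generic function accepts both types:

```swift
// Hypothetical helper that accepts either a String or a Substring.
func describe<S: StringProtocol>(_ text: S) -> String {
    return "Response: \(text)"
}

let full = "hello world"
let fromString = describe(full)              // called with a String
let fromSubstring = describe(full.prefix(5)) // called with a Substring
```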

FWIW, shared Strings will allow you to turn a Substring into a String without copying, but you'll need to scope the access within a closure because of the lifetime dependency. If the shared String escapes from that closure, you'll be leaking the whole parent String's storage. So you'd need to be careful when using it.

I think that feature is basically just waiting for somebody to write a formal proposal. @Michael_Ilseman?

I actually think this is a design mistake with Data. None of the standard library collections (Array, String, etc) are their own slice-types, even though they could be. It's been a while since I checked, but IIRC I couldn't find any rationale about why it is that way. Data is more likely to represent a large amount of memory, so it probably makes even more sense for it to have an independent slice type.

Probably too late to do anything about it, though.

In this context the self-slicing nature of Data is not the problem. The problem was converting part of a NIO ByteBuffer into a Data. The NIO code uses Data(bytesNoCopy:count:deallocator:) to initialise the data, thereby sharing the backing storage of the ByteBuffer. This can be surprising in cases where the goal was to slice part of the ByteBuffer out, and the original ByteBuffer was large.

This is an inevitable consequence of any situation where an API will attempt to share storage: you can end up in a place where the sharing of storage is not what you want. It's an ongoing challenge of API design to communicate this clearly to users.

I just bite the bullet and add subscript extensions that return String. So that body[..<1024] would be a String.

I know that the language is trying to save me from myself, but it just isn't ergonomic otherwise.
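
Roughly, such an extension might look like the sketch below. It is only one possible shape: it assumes the bound counts Characters and that an out-of-range bound should clamp rather than trap, and it pays the O(n) cost of walking to the offset and then copying.

```swift
extension String {
    // Hypothetical subscript: body[..<1024] yields a String holding at most
    // the first 1024 Characters. Negative bounds are clamped to zero.
    subscript(range: PartialRangeUpTo<Int>) -> String {
        return String(prefix(Swift.max(0, range.upperBound)))
    }
}

let greeting = "hello world"
let firstFive = greeting[..<5]

// Works through optional chaining too, giving String?.
let body: String? = "hi"
let truncated = body?[..<1024]
```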