[Pitch] AttributedString Tracking Indices

Hi Swift community, I'd like to pitch another addition to AttributedString that builds on some of the previously pitched APIs: new APIs that let developers easily track important indices across mutations to an AttributedString. Please let me know if you have any thoughts, suggestions, or concerns. I'd love to discuss adding these new capabilities to support advanced AttributedString uses!


AttributedString Tracking Indices

Introduction

Similar to many other Collection types in Swift, AttributedString uses an opaque index type (AttributedString.Index) to represent locations within the storage of the text. AttributedString uses an opaque type instead of a trivial type like an integer in order to store a more complex representation of the location within the text. Specifically, AttributedString uses its index to store not only a raw UTF-8 offset into the text storage, but also a "path" through the rope structure that backs the AttributedString. This allows AttributedString to quickly find a location within the text storage given an index without performing linear scans of the underlying text on each access or mutation. However, because this opaque index type stores more complex information, it must be handled very carefully and kept in sync with the AttributedString itself. Unlike an integer offset (which can often still be a valid index into a collection like Array after a mutation even if it points to a different semantic location), AttributedString.Index currently makes no guarantees about its validity after any mutation to the AttributedString and in many cases will (intentionally) crash when used improperly. As AttributedString is adopted in more advanced use cases throughout our platforms, we'd like to improve upon this developer experience for some common use cases of stored AttributedString.Index values.
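To make the contrast with integer offsets concrete, here is a minimal sketch using a plain Array from the standard library (the array contents are only illustrative). The stored integer stays in bounds after the mutation, so it does not crash, but it silently refers to a different element:

```swift
// Integer indices into Array stay usable after a mutation,
// but they may now refer to a different element.
var words = ["quick", "brown", "fox"]
let foxPosition = words.firstIndex(of: "fox")!

words.insert("the", at: 0)

// Still in bounds, so there is no crash, but this is no longer "fox".
let stale = words[foxPosition]

// Re-deriving the position is the only way to find "fox" again.
let fresh = words.firstIndex(of: "fox")!

print(stale, fresh)
```

An opaque index that crashes on misuse trades this silent drift for a loud failure, which is the behavior the pitch aims to make easier to live with.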

Motivation

In many cases, developers may wish to use these index types to store "pointers" to locations in the AttributedString separately from the text itself. For example, a text editor that uses an AttributedString as its underlying storage would likely want to store a RangeSet<AttributedString.Index> at the view or view model layer to represent a user's selection in the text while still allowing mutations of the text. Alternatively, complex, in-place, mutating operations that process an AttributedString in chunks may wish to temporarily store an AttributedString.Index representing the current processing location while they perform mutations on the text. In these scenarios, it is currently challenging (or in some cases not possible) to keep an opaque AttributedString.Index in sync with a separate AttributedString while mutations are occurring, since every mutation invalidates every previously produced index value. With more complex AttributedString-based APIs, it's important that we provide a mechanism for developers to keep these indices not only valid, to prevent unexpected application crashes, but also correctly positioned, to ensure they achieve the expected end-user behavior.

Proposed solution

To accomplish this goal, we will provide a few new APIs to make AttributedString index management and synchronization easy to use. First, we will propose a new API that allows AttributedString to update an index, range, or list of ranges while a mutation is being performed to ensure the indices remain valid and correct post-mutation. Developers will use a new proposed transform(updating:_:) API to do so like the following:

var attrStr = AttributedString("The quick brown fox jumped over the lazy dog")
guard let rangeOfJumped = attrStr.range(of: "jumped") else { ... }
let updatedRangeOfJumped = attrStr.transform(updating: rangeOfJumped) {
    $0.insert("Wow!", at: $0.startIndex)
}

if let updatedRangeOfJumped {
    print(attrStr[updatedRangeOfJumped]) // "jumped"
}

Note that in the above sample code, the returned Range references the range of "jumped" which is valid for use with the mutated attrStr (it will not crash) and it locates the same text - it does not represent the range of "fox ju" (a range offset by the 4 characters that were inserted at the beginning of the string).
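To illustrate the difference between a stale range and an updated one, here is a rough sketch of the bookkeeping a developer would otherwise do by hand. It uses a plain String and integer UTF-8 offsets rather than AttributedString, and the helpers `shifted` and `slice` are hypothetical names invented for this example; they are not part of the proposal and say nothing about how transform(updating:_:) is implemented:

```swift
// Hypothetical helper (not part of the proposal): adjusts a range of UTF-8
// offsets for an insertion of `insertedCount` bytes at `insertionOffset`.
// Assumption: text inserted at or before the lower bound moves the whole
// range, and text inserted strictly inside the range expands it.
func shifted(_ range: Range<Int>, insertionOffset: Int, insertedCount: Int) -> Range<Int> {
    let lower = range.lowerBound >= insertionOffset
        ? range.lowerBound + insertedCount : range.lowerBound
    let upper = range.upperBound > insertionOffset
        ? range.upperBound + insertedCount : range.upperBound
    return lower..<upper
}

// Hypothetical helper: extracts the text covered by a range of UTF-8 offsets.
// Assumes the offsets fall on character boundaries (true for ASCII text).
func slice(_ string: String, _ range: Range<Int>) -> String {
    let utf8 = string.utf8
    let start = utf8.index(utf8.startIndex, offsetBy: range.lowerBound)
    let end = utf8.index(utf8.startIndex, offsetBy: range.upperBound)
    return String(string[start..<end])
}

var text = "The quick brown fox jumped over the lazy dog"
let jumped = 20..<26  // UTF-8 offsets of "jumped" before the mutation
let inserted = "Wow!"

text.insert(contentsOf: inserted, at: text.startIndex)
let updated = shifted(jumped, insertionOffset: 0, insertedCount: inserted.utf8.count)

print(slice(text, jumped))   // "fox ju" (stale offsets)
print(slice(text, updated))  // "jumped" (adjusted offsets)
```

Doing this manually requires knowing exactly where and how much each mutation changed the text, which is the burden the proposed API is meant to remove.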

Additionally, we will provide a set of APIs and guarantees to reduce the frequency of crashes caused by invalid indices and allow for dynamically determining whether an index has become out of sync with a given AttributedString. For example:

var attrStr = AttributedString("The quick brown fox jumped over the lazy dog")
guard var rangeOfJumped = attrStr.range(of: "jumped") else { ... }

// ... some additional processing ...

guard rangeOfJumped.isValid(within: attrStr) else {
    // A mutation has occurred without correctly updating `rangeOfJumped` - we should not use this range as it may crash or represent the wrong location
}
// `rangeOfJumped` has been correctly kept in sync with `attrStr` as it has changed, so we can use it freely
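As a purely illustrative sketch of how a validity check of this kind could work in principle, the following toy types stamp each position with a mutation counter. `VersionedText` and `Position` are hypothetical names made up for this example, and the version-counter approach is an assumption, not a description of the proposal's design:

```swift
// Hypothetical types for illustration only; not the proposal's design.
struct VersionedText {
    private(set) var storage: String
    private(set) var version = 0  // bumped on every mutation

    init(_ string: String) { storage = string }

    mutating func insert(_ string: String, atOffset offset: Int) {
        let index = storage.utf8.index(storage.utf8.startIndex, offsetBy: offset)
        storage.insert(contentsOf: string, at: index)
        version += 1
    }

    // Produces a position stamped with the current version.
    func position(atOffset offset: Int) -> Position {
        Position(offset: offset, version: version)
    }
}

struct Position {
    let offset: Int
    let version: Int

    // A position is valid only if no mutation has happened since it was made.
    func isValid(within text: VersionedText) -> Bool {
        version == text.version && offset <= text.storage.utf8.count
    }
}

var text = VersionedText("lazy dog")
let position = text.position(atOffset: 5)
let validBefore = position.isValid(within: text)

text.insert("very ", atOffset: 0)
let validAfter = position.isValid(within: text)

print(validBefore, validAfter)
```

In this toy model any mutation invalidates every outstanding position, which matches the current behavior described in the introduction; the pitched APIs would additionally let a position be carried across a mutation.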

For the detailed design of these APIs and a wide variety of considerations and rejected alternatives, please check out the full proposal on the swift-foundation repo PR.

I guess I don't understand why this functionality wouldn't be pitched initially for String rather than AttributedString.

That's a fair question. I don't disagree that an API like this could also be beneficial on a type like String, so it's possible there is some room for improvement there. However, I do feel that it's OK to add this to AttributedString even if it doesn't exist on String yet, mainly because I feel String and AttributedString serve slightly different use cases (beyond just the addition of attribute values) and AttributedString's use case lends itself more to this API.

String is designed for small to medium amounts of text (it uses a flat inline buffer for very small strings or a heap-allocated flat buffer for longer text). This makes String suitable for displaying labels and pieces of text within apps or passing around bits of textual information. AttributedString, on the other hand, is not only attributed but is also designed for large-scale uses of text and for backing text editors in particular. Rather than using a flat buffer, AttributedString uses a tree-like rope structure that makes it better suited for large and/or frequently edited text, because copies are done piecewise on sections of the tree and traversals to specific points within the text take logarithmic rather than linear time.

While I see a handful of use cases, the tracking index APIs are designed specifically with text editors in mind (for example, a user's selection in a text editor, which needs to be kept in sync while the text changes). Since AttributedString is the most suitable API to use to back a text editor, I think this index tracking API has a much higher importance for AttributedString today than it does for String.

I don't think anything here disqualifies us from adding similar APIs to String in the future (they'd need to be independent anyway, since the implementation of how indices are tracked is very type-specific). But I think the reason we can start with AttributedString is that AttributedString aligns more closely with the use cases for this API than String does today.