Storing concatenated `Substring`s in a single `String` buffer

i’m trying to store (percent-decoded) URL path components efficiently in a lookup table. the expected number of path components per-entry is around 5–20ish, and they can contain the path separator "/".

since storing a [String] array would likely result in heap fragmentation, i thought about concatenating them together into a single String, and using a delimiter character like "\u{0}" to separate the substrings, but this thread about string slices and mixed indices has scared me off of this approach, since the percent-decoding is performed at the UTF-8 level, and the path components themselves have some post-processing applied at the Unicode.Scalar level.
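for reference, a minimal sketch of the delimiter idea, done entirely at the UTF-8 level. it assumes no component contains U+0000 after percent-decoding (a `%00` in the input would break it), and the sample data is made up:

```swift
// Hypothetical sample data: the second and third components would fuse into
// a single "é" Character if they were concatenated without a delimiter.
let components = ["docs", "e", "\u{301}clair", "a/b"]

// Assumption: no component contains U+0000 after percent-decoding.
let packed = components.joined(separator: "\u{0}")

// Split on the raw 0x00 byte and decode each slice into its own String,
// so each component is rebuilt only from the bytes between two delimiters.
let recovered = packed.utf8
    .split(separator: UInt8(0), omittingEmptySubsequences: false)
    .map { String(decoding: $0, as: UTF8.self) }
```

because the split happens on bytes and every slice is decoded into a fresh `String`, the result should not depend on what sits on the other side of a delimiter.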

it is not necessary to efficiently re-construct the original [String] array, i’m just worried about the meaning of the substrings changing based on data adjacent to each path component, like in the linked example.

has anyone else run into this problem?

The worry about heap fragmentation might be premature; a plain [String] may work just fine.

Alternatively, you could parse all the URL components into one full string (with all post-processing already applied) and store [ComponentKey : Range<String.Index>] plus the string, or [ComponentKey : (String, Range<String.Index>)]. This works if you don't need to mutate the components (if you do, it gets more complicated).
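A rough sketch of one variation on this idea, storing integer UTF-8 offsets instead of `Range<String.Index>` so that every boundary is defined in bytes. The type name `PackedComponents` and its members are hypothetical, and it uses array positions rather than a `ComponentKey`:

```swift
// Hypothetical container: one String buffer plus UTF-8 offset ranges.
struct PackedComponents {
    private(set) var buffer: String = ""
    private(set) var offsets: [Range<Int>] = []   // byte offsets into buffer.utf8

    init(_ components: [String]) {
        for component in components {
            let start = buffer.utf8.count
            buffer += component                    // appends the component's bytes
            offsets.append(start ..< buffer.utf8.count)
        }
    }

    var count: Int { offsets.count }

    // Rebuilds component `i` from its own bytes only.
    subscript(i: Int) -> String {
        let utf8 = buffer.utf8
        let range = offsets[i]
        let lower = utf8.index(utf8.startIndex, offsetBy: range.lowerBound)
        let upper = utf8.index(lower, offsetBy: range.count)
        return String(decoding: utf8[lower ..< upper], as: UTF8.self)
    }
}

// "e" followed by U+0301 forms one Character in the shared buffer,
// but each component is still recovered byte-for-byte.
let table = PackedComponents(["e", "\u{301}clair", "a/b", ""])
let first = table[0]
let second = table[1]
```

Each lookup allocates a new `String` (or uses the small-string form for short components), so this trades some decoding work for a guarantee that neighbouring data cannot change what a component means.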


previously i was using [[UInt8]] (storing UTF-8), which did cause horrific heap fragmentation. i’m not sure if [String] would have the same problems though, since String benefits from small-string optimizations, so that’s a good point.

i don’t need to mutate the components, so we’re good on that front. however, the linked thread is exactly about Substring weirdness being caused by data outside of the Substring bounds. so i’m not sure how switching from Substring to a Range<String.Index>-based representation would be immune to that.