Additional String Processing APIs

Looking at NSString.replacingOccurrences(of: with: options: range:):

  • The last parameter restricts what part of the entire string to look at. We can replace that in a general RangeReplaceableCollection version by working with a sub-sequence.
  • We still need the first two parameters for the searched-for and replacement sequences.
  • We will add a parameter for a custom equivalence predicate. For an RRC that has an Equatable element type, there will be an overload that omits the predicate parameter.
  • Looking at the various flags of NSString.CompareOptions, many of them (caseInsensitive, literal, diacriticInsensitive, widthInsensitive, and forcedOrdering) can be omitted by a common normalization step for the source and search-for collections and/or use of a customized predicate.
  • The backwards and anchored flags will be moved to be function parameters.
  • (regularExpression is too much to handle right now. And numeric doesn't apply since it's an ordering, not equivalence, directive.)

Initially I saw that using an in-place replacement method would have the problem that the first replacement will void all the indices, screwing up subsequent searches. So I decided to ditch an in-place method, and make search-and-replace return a new copy. This changes the general requirements from RRC to a plain Collection. Then I decided to separate search-and-replace into separate searching and replacing methods. My prototype for the replacing method turned out to be:

extension Collection {
    /// Returns a copy of this collection into a collection of the given type,
    /// but where the subsequences of this collection indicated by the given
    /// list of index ranges are replaced by mapped versions of themselves, in
    /// the given direction, using the given mapping function.
    public func replaceSubranges<S, T, U>(
        listedBy list: S, backwards: Bool,  into type: T.Type, withResultsOf transform: (SubSequence) throws -> U
    ) rethrows -> T
        where S: Sequence, S.Element: RangeExpression, S.Element.Bound == Index, T: RangeReplaceableCollection, T.Element == Element, U: Sequence, U.Element == Element

    /// Copies this collection into an array, except that subsequences of this
    /// collection indicated by the given list of index ranges are replaced in
    /// the given direction by mapped versions of themselves, using the given
    /// mapping function.
    @inlinable
    public func replaceSubranges<S: Sequence, U: Sequence>(
        listedBy list: S, backwards: Bool, withResultsOf transform: (SubSequence) throws -> U
    ) rethrows -> [Element]
        where S.Element: RangeExpression, S.Element.Bound == Index, U.Element == Element
}

And the searching can be expressed as a sequence of ranges, packed into a new type:

/// A sequence of the locations of all the occurrances a substring appears in
/// some collection according to an element-equivalence function.
///
/// - Warning: `startIndex` is computed on the fly, making it O(*n*) instead of
///   O(1), where *n* is the length of the base collection.  Since collection
///   iteration repeatedly tests the last-reachable end-point, prefer forward
///   iteration when possible, since `endIndex` is a lot quicker to compute.
public struct MatchingRangesCollection<Base: Collection, TargetElement> { /*...*/ }
extension MatchingRangesCollection: LazyCollectionProtocol { /*...*/ }
extension MatchingRangesCollection.Index: Hashable where Base.Index: Hashable {}
extension MatchingRangesCollection: BidirectionalCollection where Base: BidirectionalCollection { /*...*/ }

extension Collection {
    /// Returns the index ranges for each subsequence of this collection that is
    /// equivalent to the given sequence, using the given predicate to compare
    /// elements.
    public func matchingRanges<S: Sequence>(for target: S, by areEquivalent: @escaping (Element, S.Element) -> Bool) -> MatchingRangesCollection<Self, S.Element>}
}

extension Collection where Element: Equatable {
    /// Returns the index ranges for each subsequence of this collection that is
    /// equal to the given sequence.
    @inlinable
    public func matchingRanges<S: Sequence>(for target: S) -> MatchingRangesCollection<Self, Element> where S.Element == Element
}

Then I tried to write a convenience method on Collection to search-and-replace by combining the matching-range collection with the replacement method. But I realized the method didn't need to accommodate non-local code-blocks, and couldn't because that would forbid throwing predicates. So I started over.

I moved the main searching code to some helper functions:

extension Collection {
    /// Returns the index range for the earliest subsequence of this collection
    /// that is equivalent to the given sequence, using the given predicate to
    /// compare elements.
    public func firstRange<C: Collection>(equaling target: C, by areEquivalent: (Element, C.Element) throws -> Bool) rethrows -> Range<Index>?
}

extension Collection where Element: Equatable {
    /// Returns the index range for the earliest subsequence of this collection
    /// that equals the given sequence.
    @inlinable
    public func firstRange<C: Collection>(equaling target: C) -> Range<Index>? where C.Element == Element
}

extension BidirectionalCollection {
    /// Returns the index range for the last subsequence of this collection that
    /// is equivalent to the given sequence, using the given predicate to
    /// compare elements.
    @inlinable
    public func lastRange<C: BidirectionalCollection>(equaling target: C, by areEquivalent: (Element, C.Element) throws -> Bool) rethrows -> Range<Index>?
}

extension BidirectionalCollection where Element: Equatable {
    /// Returns the index range for the last subsequence of this collection that
    /// equals the given sequence.
    @inlinable
    public func lastRange<C: BidirectionalCollection>(equaling target: C) -> Range<Index>? where C.Element == Element
}

Then wrote a search-and-replace method with an unconditional replacement value:

extension Collection {
    /// Returns a copy of this collection into a collection of the given type,
    /// except subsequences that match the given collection according to the
    /// given predicate are replaced with copies of another given collection.
    public func replacingOccurrences<T, R, RR>(
        of target: T, with replacement: R, onlyAsPrefix: Bool, into type: RR.Type, by areEquivalent: (Element, T.Element) throws -> Bool
    ) rethrows -> RR
        where T: Collection, R: Collection, R.Element == Element, RR: RangeReplaceableCollection, RR.Element == Element

    /// Copies this collection into an array, except subsequences that match the
    /// given collection according to the given predicate are replaced with
    /// copies of another given collection.
    @inlinable
    public func replacingOccurrences<T: Collection, R: Collection>(of target: T, with replacement: R, onlyAsPrefix: Bool, by areEquivalent: (Element, T.Element) throws -> Bool) rethrows -> [Element] where R.Element == Element
}

And expanded it for bidirectional collections where the replacement could be done forwards or backwards. (This decides which of a given pair of overlapping matches is replaced.):

extension BidirectionalCollection {
    /// Returns a copy of this collection into a collection of the given type,
    /// except subsequences that match the given collection according to the
    /// given predicate are replaced in a given order with copies of another
    /// given collection.
    @inlinable
    public func replacingOccurrences<T, R, RR>(
        of target: T, with replacement: R, backwards: Bool, onlyAtLeadingAdfix: Bool, into type: RR.Type, by areEquivalent: (Element, T.Element) throws -> Bool
    ) rethrows -> RR
        where T: BidirectionalCollection, R: BidirectionalCollection, R.Element == Element, RR: RangeReplaceableCollection, RR.Element == Element

    /// Copies this collection into an array, except subsequences that match the
    /// given collection according to the given predicate are replaced in a
    /// given order with copies of another given collection.
    @inlinable
    public func replacingOccurrences<T, R>(
        of target: T, with replacement: R, backwards: Bool, onlyAtLeadingAdfix: Bool, by areEquivalent: (Element, T.Element) throws -> Bool
    ) rethrows -> [Element]
        where T: BidirectionalCollection, R: BidirectionalCollection, R.Element == Element
}

Then I made overloads for when the element type is Equatable. (But they lock the return type as an Array.) Then I made overloads for RangeReplaceableCollection where Self is returned, along with overloads for BidirectionalCollection conformance and/or element-Equatable conformance.

Since my first idea for separate search and replace functionality failed, I did make in-place overloads too. They just call the separate-object versions then replace themselves.

However, with the separate functionality, we could make a regular-expression range-matching reference type. An instance of that type, initialized with a regex pattern and a string, can be fed to the replace routine as the range list and as part of the transformation closure. In the transformation closure, it can read the indices of the current targeted subsequence and apply regex rules for ordinal matches.

You can see them all in a gist.

2 Likes