Naming Discussion: `sortedPrefix(_:)` Edition

As noted in my last post, before tagging a new release of the Algorithms package I'd like to have a bit of discussion about naming of some of the new additions. In this thread I'd like to invite comments about the sortedPrefix(_:) method, which you can read about in the Sorted Prefix guide.

This method has been challenging to name! It is related to (or at least sounds related to), but is not the same as:

  • a sorted version of the prefix, e.g. seq.prefix(10).sorted()
  • partialSort from C++, which operates in-place and leaves the rest of the collection in indeterminate order.
  • the min() method that already exists on sequences, which returns the smallest element — but if we were to call this min(_ count: Int), then would we also need a max(_ count: Int)?
  • "prefix orders", the set theory term

These are the names that were discussed during the addition process:

_ = numbers.sortedPrefix(5)
_ = numbers.smallest(5)
_ = numbers.partiallySorted(5)
_ = numbers.sort(upTo:5)
_ = numbers.sortFirst(5)
_ = numbers.min(5)

I'm sold on sortedPrefix(_:) as the best name of those proposed so far, since it's essentially a more efficient version of sorted().prefix(_:). I'd love to hear peoples' thoughts and other suggestions of names that we haven't come across yet.

Many thanks to @rockbruno and @rafaelmachado for the addition of sortedPrefix(_:)!

5 Likes

I think partialSort does the best job at representing what it does, but it's really hard to describe that in Swift without going against our naming conventions... There is a little margin for error when reading sortedPrefix but I also agree that it's the best we have so far

sort, partialSort, and sortFirst sound like commands and I would expect them to be in-place.
partiallySorted, sortedPrefix, smallest, .. sound like functions and I would expect them to return something.

Does this interpretation make sense?

Thinking about how I would use it in practice, what about myArray.firstSorted(10)?

2 Likes

Similar to @SlaunchaMan's suggestion, I'd try to keep it as close as possible to the existing .sorted() call-site syntax:

let sortedPrefix = myArray.sorted(first: 10, by: ...)

It's natural to me to reach for .sort... when wanting a sorted array, so invoking autocomplete with that typed would reveal .sorted(first:by:) next to .sorted(by:)

6 Likes

There should also be a min(_, by:) and max(_, by:)

min(_ count: Int, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> Element

to avoid numbers.sortedPrefix(5).first

right?

Otherwise, sortedPrefix sounds fine.

2 Likes

To me, seq.sortedPrefix(10) reads like it’s the equivalent of seq.prefix(10).sorted(), not of seq.sorted().prefix(10).
It may be just a matter of getting used to it (particularly since one quickly realizes it would not make much sense to have a function dedicated to doing something as simple as prefix().sorted()).

But I like @davedelong’s suggestion of making this a variant of .sorted(). Just not a big fan of using first: for the size argument. Maybe .sorted(limitTo:by:) or something like that?

1 Like

I understand the confusion, and I can see the rationale — especially the fact that mirroring the more expensive version naming helps discoverability in auto-complete — I do think it can still be confusing.

To me this looks more like what I'd want seq.lazy.sorted().prefix(10) to do (and I think doesn't). And I also understand that we cannot really split the implementation in having a LazilySortedSequence kind of thing because that really has to do bubble sort to work? Or maybe it can have some access to say "batch next x elements" so that it can decide how to optimise like this version already does.

And even assuming we can do this... I think we cannot change the existing behaviour from the stdlib :disappointed:

though given this discussion maybe lazilySortedPrefix(_:) is clearer? :thinking:

Hi Nate,

My two cents on this is that sortedPrefix() seems indeed the best option here. smallest(5) seems good too but it loses meaning when it takes an isIncreasingOrder as a parameter.
One thing I'm not sure was discussed (sorry if it was and I miss it) is a more general API that supports not only prefixes but operates on ranges like

extension Collection {
  func sortedSubrange<R: RangeExpression>(_ range: R) -> [Element] where R.Bound == Self.Index {
   // ... 
  }
}

// This is how it could look on call
_ = numbers.sortedSubrange(...5)

Maybe specialized versions for PartialRangeUpTo and PartialRangeThrough as they could have implementation optimized the same way as the prefix one.

This could also address a possible sortedSuffix(5), although a sorted suffix would be just an inverted order sorting prefix reverted...
Maybe this range API is outside of the scope for this naming discussion, but I think if it wasn't already discussed could be relevant :)

In summary, I'm +1 on sortedPrefix()

I'm not sure about sortedPrefix. Reading just the name first, not the parameter list or docs, I expected it to return a prefix up to and excluding the first element that is smaller than its predecessor. In other words the longest prefix of the collection that is already in sorted order.

Seeing some unlabeled integer in the parameter list I can guess it must return something of that count, but still leaves me with no idea whether a prefix of n elements is taken and then sorted, or whether it's a prefix of the sorted version of the whole collection.

I think being a bit more verbose would do good here, the first thing that popped into my head was prefixAfterSorting, which could still be interpreted the other way round, but I think it nudges it in the right direction just enough. I wouldn't be opposed to maybe throwing in a count label either.

1 Like

The name should be least. least is the plural form of min, when based on "<". (i.e. a more precise form of “smallest”. This adjectival form: https://en.wiktionary.org/wiki/least_weasel)

Yes, except, greatest.


It's not going to most frequently happen that people will want this on a Sequence of Comparables. That's a special case. The most common use case is to dig in for a Comparable, with a key path.

func least<Comparable: Swift.Comparable>(
  _ count: Int,
  by comparable: (Element) throws -> Comparable
) rethrows -> [Element] {
2 Likes

Thanks so much to everyone who contributed to this discussion! Based on reading the comments here and some outside discussions, I'm convinced that there's too much room for interpreting sortedPrefix as its literal meaning, a sorted version of a collection's prefix.

That leaves us with the family related to the existing min and max methods, with suggestions like min, minimumElements, smallest, and least. I find I react to the names differently when thinking about quantifiable values like numbers than when thinking about other types.

let names = ["Jin", "RM", "J-Hope", "Jungkook", "Jimin", "V", "Suga"]

names.min(3)
names.minimumElements(3)
names.smallest(3)
names.least(3)

To me, least and smallest make less sense in this context, which narrows down to extending the existing naming near min. Just adding the count alone as an argument can read as performing a "minimum" operation between the argument and the elements of the collection (e.g. [5, 6, 7].min(2) == 2?), so I've landed on including a count parameter label. A matching max(count:) method will return the elements that would be at the end of the sorted collection.

let numbers = (1...100).shuffled()

numbers.min(count: 5)   // [1, 2, 3, 4, 5]
numbers.max(count: 5)   // [96, 97, 98, 99, 100]

Happy to hear any additional thoughts on these names!

5 Likes

I like the min(count: 4) naming. If the collection has less than count elements, does it just return as many as it can? (Similar to the prefix(_ count:) methods)

I don’t think this is worth worsening the spelling.

We already have [5, 6, 7].prefix(2), and nobody worries that it might be misinterpreted as prefixing the element 2 and thus returning [2, 5, 6, 7]. It’s simply not a concern.

Notably there is a type mismatch between the function and the misinterpretation, for both prefix(n) and min(n):

prefix(n) returns SubSequence, whereas the misintepretation would return Array (or maybe Self for RRC’s).

min(n) returns Array, whereas the misinterpretation would return Element.

Thus, anyone who might entertain such an idea would have it dispelled immediately, because the types are different.

For simplicity, for ease of reading and writing, and for consistency with prefix(n), I think it is a better long-term decision to choose the spelling min(n).

Right, just like prefix(_:), suffix(_:), and randomSample(count:).

1 Like

I'd like to hear what are your thoughts on the version taking a comparison function. Would that be included as min(count:by:) and max(count:by:)?

I thought about it before, and even with the new names, sortedPrefix makes the most sense to me. min and max don't really make sense here since there are a lot of sorts where that terminology wouldn't make sense.

The standard library already has min(by:) which works for arbitrary sorting predicates, so I think that decision has been made.

3 Likes

Right, and the naming does not lend itself to universal usage. Every time I use that method I have to think about what min means for the predicate I'm actually using. If it's not a numeric calculation it may take some thought.

I really like the the idea of using the min and max terminology a lot.


Although the original name might not sound the most natural in some scenarios, naming consistency here is far better than using two different names for the same concept. That would leave us with two names that aren’t perfect.

1 Like
Terms of Service

Privacy Policy

Cookie Policy