Median

Median

Introduction

This proposal would add global median functions to the standard library.

Motivation

A comparable value can be clamped to a closed range, by combining the existing min and max functions.

// Values clamped to `0.0...1.0`
max(0.0, min(-.pi, 1.0))  //-> 0.0
max(0.0, min(+.pi, 1.0))  //-> 1.0

Proposed solution

The previous example can be expressed more easily as:

// Values clamped to `0.0...1.0`
median(0.0, -.pi, 1.0)  //-> 0.0
median(0.0, +.pi, 1.0)  //-> 1.0

Detailed design

public func median<T: Comparable>   (_ x: T, _ y: T, _ z: T) -> T
public func median<T: FloatingPoint>(_ x: T, _ y: T, _ z: T) -> T
public func median<T: FloatingPoint>(_ x: T, _ y: T, _ rest: T...) -> T
  • The functions without a variadic parameter are optimized for clamping.
  • The values can be given in any order, and will be sorted in ascending order.
  • Floating-point values, including signed zeros and NaNs, will be totally ordered.
  • The middle value, or the arithmetic mean of two middle values, will be returned.
median(1.0, 2.0)                  //-> 1.5
median(1.0, 2.0, 4.0)             //-> 2.0
median(1.0, 2.0, 4.0, 8.0)        //-> 3.0
median(1.0, 2.0, 4.0, 8.0, 16.0)  //-> 4.0

Acknowledgments

The idea for this proposal was suggested by Steve Canon.

5 Likes

Seems like a straightforward win. I'm not qualified to comment on whether there are any subtle numerical issues the implementation will need to consider, but Steve's already in the loop so I have no worries there.

It seems like the N-parameter form might be better expressed as a method on something in the Collection hierarchy, much like how we have both the top-level function max() and the extension method max().

I also think we may want to perform this operation on integer types in addition to floating-point types. Is there something more general than FloatingPoint that we could require instead?

5 Likes

I like the 3-element median, but I think the variadic version would feel more natural as a method on Collection, with both Comparable and custom-predicate versions.

Edit: Jinx!

5 Likes

FWIW, I don’t think the variadic form could work on Comparable because of its behavior when there are an even number of elements.

I think the difficulty is what to do when there’s an even number of elements.

What’s the median of 0 and 1 as integers?

Edit: Jinx again!

You’re right, what I meant was, for floating-point values, I sometimes want the median by \.magnitude or suchlike, rather than direct <.

0.5

I think the median for Integer type should be a Double.

/ is separately defined in BinaryInteger and FloatingPoint. So median on collection need to be separately define on Integer type and FloatingPoint type.

The global min and max functions have an overload with a variadic parameter, so it seemed natural to use a similar design in this proposal. Also, the median of two values (i.e. an empty variadic array) returns the arithmetic mean, which may be useful as a global function?

The most general requirements I tried were:

  • T: Strideable where T.Stride: FloatingPoint
    but that wouldn't include any additional types, except maybe Foundation.Date — if a conformance was added (FB9452917).

  • T: Strideable where T.Stride: ExpressibleByFloatLiteral
    but that would include Foundation.Decimal, which had a broken distance(to:) method for the last five years — only recently fixed by @xwu in SR-6785.

Also, the distance(to:) method would return infinity for some floating-point values.

I agree that median methods on sequences would be useful, but I didn't think they would be accepted for the standard library. There was a previous discussion on adding sum and product methods to Swift Numerics, so perhaps mean and median methods would also be considered for that package?

Sorting by key paths was recently discussed in the apple/swift-algorithms#89 issue.

The benchmark suite returns an Int median for an [Int] array, by rounding to the nearest index.

I’m not sure how I feel about introducing floating-point versions that use isTotallyOrdered(belowOrEqualTo:).

On the one hand, it’s a perfectly sensible behavior.

On the other hand, there’s no corresponding versions of min and max.

Should we add them?

• • •

Conversely, if one takes the median of some numerical data, it is also perfectly sensible to expect that any non-numerical values would simply be ignored, skipped over, and not included in the calculation. In other words, one might hope to extract the median of the non-NaN values.

After all, if the .nan values all get shunted to one end of the list, that could substantially skew the perceived median.

The only motivation given in the pitch is for clamping to a range, for which I think the ideal solution would be a method with “clamp” in the name that takes a range type, probably ClosedRange. I don't think people are going to intuitively reach for a global median function for that use case. There are independent uses for finding the median of three (or more) items or numbers (e.g. when there is no clear notion of which of the three is the upper and lower bound), but they would be better motivated by another use case. Or you could include both a clamp function and a global median function in the same proposal.

8 Likes

I think the functions should be under FloatingPoint itself, like FloatingPoint.minimum(_:_:) and FloatingPoint.maximum(_:_:).

I also think it would be a good idea to have a separate clamped(to:) method as an extension to the Comparable protocol, like what @jawbroken suggested.

I don’t think it’s a good idea to apply this to integers — the rounding (or Double-converting) behavior may be unexpected to users. If a programmer really needed a function like this, then they could make it themselves.

I also don’t think it’s a good idea to use variadic parameters here — this function would be useful for any collection of floating-point numbers.

If we add a median function, we should probably add functions like arithmeticMean, mode, range, and maybe geometricMean. We might also want to consider making numerical computing more of a priority in general (do people still like the idea of a numerical/ML working group?). I think Swift has a lot of potential for numerical computing, but it doesn’t yet reach its potential. (The standard library doesn’t even have a sin or log function!)

Tangential:

Swift Numerics, however, does (for both real and complex types). Not everything has to go in the standard library; you have to include a header or library in most languages to use sin and log. It is at least not obvious that Swift should be different.

2 Likes

I might remove the two median<T: FloatingPoint> functions from the proposal, or at least mention the possibility in an Alternatives Considered section.


My original implementation didn't use a total ordering, so it only documented using a stable sorting algorithm. Swift 5.0+ has a stable sort (apple/swift#19717), but this isn't guaranteed yet, and would need another proposal.

TabularData.NumericSummary.median also ignores "missing" elements, but I think that refers to Optional.none rather than NaN.

The idea of a global median function was suggested last year by @scanon (and endorsed by @kylemacomber). I'd like to keep the proposal short, so I'm reluctant to expand the Motivation section.

That would be the SE-0177 proposal, which was returned for revision (4 years ago), and revisited a few times since then.

Sure, if you like, but if that's the only motivation then I don't think this is a good solution for it, so I have to conclude that it shouldn't be added. I'll also note that the link you give here suggests that the median function would be in addition to a clamp function:

1 Like

I still think it would be valuable to have a ML working group. We continue to do Swift auto-diff compiler work in our team and are building AI frameworks. The whole tech stack at our company is embedded Swift AI. It's the right solution for what I think of as "AI for Grownups" in a typed system backed by a compiler. We'll be spinning out a non-profit to hold some of the assets over the next year.

We would be suppport a ML working group.

I think discussing SE-0177 is appropriate here since the median functions without a variadic parameter are essentially just clamping functions. I think using a clamped method is a more expressive way to solve this pitch’s motivating problem. And it would potentially be even more expressive if we add a mutating variant as well as support for partial ranges.

Based on this post, it seems that the reason the original pitch was returned for revision is (a) because the implementation wasn’t compatible with the latest version of Swift at the time, (b) the implementation wasn’t generic over RangeExpression, and (c) there were concerns about the proposal not meeting the bar for being included in the standard library. However, I don’t think these reasons really apply today since (a) we can easily create a version of these functions that works in the latest version of Swift. Also, (b) now that Swift’s API is stable, it’s more obvious that a generic implementation over RangeExpression isn’t a good way to represent these functions. Finally, (c) since Swift’s API is more stable than it was 4 years ago, the bar for new additions to the standard library has been lowered because we don’t have to worry about refactoring after a dramatic change in the language’s design.

I think that having clamped functions would help improve readability of code (even more than median) without any significant costs. Based on this forum thread, it seems that there are plenty of others who like this design — it just needs a more formal pitch.


I do think that having a separate median function for numbers would be useful, but it should be meant for the purpose of statistical analysis, not clamping a value. We shouldn’t have a function that has two separate purposes. I think it’s up for debate whether we should put it in the standard library or in the Swift Numerics package, but it’s something we should have.


Tangential discussion about elementary functions in the standard library

I think that functions like sine and logarithm are important enough that they should be in the standard library, and not stay limited to Swift Numerics.

The case for mathematical functions in the standard library:

  • According to the note at the top of the SE-0246 proposal page, the reason that these functions weren’t put in the standard library is because of compiler limitations, and the note seems to imply that it will be implemented in the standard library when it’s feasible.
  • Requiring users to use a separate package for this functionality means that users will have to learn how to use packages and the package manager and all of its associated complexity. Math (with + / - / * / /) is one of the first things people learn how to do in Swift, so this complexity is undesirable. It’s also less discoverable, and I believe that this lack of discoverability leads people to simply use the C functions — if you search the web for “Swift sine” or “Swift logarithm”, you’ll find that every result suggests using C functions, not Swift Numerics. I believe this is bad because the C functions aren’t generic (so there are separate functions to handle Float) and because I believe Swift should be taught in a way that doesn’t require the programmer falling back on C.
  • IMO we should only keep functionality within Swift Numerics if it’s only helpful within the domain of numerical computing. Simple mathematical functions like sine are used beyond numerical computing (i.e. game design and rendering).
  • Just because other languages require an import for some functionality doesn’t mean Swift should require one. For example, C++ and Java require an import for variable-sized arrays (<vector> and java.util.ArrayList), but Swift doesn’t. Swift doesn’t usually force programmers to import a bunch of modules for basic functionality. As another example, importing Darwin / Glibc imports every C header (including math.h!) instead of Swift requiring the programmer to import everything individually.

I think functionality like complex numbers and variable-sized numbers are specialized enough that they should stay in Swift Numerics, but simple mathematical functions should go into the standard library

5 Likes

apple/swift-evolution#1450:

The core team did a preflight discussion of this proposal to determine if this should move to the review stage. That discussion raised some concerns:

  • The proposal adds a function that takes precisely three arguments. While the semantics of median is evident in this case (i.e., there always is a "center" value), there was general skepticism that such a function would add much value to the Standard Library.

  • A more general API — that takes (say) a variable number of values — might have higher utility, but also begets the question of how to define the median for different types. In a statistical context, the median of an even number of values is the average of the two central values. In many programming contexts, however, it is important that the returned value be one of the original elements. The latter definition can also be applied to collections of non-numerical types. For this to be available on non-numeric types, there would need to be a well-defined answer. For example, what should the result of ["a","b","c","d"].median be?

  • The core team wondered if this function would better fit with a family of numerical or statistical APIs. Such packaging of the API would better capture the use case for median and provide precise semantics for the odd and even number of values cases.

Based on these concerns, the core team does not believe the proposal should move forward with review. Thank you for taking the time to draft it up.

3 Likes

Hmm I can see the use case for this but the concerns that the core team pointed out by @benrimmington seem reasonable.