New LANumerics package for numerical linear algebra in Swift

Essentially, each memory region is bound to a single type at any given time. If you're just passing a pointer to a valid object of type U, the memory is already bound to U (and not V) at that time.

assumingMemoryBound(to:) assumes that it is already bound to V. From the Discussion section, violating that assumption results in undefined behaviour:

The memory starting at this pointer must be bound to the type T. Accessing memory through the returned pointer is undefined if the memory has not been bound to T.
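A minimal standalone sketch of the distinction (not from the package): the typed allocation below binds the memory to Int32, so only that binding may be assumed.

```swift
let p = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
defer { p.deallocate() }
p.pointee = 42

let raw = UnsafeMutableRawPointer(p)
// OK: the memory really is bound to Int32.
let typed = raw.assumingMemoryBound(to: Int32.self)
print(typed.pointee) // 42

// Undefined behaviour: the memory is bound to Int32, not UInt32.
// let bad = raw.assumingMemoryBound(to: UInt32.self)
```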


In principle you are right, but in the situations where I am using it it's fine. I am using recast<U,V> only when I know that U and V are the same, but the compiler is not smart enough to figure it out, or when I know that U and V have compatible layouts, like when U is complex, and V is its RealType. So in fact, the memory is already bound to V. Furthermore, my access to the recast pointer doesn't live longer than my access to the original pointer.

IIRC, compatible layout is not enough for assumingMemoryBound(to:); the types would also need to be related. In Swift, this rule is much stricter than in C, so types like UInt32 and Int32 are not related (yet), only layout-compatible.
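For layout-compatible but unrelated types, the sanctioned tool is withMemoryRebound(to:capacity:), which temporarily rebinds the region instead of assuming a binding. A sketch:

```swift
let p = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
defer { p.deallocate() }
p.pointee = -1

// Int32 and UInt32 are layout-compatible but not related in Swift's
// model; rebinding makes the access inside the closure well-defined.
p.withMemoryRebound(to: UInt32.self, capacity: 1) { u in
    print(u.pointee) // 4294967295
}
```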

Color me curious. It's intriguing that this could happen. I'd probably dig deeper should I have time.

That'd be related to object lifetime, which I'm not too worried about. I believe someone of your caliber is already familiar with the fact that pointers don't contribute to lifetime management (and that you can only access live objects).

I am glad that lifetime management of pointees is a trivial issue for you. So you will also see that casting the way I do it is fine; otherwise it wouldn't be possible to call C functions with array arguments as UnsafePointers ;-)

Unfortunately Swift doesn't allow multiple conditional conformances to the same protocol. Therefore, in the Complex implementation of LANumeric, I need to test the type of RealType dynamically. The compiler does not take this dynamic check into account for type inference purposes, so it does not know that RealType is really Float or Double in the respective parts of the code. Ideally, the compiler will optimise away any use of recast, as recast doesn't do anything except make the compiler happy.


What do you mean by this?

I mean that I would like to do something like the following:

extension Complex where RealType == Float {
  func doStuff(x : RealType) -> RealType {
    ....
  }
}

extension Complex where RealType == Double {
  func doStuff(x : RealType) -> RealType {
    ....
  }
}

But it seems I cannot, so instead I do:

extension Complex {
  func doStuff(x : RealType) -> RealType {
    if RealType.self == Float.self {
      ...
    } else if RealType.self == Double.self {
      ...
    } else {
      fatalError()
    }
  }
}

There are a few ways to do this without shenanigans:

extension Complex {
  func doStuff(x : RealType) -> RealType {
    if let f = x as? Float {
      // use f here, it has type Float
    }
  }
}

Another option that requires a bit more work, but can also handle some more sophisticated use cases, is:

protocol CanDoStuff: Real {
  static func doStuff(z: Complex<Self>, x: Self) -> Self
}

extension Float: CanDoStuff {
  // ...
}

extension Double: CanDoStuff {
  // ...
}

extension Complex where RealType: CanDoStuff {
  // You can use RealType.doStuff here unconditionally
}

I thought about doing it like that; the disadvantage, though, is that I have to duplicate everything in CanDoStuff: one version for RealType, and one for Complex<RealType>. But in the end this might be the cleaner and more efficient solution.

My experience is that for anything sufficiently complex, it's the cleaner solution (but: do try to trim the requirements down to what you really need, or it gets burdensome).

After thinking a bit more about it, it's really not clear here what the cleaner solution is. It is pretty clean currently, the "shenanigans" are encapsulated in one place via

extension Complex {
    
    static func dispatch<R>(float : () -> R, double : () -> R) -> R {
        if RealType.self == Float.self {
            return float()
        } else if RealType.self == Double.self {
            return double()
        } else {
            fatalError("cannot dispatch on Complex.RealType == \(RealType.self)")
        }
    }

}

This solution is closest to what one would naturally do if Swift's type system were more powerful, especially as the code passed in the float and double function parameters is short.
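For illustration, a hypothetical use of dispatch (the property name is made up; the as! casts are safe because each branch only runs when RealType is the matching type):

```swift
extension Complex {
    // Hypothetical example: a precision-dependent tolerance.
    static var defaultTolerance: RealType {
        dispatch(
            float:  { Float(1e-5) as! RealType },
            double: { Double(1e-12) as! RealType }
        )
    }
}
```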

The most important argument against this would be if the dynamic type check proves too costly compared to the other solution. The other solution would be to have a protocol CanDoStuff, implemented by Float, Double and Complex, and on top of that a protocol CanDoStuffReal, which extends CanDoStuff and is implemented by Float and Double. CanDoStuffReal is then used in the implementation of CanDoStuff for Complex. As CanDoStuff currently consists of 28 functions (on top of which other generic functions are built), that's somewhat messy.

I guess at some point I should try out both solutions and compare their performance against each other and see if a switch to the other solution would be advisable.

And I guess the other solution also wins if more RealTypes than just Float and Double come into play.
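A sketch of that layering, with hypothetical names and a single placeholder requirement standing in for the 28 functions:

```swift
import Numerics

protocol CanDoStuffReal: Real {
    static func doRealStuff(_ x: Self) -> Self
}

protocol CanDoStuff {
    static func doStuff(_ x: Self) -> Self
}

extension Float: CanDoStuffReal, CanDoStuff {
    static func doRealStuff(_ x: Float) -> Float { x } // placeholder body
    static func doStuff(_ x: Float) -> Float { doRealStuff(x) }
}

extension Double: CanDoStuffReal, CanDoStuff {
    static func doRealStuff(_ x: Double) -> Double { x } // placeholder body
    static func doStuff(_ x: Double) -> Double { doRealStuff(x) }
}

// The Complex implementation is expressed via CanDoStuffReal,
// with no dynamic type check anywhere.
extension Complex: CanDoStuff where RealType: CanDoStuffReal {
    static func doStuff(_ z: Complex) -> Complex {
        Complex(RealType.doRealStuff(z.real), RealType.doRealStuff(z.imaginary))
    }
}
```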

I encountered that problem in an even worse situation.

Obviously, you can do matrix arithmetic with Double, Float, Complex<Float> and Complex<Double>. But how would you even go about putting the appropriate BLAS methods into a protocol fulfilled by Double, Float and so on?

A solution to that problem is to have protocols for matrix arithmetic instead, and rather than conditionally conforming a generic matrix type to such a protocol, to have concrete matrix types conform to it.

You can have a generic matrix type Matrix implementing some general matrix protocol that allows casting to concrete matrix types, i.e. a Matrix<Double> can easily be cast to the concrete MatrixD and so on. MatrixD conforms to the arithmetic protocol and whatever other functionality you want to have.

The protocol requires only the basic functionality of the BLAS or whatever matrix functionality your library wants to provide. Operators and functionality that can be derived from the protocol methods go to protocol extensions - maybe with a corresponding protocol requirement to allow customization by implementing types.

Useful combinations of features, like arithmetics, pointwise operations and so on, get their own type alias:

public typealias RealMatrix = ArithmeticMatrix & PointwiseMatrix & DecomposableMatrix &...
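A sketch of one such protocol and a concrete conforming type (all names hypothetical):

```swift
protocol ArithmeticMatrix {
    static func + (lhs: Self, rhs: Self) -> Self
}

struct MatrixD: ArithmeticMatrix {
    var rows: Int
    var columns: Int
    var elements: [Double] // column-major, say

    static func + (lhs: MatrixD, rhs: MatrixD) -> MatrixD {
        precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns)
        var result = lhs
        for i in result.elements.indices {
            result.elements[i] += rhs.elements[i] // could call cblas_daxpy instead
        }
        return result
    }
}
```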

Also, I would strongly recommend creating wrappers for vectors rather than just a type alias for arrays. This is the only way you can do operator overloading for + without name conflicts.
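A sketch of such a wrapper (the name is hypothetical):

```swift
struct VectorD {
    var elements: [Double]

    // Overloading + on the wrapper gives elementwise addition without
    // clashing with Array's concatenating + operator.
    static func + (lhs: VectorD, rhs: VectorD) -> VectorD {
        precondition(lhs.elements.count == rhs.elements.count)
        return VectorD(elements: zip(lhs.elements, rhs.elements).map(+))
    }
}
```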

Regarding lazy types:

Great idea, but as you increase the amount of functionality in the library, you would have to implement all those functions for each lazy type. Or you would end up with types that just can't do everything that matrices should be able to do. Two suggestions:

  1. Have those lazy types as an opt-in feature like LazySequence. You would have to type matrix.lazy.transpose() then. The drawback is that operators would default to the eager implementation.
  2. For each lazy type, provide a method to retrieve the equivalent eager value. Possibly even put this into the general matrix protocol via an associatedtype EagerSelf that defaults to Self, with an inlined identity as the implementation. With a little discipline, you can then provide neat default implementations for all operations in your library by first converting everything to EagerSelf, and specialize the protocols for the lazy types later.
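Suggestion 2 might be sketched like this (names hypothetical):

```swift
protocol MatrixProtocol {
    associatedtype EagerSelf: MatrixProtocol = Self
    var eager: EagerSelf { get }
}

extension MatrixProtocol where EagerSelf == Self {
    // Identity for types that are already eager.
    @inlinable var eager: Self { self }
}
```

Generic default implementations can then start from `x.eager`, and the lazy wrapper types specialize the protocol operations later.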

Well, check out my library, it's done there ;-)

Ok, apparently this is possible.
However, regarding the float and double solutions, it looks fine - the Complex solution looks a bit problematic.

There's a lot of casting going on; Complex.dispatch dispatches at runtime and may fatalError if you put in some type that your repo doesn't centrally support. For instance, there will soon be a Float16 type, and we may see a lot of low-level functionality for it (not in BLAS, but in some other library), so it could conform to LANumeric. If a user of your library adds this conformance, they'll automatically see the corresponding methods for the complex type due to your protocol constraint. They may assume that you did some magic tricks here and just use it - and get a crash.

We should really utilize all static information here we can possibly get, not only for reliability and extensibility reasons, but also for speed reasons (static dispatch vs. dynamic dispatch and what the compiler can do for you).

Possibly, combine my above mentioned approach with yours: Have a Complex protocol with associatedtype Real that allows conversion between complex types with agreeing real types, have a concrete ComplexD and ComplexF type and make ComplexD and ComplexF conform to LANumeric. Possibly keep your generic complex type around for potential weak type erasure use cases (whatever these may be).

Or one could go one level deeper and have a protocol LARealType providing the appropriate methods for Complex<T> where T conforms to LARealType. Then the type constraint when Complex conforms to LANumeric is straightforward.

True. But I think that's ok. Somebody adding Float16 could be expected to go and cover the Complex case after experiencing a crash, given that that would be completely internal to the package.

You could go the way that Steve Canon suggested. It's more typing (no pun intended), but statically checked and probably more along the lines you would want.

I think the proper fix would be to just fix Swift's type system.

@Lantua @phlegmaticprogrammer Just an FYI. @Andrew_Trick gave a talk at WWDC this year on the topic of pointers and "memory state". He explains the various APIs and when to use them (including assumingMemoryBound/etc). Check it out:


Standard apology for people working with C pointer APIs: ideally, the Swift compiler would know you're calling a C function and be smart enough to allow implicit pointer conversion for types that are considered related in C.

The intended way to get around this is by wrapping C calls in withMemoryRebound(to:).

Sorry, but if the compiler doesn't "know the types are the same" then the types aren't the same and this is undefined behavior under most conditions.

I won't get too worked up over a fileprivate API. Presumably it's some sort of localized workaround. It is really disturbing when I see public APIs hiding undefined behavior behind a convenient API.

You could create concrete overloads of recast for the types you think are safe by C's strict aliasing rules and make it clear that the returned value is only allowed to be passed directly to a C function, not accessed by Swift code.
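Concretely, such a constrained overload might look like this sketch (assuming `import Accelerate` and `import Numerics`; the name `recastForC` is made up, and the result must still only ever be handed straight to a C call):

```swift
// Only this specific pairing is exposed; the fully generic recast<U, V>
// stays out of reach. The returned pointer must be passed directly to a
// C function, never dereferenced from Swift code.
fileprivate func recastForC(_ ptr: UnsafeMutablePointer<Complex<Float>>)
    -> UnsafeMutablePointer<__CLPK_complex> {
    UnsafeMutableRawPointer(ptr).assumingMemoryBound(to: __CLPK_complex.self)
}
```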

For people who think this either isn't a real issue or is somehow introduced by Swift, here's an example of why Swift needs these type rules just to safely call C APIs:

Building this code for release (-O) will cause the precondition to trigger, because the C compiler has optimized multiplyBy1024 assuming that S1 * and S2 * do not alias:

--- file.h

struct S1 {
  int i1;
};

struct S2 {
  int i2;
};

void multiplyBy1024(struct S1 *s1, const struct S2 *s2);

--- file.c

void multiplyBy1024(struct S1 *s1, const struct S2 *s2) {
  for (int i = 0; i < 10; ++i)
    s1->i1 += s2->i2;
}

--- file.swift

// Initialize S1.i1.
let s1Ptr = UnsafeMutablePointer<S1>.allocate(capacity: 1)
s1Ptr.pointee.i1 = 1

// Rebind S1 to S2--they are layout-compatible.
let s2Ptr = UnsafeRawPointer(s1Ptr).bindMemory(to: S2.self, capacity: 1)

// Call a C routine with aliased pointers.
multiplyBy1024(s1Ptr, s2Ptr);

precondition(s1Ptr.pointee.i1 == 1024)

Very interesting example. I watched your talk yesterday and left somewhat confused, although you state at the end of your talk that things are hopefully clearer now :joy: There are a lot of different pointer types around, and it is not clear to me what the exact semantics of each of them is. Is there a formal semantics somewhere for the pointer API? That might help.

So given the definition of

fileprivate func recast<U, V>(_ ptr : UnsafeMutablePointer<U>) -> UnsafeMutablePointer<V> {
    let p = UnsafeMutableRawPointer(ptr)
    return p.assumingMemoryBound(to: V.self)
}

is for example this code safe or not (where in the float branch of dispatch, Self will be Complex<Float>, and in the double branch, Self will be Complex<Double>):

    public static func lapack_gesvd(_ jobu : UnsafeMutablePointer<Int8>, _ jobvt : UnsafeMutablePointer<Int8>,
                                    _ m : UnsafeMutablePointer<IntLA>, _ n : UnsafeMutablePointer<IntLA>,
                                    _ a : UnsafeMutablePointer<Self>, _ lda : UnsafeMutablePointer<IntLA>,
                                    _ s : UnsafeMutablePointer<Self.Magnitude>,
                                    _ u : UnsafeMutablePointer<Self>, _ ldu : UnsafeMutablePointer<IntLA>,
                                    _ vt : UnsafeMutablePointer<Self>, _ ldvt : UnsafeMutablePointer<IntLA>,
                                    _ work : UnsafeMutablePointer<Self>, _ lwork : UnsafeMutablePointer<IntLA>,
                                    _ info : UnsafeMutablePointer<IntLA>) -> Int32
    {
        return dispatch(
            float: {
                var rwork = [Float](repeating: 0, count: 5*Int(min(m.pointee, n.pointee)))
                return cgesvd_(jobu, jobvt, m, n, recast(a), lda, recast(s), recast(u), ldu, recast(vt), ldvt, recast(work), lwork, &rwork, info)
            },
            double: {
                var rwork = [Double](repeating: 0, count: 5*Int(min(m.pointee, n.pointee)))
                return zgesvd_(jobu, jobvt, m, n, recast(a), lda, recast(s), recast(u), ldu, recast(vt), ldvt, recast(work), lwork, &rwork, info)
            }
        )
    }

Looks like you want to pass a pointer to Numerics' Complex type as a pointer to Accelerate's __CLPK_complex. This would still be undefined behavior if it were all written in C.

Swift actually gives you a way to write it without any undefined behavior, which is pretty cool in theory:

import Accelerate
import Numerics

/* Numerics exports this type...
@frozen
public struct Complex<RealType> where RealType: Real {
  internal var x: RealType
  internal var y: RealType
}
*/

// Something imported from Accelerate...
func cfoo(_: UnsafeMutablePointer<__CLPK_complex>!, _: UnsafeMutablePointer<__CLPK_complex>!) {}

extension Complex {
  public static func foo(a: UnsafeMutablePointer<Self>, s: UnsafeMutablePointer<Self>) {
    a.withMemoryRebound(to: __CLPK_complex.self, capacity: 1) { a_recast in
      s.withMemoryRebound(to: __CLPK_complex.self, capacity: 1) { s_recast in
        cfoo(a_recast, s_recast);
      }
    }
  }
}

But you would have five levels of nested withMemoryRebound calls. Unless @Karoy_Lorentey or @scanon can think of a better way to do this.

I feel bad about this, so I'll let you in on a secret. The way you've used recast turns out to be fine as long as:

  • recast is only ever used as an argument expression for a C function call

  • No C function ever takes both a pointer to Complex and a pointer to "__CLPK_complex"

That way, neither the C compiler, nor Swift compiler will ever be able to exploit the undefined behavior and nothing bad should happen.


Ok, thank you very much for looking into this. I am starting to get a better feel for the pointer API, and withMemoryRebound is what I was looking for. I think I've used recast mostly in the right way, except for the cases where I didn't :-D, like here:

    public static func vDSP_elementwise_adjoint(_ v : [Self]) -> [Self] {
        var (real, imaginary) = vDSP_convert(interleavedComplex: v)
        let N = vDSP_Length(v.count)
        var target_real : [Self.Magnitude] = Array(repeating: 0, count: Int(N))
        var target_imaginary : [Self.Magnitude] = Array(repeating: 0, count: Int(N))
        dispatch(
            float: {
                var source_split = DSPSplitComplex(realp: recast(&real), imagp: recast(&imaginary))
                var target_split = DSPSplitComplex(realp: recast(&target_real), imagp: recast(&target_imaginary))
                vDSP_zvconj(&source_split, 1, &target_split, 1, N)
            },
            double: {
                var source_split = DSPDoubleSplitComplex(realp: recast(&real), imagp: recast(&imaginary))
                var target_split = DSPDoubleSplitComplex(realp: recast(&target_real), imagp: recast(&target_imaginary))
                vDSP_zvconjD(&source_split, 1, &target_split, 1, N)
            }
        )
        return vDSP_convert(real: target_real, imaginary: target_imaginary)
    }

or here:

    public static func blas_iamax_inf(_ N : Int32, _ X : UnsafePointer<Self>, _ incX : Int32) -> Int32 {
        dispatch(
            float: {
                let R : UnsafePointer<Float> = recast(X)
                let I = R + 1
                let inc = 2 * incX
                let i1 = cblas_isamax(N, R, inc)
                let i2 = cblas_isamax(N, I, inc)
                let abs1 = abs(R[Int(2 * i1)])
                let abs2 = abs(I[Int(2 * i2)])
                if abs1 == abs2 {
                    return min(i1, i2)
                } else if abs1 < abs2 {
                    return i2
                } else {
                    return i1
                }
            },
            double: {
                let R : UnsafePointer<Double> = recast(X)
                let I = R + 1
                let inc = 2 * incX
                let i1 = cblas_idamax(N, R, inc)
                let i2 = cblas_idamax(N, I, inc)
                let abs1 = abs(R[Int(2 * i1)])
                let abs2 = abs(I[Int(2 * i2)])
                if abs1 == abs2 {
                    return min(i1, i2)
                } else if abs1 < abs2 {
                    return i2
                } else {
                    return i1
                }
            }
        )
    }

I still feel that these uses should be safe, as I cannot imagine compiler optimisations that cause harm here while still leaving the majority of software out there running without crashes, but I guess I'd better rewrite these cases with withMemoryRebound just to sleep better.
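For instance, the float branch of blas_iamax_inf might become something like this sketch, so the Float view of the memory only exists inside the closure:

```swift
return X.withMemoryRebound(to: Float.self, capacity: 2 * Int(N)) { R -> Int32 in
    let I = R + 1
    let inc = 2 * incX
    let i1 = cblas_isamax(N, R, inc)
    let i2 = cblas_isamax(N, I, inc)
    let abs1 = abs(R[Int(2 * i1)])
    let abs2 = abs(I[Int(2 * i2)])
    if abs1 == abs2 { return min(i1, i2) }
    return abs1 < abs2 ? i2 : i1
}
```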